In this contest, while I was browsing through FST submissions, I encountered many TLEs on test 8 in B. Here are some examples: 142848311, 142836654. The solutions appeared to be completely correct. What happened here?
The problem was that these contestants used memset
before every test. If you didn't know, memset(a, 0, sizeof(a))
is linear in the size of the array a
.
This is why these contestant TLEd. Every testcase, they did $$$1000000$$$ unnecessary operations.
The lesson here? Stop using memset
to reset stuff when there are many testcases. Resetting by hand works just fine. Or know how memset
actually works, so that you don't TLE.
Just avoid global variables like the plague in problems with multiple test cases.
Just avoid global variables like the plague
in problems with multiple test cases.Just avoid
global variables like the plague inproblems with multiple test cases.Problems with multiple test cases are actually very nice, they make pretests stronger and queues shorter. Also, if you write a local brute tester, you can run your code against a brute force faster.
I am not going to write vectors of vectors of vectors of vectors to solve 4-dimensional DPs...
Btw, did you know that you can write
vector dp(n, vector(m, 0))
? Still $$$O(d)$$$ words "vector", but much better than $$$O(d^2)$$$. However, it is still slower than plain arrays because of multiple pointer jumps.I am not going to
write vectors of vectors of vectors of vectors tosolve 4-dimensional DPs...Use something like this tensor class by ecnerwala. Now you can completely avoid global variables!
Covid deniers would definitely not follow your advice
Thank you for the advice.Can you please let me know what you mean by resetting by hand?I mean like if i reset the values using a for loop then i am also going to take linear time.Is there a better way?
Do something like this:
This is linear in
n
and not in the array size.You can also memset for definite size Like memset(a,0,n * sizeof(int));
Or
for generic types of a[].
And allocate the array of size MAXN + 500.
Also, if you use functions like
void done()
to clear your static arrays, don't forget to call it every time after printing the answer. I made 3 submissions with this mistake in problem E. Don't do that.Another simple mistake I often see is not having
#define int long long
in every submissionThis sometimes give tle.
Another simple mistake I often see is
nothaving#define int long long
in every submissionOn a more serious note, memset is not the problem here, you can just use
memset(a, 0, n * sizeof(int))
. It's just having ANY global variables. >99% of problems with multiple test cases don't require having data that is shared between testcases and is supposed to be in some part cleaned. I always have a struct that encompasses everything I need for a particular test case and that is a hard barrier preventing me from getting any kind of WAs resulting from not cleaning the data between test cases. The downside is that I need to resize every container with appropriate size at the beginning of every testcase, but that is not bigger effort than cleaning it. One can of course forget about resizing it appropriately what is a mistake comparable to forgetting about cleaning it, but if you don't resize then you get runtime error locally on sample test, which is fixed in 10 seconds (cause with proper local flags you are explicitly told that the runtime was caused by this particular container being empty), while forgetting about cleaning it leads to WAs on pretests/systests (and it may not be obvious not cleaning the data is its reason)Embrace lambdas so you never have to use a global variable ever again!
Example of a submission to a multitest graph problem with no global scope used.
You can also get rid of that annoying
auto &self
parameter, by just typing the type explicitly rather than usingauto
:instead of:
So what you're using here is actually something different,
std::function
, and there's surprisingly a performance difference between the two.std::function
tends to have significantly more overhead. Try running this simple example in custom invocation for example:These results were generated using C++20 compiler, though I got similar results using other compilers.
As an extra note, I've benchmarked all the different styles of recursion before in the past (normal recursion,
std::function
, passing self, using Y combinators). Normal recursion is of course still the fastest, but passing self has surprisingly competitive performance in exchange for minimal additional code.A bit off topic,
But why both the ways gets a runtime error for
n=3e6
in custom invocation when I used C++ 17 (64 bit).Is there some optimization in C++ 20 for recursion ??
UPD1 : Actually the self type recursion gives runtime error only for C++17 (64 bit) and works fine on other compilers.
UPD2 : I think the reason may be that C++ 17 (64 bit) default recursion depth is set very low which doesn't allow this much depth in recursion. Then my question is how to increase recursion depth limit.
UPD3 : I guess I will have to stop using C++17 (64 bit) because there is no way we can increase stack space for OJ, Sorry for pinging you.
Yeah sorry I don't know either. All I know is that passing self has less overhead, but I'm not very knowledgeable about the intricacies between different compilers.
About increasing recursion depth, all C++ compilers on Codeforces, such as C++20 and C++17 compile with 256 mb of stack size, so there shouldn't be a significant difference between max recursion depth in the two.
Yeah that's what I didn't understand
That even if the stack size for all compilers are same why C++ 17 (64 bit) give Runtime Error.
The reason behind less overhead with lambdas is the following:
std::function
is a type-erased implementation of function objects (and hence it is useless for competitive programming unless you want to do something like make vectors of "lambdas", which is impossible, so you need to use something like std::function instead), and it requires heap allocations and extraneous pointer indirections, which makes it slower.Lambdas are implemented as anonymous structs with
operator()
, i.e., they are callable (they're called functors). Usingauto
in the parameter list makes them a generic lambda, which is analogous to structs with templatedoperator()
. When you call stuff likedfs(dfs, root, color)
, type deduction can be done at compile-time, and due to the lack of indirections, the compiler is able to optimize it much better.Ok, didn't know about this, thanks! I have two questions:
What is behind auto then, if it's not
std::function
?Does this higher overhead of
std::function
really matter in practice? (i.e. can it actually cause my solution to get TLE in a contest problem?)auto
.function
and lambda.I guess it might matter in problems like this where they ask you to run DFS up to $$$10^6$$$ recursion depth. Though it shouldn't be a huge difference as long as problem authors are reasonable with time limits.
It worked 143040776 XD
As far as overhead is concerned, using
std::function
got me TLE once, so I stay away from it as much as possible. Similar things happen when you usestd::function
in segment tree implementations and so on.So is it safe to use it for simple stuff that only makes $$$O(n)$$$ recursive calls?
Personally speaking, I don't use
std::function
for competitive programming purposes at all, but your mileage may vary. If $$$n$$$ is not too large, $$$O(n)$$$ recursive calls might be fine.I used to use
std::function
, but because of some reasons (like not being able to set default arguments), I was told to usey_combinator
(Although we need to have some extra code to use it). What are your views on it? Should we use it?y_combinator
is just a wrapper for another way of getting something similar to the "recursive" lambda trick, so it should be fine. In most cases it will be as fast as the recursive lambdas (the only issue I know of is that sometimes those functions don't get inlined, but I have never heard of anyone getting TLE using that).I think memset is faster than the iterative assign:
memset(a, 0, n * sizeof *a);
Yes but if that's the difference between AC and TLE and your whole solution is O(N) then you can just curse the problem setter and move on :)
Using
std::fill
is much better than usingmemset
btw.The compiler will usually optimize the iterative assign to a
memset
anyway, unless there is some reason it isn't correct to do so. So it really doesn't matter.