I'm doing dynamic programming problems and I have found a weird thing that if I change the order of the for loops in the tabulation DP, one of my solution gives TLE whereas by changing the order of loops, it got accepted.
Here's the problem im doing E. Tree Queries from recent div 3 contest. In this question I'm pre calculating LCA using binary lifting but during precalculation, the change in order of loops gives tle and other got accepted.
here's the accepted version 74428570 and here's the TLE solution 74926625. The only difference is in find_ancestor function where I changes the order of the for loop.
If changing order of for loop do matter then why I got TLE instead of WA?
Thanks
Hey, the matter is that the TLE solution makes a lot of cache misses, while the AC one is very cache-efficient.
The CPU copies frequently used data from RAM to cache — faster memory located closer to the processor. It is much, much faster than RAM, but you can only fit so much data into the cache, on the order of megabytes.
Now if you access the memory at a location that is not far away from the previous access, then chances are you will find this data in the CPU cache. But if you make large jumps between memory accesses (in the TLE solution the accessed elements are at least $$$0.8$$$ megabytes apart), then the data saved in the cache likely doesn't contain what you're looking for, so you must peek into RAM which takes a lot of time.
Sometimes, swapping the dimensions reduces cache misses to such an extent that a solution gets 30x times faster — this is how important it is.
thanks mate, didn’t know this. So i can deduce that changing dimension do cause tle but it doesn’t give WA.
Well, everything that I've mentioned in my original comment is certainly true, but as I'm looking at the code once more, I'm thinking that your code goes into an infinite loop somewhere, and the TLE order of loops is actually not correct.
With this order, you can be accessing values of DP that haven't been counted: it is not guaranteed that in
dp[i][j] = dp[i-1][dp[i-1][j]];
dp[i-1][j]
is actually less or equal toj
. This is only true if every node's parent has a smaller index than the node itself.Sorry, I should've read more carefully :) Edit: with
assert(par < a);
in the DFS function, your code gives RE100.Thanks for pointing this out. Appreciate it.