Hello everyone! Today, I encountered a very strange problem with question e1. I have one submission which is this: [link] And another submission which is this: [link] The only difference between the two is in line 123 and it's a very small difference, but the first one is accepted in 1.8 seconds, and the second one in 6 seconds. There is a huge difference. I wanted to know if anyone knows the reason for this big difference; it doesn't make any sense at all. That line executes about 100,000,000 times. I know it's a lot, but it doesn't make sense to me that this change would add 4 seconds to my code. I want to know if there could be another reason or if that's really all there is to it.
Auto comment: topic has been updated by amirhoseinfar1385 (previous revision, new revision, compare).
This looks like cache misses. Accessing
dp[u][v][j]
for sequentialj
is cheap. Accessingdp[u][alle[j].u][j], dp[alle[j].v][v][j], dp[u][alle[j].v][j], dp[alle[j].u][v][j]
is much more expensive since they're at various places in memory and those places can only be known at runtime, so no optimisation is possible.An interesting thing happened that I thought I'd share. I had a submission, this link: [link] Instead of taking the minimum, I used || and saw no improvement in time. But then I used:
and it Reduced the time by about 1.5 seconds. It's strange because it definitely couldn't vectorize, as you mentioned, it is determined by the inputs and can't optimize this.
You can check assembly with
-fverbose-asm
. O3 can sometimes make very aggressive, bad optimisations, and it often results in worse runtime from what I found out when I tried to use it all the time.Auto comment: topic has been updated by amirhoseinfar1385 (previous revision, new revision, compare).