amirhoseinfar1385's blog

By amirhoseinfar1385, history, 3 days ago, In English

Hello everyone! Today, I encountered a very strange problem with question e1. I have one submission which is this: [link] And another submission which is this: [link] The only difference between the two is in line 123 and it's a very small difference, but the first one is accepted in 1.8 seconds, and the second one in 6 seconds. There is a huge difference. I wanted to know if anyone knows the reason for this big difference; it doesn't make any sense at all. That line executes about 100,000,000 times. I know it's a lot, but it doesn't make sense to me that this change would add 4 seconds to my code. I want to know if there could be another reason or if that's really all there is to it.

»
3 days ago, # |
  Vote: I like it 0 Vote: I do not like it

Auto comment: topic has been updated by amirhoseinfar1385 (previous revision, new revision, compare).

»
3 days ago, # |
  Vote: I like it 0 Vote: I do not like it

This looks like cache misses. Accessing dp[u][v][j] for sequential j is cheap. Accessing dp[u][alle[j].u][j], dp[alle[j].v][v][j], dp[u][alle[j].v][j], dp[alle[j].u][v][j] is much more expensive since they're at various places in memory and those places can only be known at runtime, so no optimisation is possible.

  • »
    »
    3 days ago, # ^ |
    Rev. 2   Vote: I like it 0 Vote: I do not like it

    An interesting thing happened that I thought I'd share. I had a submission, this link: [link] Instead of taking the minimum, I used || and saw no improvement in time. But then I used:

    #pragma GCC optimize("O3,unroll-loops")
    #pragma GCC target("avx2")
    

    and it Reduced the time by about 1.5 seconds. It's strange because it definitely couldn't vectorize, as you mentioned, it is determined by the inputs and can't optimize this.

    • »
      »
      »
      3 days ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      You can check assembly with -fverbose-asm. O3 can sometimes make very aggressive, bad optimisations, and it often results in worse runtime from what I found out when I tried to use it all the time.

»
3 days ago, # |
  Vote: I like it 0 Vote: I do not like it

Auto comment: topic has been updated by amirhoseinfar1385 (previous revision, new revision, compare).