Блог пользователя ExpectoPatronum

Автор ExpectoPatronum, история, 3 года назад, По-английски
ll res = -oo;
    for(int k=1; k<=n; k++)
        for(int i=1; i<=n-k+1; i++)
            for(int j=1; j<=n-k+1; j++)
            {
                f[i][j][k] = f[i][j][k - 1] + a[i + k - 1] * b[j + k - 1];
                res = max(res, f[i][j][k]);
            }
    cout << res;
ll res = -oo;
    for(int i=1; i<=n; i++)
        for(int j=1; j<=n; j++)
            for(int k=1; k<=n; k++)
            {
                if(k + i > n || k + j > n) break;
                f[i][j][k] = f[i][j][k - 1] + a[i + k - 1] * b[j + k - 1];
                res = max(res, f[i][j][k]);
            }
    cout << res;

These code seem not to be differ too much but execution time have much differ. Example in case n = 500, code1 run in 3103ms but code 2 run only 789ms. What is the reason ? Pls explain to me. Thanks.

  • Проголосовать: нравится
  • +20
  • Проголосовать: не нравится

»
3 года назад, # |
  Проголосовать: нравится +3 Проголосовать: не нравится

From my little understanding, arrays are arranged row order in continuos blocks of memory. i.e. the first row is in a block of memory, then the second row immediately continues from where the first one left off and so on...

This makes it easier to go to the next row in a loop that is ordered row by row by just going to the next position in memory eveytime making it cache friendly. If ordered the other way around, it takes more time as you have to continuosly make jumps in memory position when switching to different rows which isn't cache friendly

»
3 года назад, # |
  Проголосовать: нравится +3 Проголосовать: не нравится

Hey, can you please put the codes inside the block

»
3 года назад, # |
  Проголосовать: нравится 0 Проголосовать: не нравится

Auto comment: topic has been updated by ExpectoPatronum (previous revision, new revision, compare).

»
3 года назад, # |
  Проголосовать: нравится +3 Проголосовать: не нравится

Auto comment: topic has been updated by ExpectoPatronum (previous revision, new revision, compare).

»
3 года назад, # |
  Проголосовать: нравится 0 Проголосовать: не нравится

pro vjp idol

»
3 года назад, # |
Rev. 4   Проголосовать: нравится +5 Проголосовать: не нравится

Because of how memory works
Whenever you read 4 bytes, you actually load 64 bytes into cpu caches and basically read other 15 values for free (MUCH cheaper than from memory) if you read contigious ranges of memory

But if you random jump, you lose advantages of caches, because they're limited and always filled with data you currently don't need and you have to read each value from memory directly