[Question] Getting TLE after switching the dimensions of the DP array

Hello mates, I'm trying to solve the DP problem Coin Combinations II from CSES: https://cses.fi/problemset/task/1636/

Initially, I defined an array dp[i][j] as the number of ways to form sum i using only coins from the j-th index onward. The solution using this DP array gets TLE.

But if I switch the two dimensions, so that dp[i][j] is the number of ways to form sum j using only coins from the i-th index onward, the solution is accepted. Why?

Note:

The two solutions are 99% the same; the only difference is that I switched the two dimensions.
The sum is at most 10^6 and there are at most 10^2 coins.

Thanks in advance.

Accepted code

#include<bits/stdc++.h>
using namespace std;

#define fastio() ios_base::sync_with_stdio(false);cin.tie(NULL);cout.tie(NULL)
void IN_OUT() {
#ifndef ONLINE_JUDGE
freopen("Input.txt", "r", stdin);
freopen("Output.txt", "w", stdout);
#endif
}
/*--------------------------------------------------------------------------------------------------------------------------*/

const int MOD = 1e9 + 7;

const int mxn = 1e2 + 5;
const int mxx = 1e6 + 5;
int a[mxn];
int dp[mxn][mxx]; // dp[i][j]: number of ways to form sum j using only coins from index i onward
int n, x;
void solve() {
    cin >> n >> x;
    for (int i = 0; i < n; i++) {
        cin >> a[i];
        dp[i][0] = 1;
    }
    
    for (int j = n - 1; j >= 0; j--) {
        for (int i = 1; i <= x; i++) {
            dp[j][i] = dp[j + 1][i];
            if (i >= a[j]) {
                dp[j][i] += dp[j][i - a[j]];
                dp[j][i] %= MOD;
            }
        }
    }

    cout << dp[0][x] << "\n";
}

int main() {
    fastio();
    IN_OUT();
    solve();
    return 0;
}

TLE code

#include<bits/stdc++.h>
using namespace std;
 
#define fastio() ios_base::sync_with_stdio(false);cin.tie(NULL);cout.tie(NULL)
void IN_OUT() {
#ifndef ONLINE_JUDGE
freopen("Input.txt", "r", stdin);
freopen("Output.txt", "w", stdout);
#endif
}
/*--------------------------------------------------------------------------------------------------------------------------*/
 
const int MOD = 1e9 + 7;
 
const int mxn = 1e2 + 5;
const int mxx = 1e6 + 5;
int a[mxn];
int dp[mxx][mxn]; // dp[i][j]: number of ways to form sum i using only coins from index j onward
int n, x;
void solve() {
    cin >> n >> x;
    for (int i = 0; i < n; i++) {
        cin >> a[i];
        dp[0][i] = 1;
    }
    
    for (int j = n - 1; j >= 0; j--) {
        for (int i = 1; i <= x; i++) {
            dp[i][j] = dp[i][j + 1];
            if (i >= a[j]) {
                dp[i][j] += dp[i - a[j]][j];
                dp[i][j] %= MOD;
            }
        }
    }
 
    cout << dp[x][0] << "\n";
}
 
int main() {
    fastio();
    IN_OUT();
    solve();
    return 0;
}

Comments (6)

_shriom_

In C/C++, a 2D array is stored in row-major order, i.e., an array arr[N][M] is stored as N consecutive blocks of M contiguous elements each (for a vector of vectors, each row is likewise contiguous). This might not be the case in other languages such as Java.
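To make the row-major layout concrete, here is a minimal sketch. The names flat_index and matches_builtin_layout are hypothetical helpers made up for illustration, not anything from the original solutions. The first computes where arr[i][j] sits relative to arr[0][0], and the second checks that formula against a real built-in 2D array by comparing byte offsets.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: element offset of arr[i][j] from arr[0][0]
// in a row-major array that has M columns.
inline std::size_t flat_index(std::size_t i, std::size_t j, std::size_t M) {
    assert(j < M);  // j must be a valid column index
    return i * M + j;
}

// Compares the formula with the addresses of a real int[N][M] array.
inline bool matches_builtin_layout() {
    constexpr std::size_t N = 3, M = 5;
    static int arr[N][M];
    const char* base = reinterpret_cast<const char*>(&arr[0][0]);
    for (std::size_t i = 0; i < N; ++i) {
        for (std::size_t j = 0; j < M; ++j) {
            const char* p = reinterpret_cast<const char*>(&arr[i][j]);
            std::size_t byte_offset = static_cast<std::size_t>(p - base);
            if (byte_offset != flat_index(i, j, M) * sizeof(int)) {
                return false;
            }
        }
    }
    return true;
}
```

Note that moving j by one changes the offset by 1 element, while moving i by one changes it by M elements.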

Now, when the processor runs your program and you access arr[i][j], the cache (which holds copies of main memory for fast access by the processor) loads a contiguous block of memory around that address, exploiting spatial locality. As a result, arr[i][j+1], arr[i][j+2], ... will be accessed much faster than, say, arr[i+1][j] or arr[i+2][j]. In your TLE code the inner loop walks the first index of dp[mxx][mxn], so consecutive accesses are mxn = 105 ints apart (about 420 bytes with 4-byte ints), whereas in the accepted code they are adjacent. If the array is large enough, this can make a significant difference in the running time of your code.
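As a rough illustration of the two access patterns, the sketch below sums the same row-major matrix in both loop orders. The function names are made up for this example and the matrix is stored in a flat std::vector, which is an assumption of the sketch and not how the original code stores dp. Both functions return the same sum; only the order in which memory is touched differs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Inner loop over the last index: consecutive accesses are 1 element apart.
inline long long sum_row_order(const std::vector<int>& a,
                               std::size_t rows, std::size_t cols) {
    assert(a.size() == rows * cols);
    long long total = 0;
    for (std::size_t i = 0; i < rows; ++i) {
        for (std::size_t j = 0; j < cols; ++j) {
            total += a[i * cols + j];
        }
    }
    return total;
}

// Inner loop over the first index: consecutive accesses are cols elements apart.
inline long long sum_col_order(const std::vector<int>& a,
                               std::size_t rows, std::size_t cols) {
    assert(a.size() == rows * cols);
    long long total = 0;
    for (std::size_t j = 0; j < cols; ++j) {
        for (std::size_t i = 0; i < rows; ++i) {
            total += a[i * cols + j];
        }
    }
    return total;
}
```

If you time both functions on a matrix much larger than the cache, the second one will typically be noticeably slower, though the exact ratio depends on the machine and compiler.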

Hope this answers your query!!

akshatchaudhary

Wow! That's cool.

Would using a 2D vector help?

No, even for a dynamically allocated vector, each row is stored contiguously, so accessing arr[i][j+1] is still faster than arr[i+1][j] once arr[i][j] has been accessed.
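A small sketch to check this for a vector of vectors. The name row_is_contiguous is a hypothetical helper for illustration only. It verifies that neighbouring elements within one row sit at consecutive addresses. Different rows are separate allocations, so nothing is assumed about how far apart they are.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical check: true if all elements of row i lie at consecutive addresses.
inline bool row_is_contiguous(const std::vector<std::vector<int>>& v,
                              std::size_t i) {
    assert(i < v.size());
    const std::vector<int>& row = v[i];
    for (std::size_t j = 0; j + 1 < row.size(); ++j) {
        if (&row[j + 1] != &row[j] + 1) {
            return false;
        }
    }
    return true;
}
```

So with vector<vector<int>> the same advice applies: keep the index that changes in the inner loop as the last one.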

Thank you so much for the clear explanation, this is what I was looking for.
