Time complexity of string.substr()

According to https://cplusplus.com/reference/string/string/substr/, the complexity is "Unspecified, but generally linear in the length of the returned object."

However, I believe that in practice it's much faster, especially for repeated calls with the same start_pos.

Example problem: https://leetcode.com/contest/weekly-contest-377/problems/minimum-cost-to-convert-string-ii/

Below is the contest winner's solution:

#include <bits/stdc++.h>  // LeetCode provides these headers implicitly
using namespace std;

const int N = 300;
const long long INF = 0x3F3F3F3F3F3F3FLL;

long long dis[N][N];

class Solution {
public:
    long long minimumCost(string source, string target, vector<string>& original, vector<string>& changed, vector<int>& cost) {
        map<string, int> label;  // assigns an id to every distinct string in original/changed
        for (auto& v : original) {
            label[v];
        }
        for (auto& v : changed) {
            label[v];
        }
        int total = 0;
        for (auto& it : label) {
            it.second = total++;
        }
        
        for (int i = 0; i < total; ++i) {
            for (int j = 0; j < total; ++j) {
                dis[i][j] = INF;
            }
            dis[i][i] = 0;
        }
        
        for (int i = 0; i < original.size(); ++i) {
            int u = label[original[i]];
            int v = label[changed[i]];
            
            dis[u][v] = min(dis[u][v], (long long) cost[i]);
        }
        for (int k = 0; k < total; ++k) {  // Floyd-Warshall: cheapest conversion between any two strings
            for (int i = 0;i < total; ++i) {
                for (int j = 0; j < total; ++j) {
                    if (i != j && j != k && k != i) {
                        dis[i][j] = min(dis[i][j], dis[i][k] + dis[k][j]);
                    }
                }
            }
        }
        
        vector<int> lens;  // distinct lengths of the strings in original
        for (auto& v : original) {
            lens.push_back(v.size());
        }
        sort(lens.begin(), lens.end());
        lens.erase(unique(lens.begin(), lens.end()), lens.end());
        
        int n = source.size();
        vector<long long> dp(n + 1, INF);  // dp[i] = min cost to turn source[0, i) into target[0, i)
        dp[0] = 0;
        
        for (int i = 0; i < n; ++i) {
            if (source[i] == target[i]) {
                dp[i + 1] = min(dp[i + 1], dp[i]);
            }
            for (int l : lens) {
                if (i - l + 1 >= 0) {
                    // each substr call copies l characters, i.e. costs O(l)
                    string s = source.substr(i - l + 1, l);
                    string t = target.substr(i - l + 1, l);
                    auto u = label.find(s);
                    auto v = label.find(t);
                    if (u != label.end() && v != label.end()) {
                        dp[i + 1] = min(dp[i + 1], dp[i - l + 1] + dis[u->second][v->second]);
                    }
                }
            }
        }
        if (dp[n] >= INF) {
            return -1;
        }
        return dp[n];
    }
};

My analysis of the time complexity of the code above: I think the substr() calls should cause a timeout. The reference says substr(x, len) is O(len). Therefore, the dp loop costs n * lens.size() * max_len, where n = source.size() and max_len = max(lens[i]) over all i.

E.g., consider the case where n = 1000 and lens = [900, 901, ..., 999]. Then:

Outer loop: for (int i = 0; i < n; ++i), with n = 1000.
Inner loop: for (int l : lens), with lens = [900, 901, ..., 999], i.e. lens.size() = 100.
Inside the inner loop, we call substr(st, l), which is O(l), and max(l) = 999, essentially n.

Thus, since max_len = max(l) = 999 ≈ n:

Time complexity = n * lens.size() * max_len
                ≈ n * lens.size() * n
                = 1000 * 100 * 1000 = 10^8, which should TLE.

Something must be making substr() more efficient. My guess is that substr() calls are cached, so that substr(i, x + d) reuses the previously computed substr(i, x).

I would love to understand more about the optimization going on in substr(). Or would this solution always TLE on this test case, meaning it could be hacked (even though LeetCode doesn't support hacking)?

The only related thing I found is this Stack Overflow question about Java's substring(): https://stackoverflow.com/questions/4679746/time-complexity-of-javas-substring

Comments (4)

Jomax100

12 months ago

Auto comment: topic has been updated by Jomax100.

vgtcross

$$$1000\cdot100\cdot1000=10^8$$$, which doesn't necessarily TLE.
substr(i, len) creates a new copy of length len; that's impossible to do in $$$o(\mathrm{len})$$$.

12 months ago

What about replacing

       for(int len: lens){
                if(i+len > n) break;
                string cur = source.substr(i, len);
                string need = target.substr(i, len);

with the following

       string s, t;
       for(int len = 1; len <= lens.back(); len++){
                if(i+len > n) break;
                s += source[i+len-1];
                t += target[i+len-1];

Previously, the loops ran n * lens.size() = 1000 * 100 times.

Now the inner loop may run up to n times, since max(len) <= n.

However, inside the inner loop we now extend the substring in amortized O(1), and the dominant term is the O(log(200)) call to label.find(s) on the map<string, int>.

So the inner loop costs n * log(200).

Total = n * n * log(200) ≈ 1000 * 1000 * 8

But that version does TLE.

Maybe label.find(s) costs log(label.size()) plus some cost related to s.size()?

I also tried replacing substr() with a manual implementation and, while slower, it still got Accepted:

Before:

string cur = source.substr(i, cur_size);
string need = target.substr(i, cur_size);

After:

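string cur, need;  // start empty, then append cur_size characters (forn(j, n) = for j in [0, n))
for (int j = 0; j < cur_size; ++j) cur += source[i + j];
for (int j = 0; j < cur_size; ++j) need += target[i + j];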

So I guess the test cases must be weak.

Jomax100's blog