How to optimize my code (CSES Hamiltonian Flights)

Link to the problem. I use bottom-up DP with the state $$$dp[\text{mask of visited vertices}][\text{ending vertex}]$$$. I'm pretty sure my time complexity is $$$O(N \cdot 2^N + 2^N \cdot N^2)$$$, but it still gets TLE. Is there any way to optimize my code, and what's the reason for the TLE (probably a big constant factor)?

My code

#include <bits/stdc++.h>
using namespace std;

const int M = 1e9 + 7;

const int N = 20;

int dp[1 << N][N];

int main() {
	ios::sync_with_stdio(false);
	cin.tie(0);
	int n, m;
	cin >> n >> m;
	vector<tuple<int, int, int>> edges(m);
	for(int i = 0; i < m; ++i) {
		int u, v;
		cin >> u >> v;
		--u, --v;
		edges[i] = {u, v, (1 << u) | (1 << v)};
	}
	vector<pair<int, int>> masks;
	for(int i = 0; i < (1 << n); ++i) {
		int j = __builtin_popcount(i);
		if(j >= 2) {
			masks.emplace_back(j, i);
		}
	}
	sort(masks.begin(), masks.end());
	dp[1][0] = 1;
	for(int id = 0; id < (int) masks.size(); ++id) {
		int mask = masks[id].second;
		for(auto& [u, v, c] : edges) {
			if((mask & c) == c) {
				dp[mask][v] = (dp[mask][v] + dp[mask ^ (1 << v)][u]) % M;
			}
		}
	}
	cout << dp[(1 << n) - 1][n - 1] << "\n";
	return 0;
}

Comments (8)

Alon-Tanay

2 years ago

Your DP idea could work if you eliminate the N^2 factor. Try a top-down approach: start with the full mask and the ending vertex, and work down from there (recursion makes this easy to implement). Don't forget to use memoization in that case, since you don't want to compute the same value more than once.
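For reference, here is a minimal sketch of what such a top-down version might look like. The names `TopDownCounter`, `solve`, and `countTopDown` are made up for illustration, vertices are assumed to be 0-indexed with $$$n \ge 1$$$, and parallel edges are counted separately, as in the code in the post. It only touches states the final answer depends on, but the worst-case bound is the same as for the bottom-up version.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: memoized recursion over (mask, last vertex).
struct TopDownCounter {
    static constexpr int MOD = 1'000'000'007;
    int n;
    std::vector<std::vector<int>> pred; // pred[v] = sources u of edges u -> v (repeats kept)
    std::vector<int> memo;              // memo[mask * n + v], -1 means "not computed yet"

    TopDownCounter(int n_, const std::vector<std::pair<int, int>>& edges)
        : n(n_), pred(n_), memo((std::size_t(1) << n_) * n_, -1) {
        for (auto [u, v] : edges) pred[v].push_back(u);
    }

    // Number of paths that start at vertex 0, visit exactly the vertices
    // in `mask`, and end at vertex v (v must be in mask).
    int solve(int mask, int v) {
        if (mask == 1) return v == 0 ? 1 : 0;   // only the start vertex visited
        if (!(mask & 1) || v == 0) return 0;    // must contain 0 and cannot end there
        int& res = memo[static_cast<std::size_t>(mask) * n + v];
        if (res != -1) return res;
        int prev = mask ^ (1 << v);
        long long sum = 0;
        for (int u : pred[v]) {
            if (prev >> u & 1) sum = (sum + solve(prev, u)) % MOD;
        }
        return res = static_cast<int>(sum);
    }
};

// Counts Hamiltonian paths from vertex 0 to vertex n-1 modulo 1e9+7.
int countTopDown(int n, const std::vector<std::pair<int, int>>& edges) {
    TopDownCounter counter(n, edges);
    return counter.solve((1 << n) - 1, n - 1);
}
```

The recursion depth is at most $$$n$$$, so stack usage is not a concern here.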

Polyn0mial

2 years ago

Is it because the bottom-up approach checks too many unimportant states?

Scratch that. Looking at my submission, I think I'm using N^2 * 2^N as well, so the problem is probably that you sort the masks.
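The sort can be dropped entirely: removing a bit from a mask always gives a numerically smaller mask, so plain increasing order already processes every predecessor state first. Below is a rough sketch of that idea, not a verified fix for the TLE. The function name `countHamiltonianPaths`, the flat `dp` layout, and the two mask-skipping checks are my own assumptions; vertices are 0-indexed and parallel edges are counted separately, as in the code in the post.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Counts Hamiltonian paths from vertex 0 to vertex n-1 modulo 1e9+7.
// `edges` holds 0-indexed directed (u, v) pairs.
int countHamiltonianPaths(int n, const std::vector<std::pair<int, int>>& edges) {
    const int MOD = 1'000'000'007;
    // cnt[v][u] = number of edges u -> v
    std::vector<std::vector<int>> cnt(n, std::vector<int>(n, 0));
    for (auto [u, v] : edges) ++cnt[v][u];

    const int full = (1 << n) - 1;
    // Flat layout: dp[mask * n + v] keeps one mask's row contiguous in memory.
    std::vector<int> dp((std::size_t(1) << n) * n, 0);
    dp[1 * n + 0] = 1; // only vertex 0 visited, path ends at 0

    // Increasing numeric order is a valid DP order, so no sorting is needed.
    // Only odd masks contain vertex 0, hence the step of 2.
    for (int mask = 3; mask <= full; mask += 2) {
        // Vertex n-1 must come last, so it may only appear in the full mask.
        if ((mask >> (n - 1) & 1) && mask != full) continue;
        for (int v = 1; v < n; ++v) {
            if (!(mask >> v & 1)) continue;
            int prev = mask ^ (1 << v);
            long long sum = 0;
            for (int u = 0; u < n; ++u) {
                if (cnt[v][u] && (prev >> u & 1)) {
                    sum += 1LL * cnt[v][u] * dp[static_cast<std::size_t>(prev) * n + u];
                }
            }
            dp[static_cast<std::size_t>(mask) * n + v] = static_cast<int>(sum % MOD);
        }
    }
    return dp[static_cast<std::size_t>(full) * n + (n - 1)];
}
```

Compared with the posted code, this does one modulo per state instead of one per edge, and it walks `dp` in memory order rather than in popcount order.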

Also, a quick tip that speeds up your code by about 5x: only do the DP on the middle nodes (2, 3, ..., n-1), because you know for sure when you'll reach 1 and n (first and last).

Sorting is only $$$O(N \cdot 2^N)$$$, which is far smaller than $$$O(2^N \cdot N^2)$$$. Also, do you mean doing the DP only on the middle nodes like this: $$$dp[\text{first_node}][\text{visited_nodes}][\text{ending_node}]$$$? That requires $$$O(2^N \cdot N^2)$$$ memory, which is too much.

No, no. Do the same DP, but only over the middle nodes; don't include 1 and n.

Not sure how. If we only handle the middle nodes, what will the final answer be? Some of the paths counted in $$$\text{full_mask}$$$ could start from a node that isn't directly connected to node $$$1$$$.

The only difference is the starting values: for each single-bit mask, the value is 1 if that node is a direct neighbor of node 1 (and 0 otherwise).

The final answer is then the sum of the full-mask values over all ending nodes that have an edge to n.
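Putting the last few replies together, a middle-nodes-only version could be sketched roughly as follows. The function name `countViaMiddleNodes` and the index shift (middle node $$$j$$$ in the mask is vertex $$$j + 1$$$ in 0-indexed terms) are my own choices. Since the post's code counts parallel edges separately, I use edge counts instead of 1 for the starting values and the final sum; if the graph has no parallel edges the two are identical.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Counts Hamiltonian paths from vertex 0 to vertex n-1 modulo 1e9+7,
// running the bitmask DP only over the middle vertices 1..n-2.
int countViaMiddleNodes(int n, const std::vector<std::pair<int, int>>& edges) {
    const long long MOD = 1'000'000'007;
    if (n == 1) return 1; // the single vertex is both start and end

    // cnt[u][v] = number of edges u -> v
    std::vector<std::vector<long long>> cnt(n, std::vector<long long>(n, 0));
    for (auto [u, v] : edges) ++cnt[u][v];

    const int k = n - 2; // number of middle vertices
    if (k == 0) return static_cast<int>(cnt[0][n - 1] % MOD);

    const int full = (1 << k) - 1;
    // dp[mask * k + j]: paths 0 -> ... -> (j+1) visiting exactly the middle set `mask`
    std::vector<long long> dp((std::size_t(1) << k) * k, 0);

    // Starting values: single-bit masks, reached directly from vertex 0.
    for (int j = 0; j < k; ++j) {
        dp[(std::size_t(1) << j) * k + j] = cnt[0][j + 1] % MOD;
    }

    for (int mask = 1; mask <= full; ++mask) {
        if ((mask & (mask - 1)) == 0) continue; // single-bit masks are base cases
        for (int j = 0; j < k; ++j) {
            if (!(mask >> j & 1)) continue;
            int prev = mask ^ (1 << j);
            long long sum = 0;
            for (int i = 0; i < k; ++i) {
                if (prev >> i & 1) {
                    sum += cnt[i + 1][j + 1] * dp[static_cast<std::size_t>(prev) * k + i];
                }
            }
            dp[static_cast<std::size_t>(mask) * k + j] = sum % MOD;
        }
    }

    // Final answer: full middle mask, then one last edge into vertex n-1.
    long long ans = 0;
    for (int j = 0; j < k; ++j) {
        ans = (ans + cnt[j + 1][n - 1] % MOD * dp[static_cast<std::size_t>(full) * k + j]) % MOD;
    }
    return static_cast<int>(ans);
}
```

With $$$n = 20$$$ this iterates over $$$2^{18}$$$ masks of 18 nodes instead of $$$2^{20}$$$ masks of 20 nodes, which is where the rough 5x figure comes from.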

Polyn0mial's blog