The TeamsCode Summer 2024 Virtual Programming Contest was held on Sunday, July 28. The problems were written and prepared by Yam, hyforces, oursaco, willy108, jay_jayjay, HaccerKat, xug, yash_9a3b, Red0, and superhelen. The problems (with problem credits) can be found in the Novice Gym and Advanced Gym. lunchbox solved all the problems during testing and provided valuable feedback.

# Novice A / Advanced A: P!=NP

**Hint**

Try simplifying the equations by dividing by $$$p$$$, then think about which values of $$$p$$$ would produce a valid value of $$$n$$$.

**Solution**

Dividing both sides of the equation $$$p! = n \cdot p$$$ by $$$p$$$, we get $$$(p-1)! = n$$$. Thus, each value of $$$p$$$ corresponds to exactly one value of $$$n$$$. We can subtract $$$p$$$ from both sides of $$$p \neq n \cdot p$$$ to get $$$0 \neq n \cdot p - p$$$, then factor to get that $$$p \cdot (n-1) \neq 0$$$. This means that the only times there is not a valid value of $$$n$$$ for $$$p$$$ is when $$$n=1$$$ or $$$p=0$$$. $$$n=1$$$ when $$$(p-1)!=0$$$, which happens when $$$p$$$ is $$$1$$$ or $$$2$$$. We can conclude that there exists one valid $$$n$$$ for every $$$p$$$ such that $$$3 \le p \le P$$$, giving our final answer of $$$\max(0, P-2)$$$.

**Code (C++)**

```
#include <bits/stdc++.h>
using namespace std;
int main() {
int p;
cin >> p;
cout << max(0, p-2);
}
```

**Code (Python)**

```
P = int(input())
print(max(0,P-2))
```

# Novice B: Ifrit Tile 2

**Solution**

Since both $$$n$$$ and $$$m$$$ are small, we can use brute force to find out of a tile is an Ifrit Tile. We will first enumerate the direction, then we will count the number of low ground times in that direction. This is $$$4 \cdot 5 = 20$$$ operations for each cell, which easily fits within the time constraints.

The final complexity is $$$O(nm)$$$.

**Code**

```
#include <iostream>
#include <string>
#include <cmath>
#include <utility>
#include <cassert>
#include <algorithm>
#include <vector>
#define sz(x) ((int)(x.size()))
#define all(x) x.begin(), x.end()
#define pb push_back
const int MX = 1000 +10;
using namespace std;
int n, m;
char grid[MX][MX];
bool check(int x, int y, int dx, int dy){
if(grid[x][y] != 'H') return 0;
int ans = 0;
for(int i = 1; i<6; i++){
x += dx;
y += dy;
if(x < 0 || y < 0 || x >= n || y >= m) break ;
ans += (grid[x][y] == 'L');
}
return ans >= 2;
}
int main(){
cin.tie(0) -> sync_with_stdio(0);
cin >> n >> m;
for(int i = 0; i<n; i++){
string s;
cin >> s;
for(int j = 0; j<m; j++){
grid[i][j] = s[j];
}
}
int ans = 0;
for(int i = 0; i<n; i++){
for(int j = 0; j<m; j++){
int work = 0;
for(int dx : {-1, 0, 1}){
for(int dy : {-1, 0, 1}){
if(abs(dx) + abs(dy) != 1) continue;
work |= check(i, j, dx, dy);
}
}
ans += work;
}
}
cout << ans << "\n";
}
```

# Novice C: Phonier

**Hint**

$$$a \oplus a = 0$$$ for all $$$a$$$.

**Solution**

For any query, if $$$i \neq j$$$, then the sum $$$(a_i + a_j)$$$ will be in the final bitwise XOR as $$$(a_i + a_j)$$$ and $$$(a_j + a_i)$$$. These two occurrences will cancel out to $$$0$$$. Therefore, we only need to find $$$\bigoplus\limits_{i=l}^{r} (a_i + a_i)$$$.

This can be done with prefix sums with the bitwise XOR operation. The final complexity is $$$O(n + q)$$$.

**Code (C++)**

```
#include <iostream>
#include <vector>
using namespace std;
int main(){
cin.tie(0) -> sync_with_stdio(0);
int n, q;
cin >> n >> q;
vector<int> arr(n);
vector<int> pre(n+1, 0);
for(int i = 0; i<n; i++){
cin >> arr[i];
pre[i+1] = pre[i] ^ (arr[i] + arr[i]);
}
for(int i = 0; i<q; i++){
int l, r;
cin >> l >> r;
cout << (pre[l-1]^pre[r]) << "\n";
}
}
```

**Code (Python)**

```
n, m = map(int, input().split())
a = [*map(int, input().split())]
bits = [1<<b for b in range(31)]
psums = [[0 for j in range(n+1)] for i in range(30)]
for b in range(30):
for i in range(n):
psums[b][i+1] = psums[b][i] + (a[i]&bits[b] > 0)
res = []
for q in range(m):
l, r = map(int, input().split())
res.append(str(sum(bits[b+1] for b in range(30) if (psums[b][r]-psums[b][l-1]) & 1)))
print('\n'.join(res))
```

# Novice D: Parallel Arrays

**Hint 1**

If $$$a$$$ and $$$b$$$ are increasing, the possible values of $$$a_i$$$ and $$$b_i$$$ we can choose at each index are heavily limited.

**Hint 2**

If we iterate left to right from index $$$1$$$ to $$$n$$$, out of the values that have not been added to $$$a$$$ or $$$b$$$ yet, we must add the minimum value to either $$$a$$$ or $$$b$$$.

**Solution**

Since $$$a$$$ and $$$b$$$ are increasing, for any any index $$$i$$$ ($$$1 \le i \le n$$$), the smaller of $$$a_i$$$ or $$$b_i$$$ must be the minimum element of $$$[1,2,\ldots,2 \cdot n]$$$ that is not in $$$a_1,a_2,\ldots,a_{i-1}$$$ and $$$b_1,b_2,\ldots,b_{i-1}$$$. Notice that this value is equal to $$$1$$$ for $$$i=1$$$ and can be found for $$$2 \le i \le n$$$ by keeping track of what values have been used so far and incrementing the minimum unused value. Since we know $$$\min(a_i,b_i)$$$, we can easily find $$$\max(a_i,b_i)$$$ because we are provided one of $$$a_i+b_i$$$, $$$|a_i-b_i|$$$, and $$$\max(a_i,b_i)$$$. Thus, for each $$$i$$$, we only need to determine if $$$a_i$$$ must be less than $$$b_i$$$, $$$a_i$$$ must be greater than $$$b_i$$$, or both can form increasing $$$a$$$ and $$$b$$$. If it doesn't matter which of $$$a_i$$$ and $$$b_i$$$ is larger, we multiply our answer by $$$2$$$ since there are two different ways to assign values to $$$a_i$$$ and $$$b_i$$$. We can do this by iterating from left to right and keeping track of the latest values in $$$a$$$ and $$$b$$$ for an $$$O(n)$$$ solution.

**Code (C++)**

```
#include <bits/stdc++.h>
using namespace std;
int main() {
int n; cin >> n;
long long ans=1;
int mod=1e9+7;
vector<bool> used(2*n+1, 0);
int maxa=0;
int maxb=0;
int min_unused=1;
for (int i=0; i<n; i++) {
if (min_unused>maxa&&min_unused>maxb) {
ans*=2;
ans%=mod;
}
int t, v; cin >> t >> v;
if (t==1) v-=min_unused;
else if (t==2) v+=min_unused;
maxa=min_unused; // store the smaller in maxa
maxb=v; // store the larger in maxb
used[min_unused]=true;
used[v]=true;
while (used[min_unused]) min_unused++;
}
cout << ans;
}
```

**Code (Python)**

```
MOD = int(1e9+7)
n = int(input())
vis = [0 for i in range(3*n)]
l = 1
prs = []
for q in range(n):
t, v = map(int, input().split())
r = v-l if t==1 else l+v if t==2 else v
prs.append((l, r))
vis[l] = vis[r] = 1
while vis[l]: l += 1
res = 2
for p1, p2 in zip(prs, prs[1:]):
if min(p2) > max(p1): res = (res * 2) % MOD
print(res)
```

# Novice E: Minimize Sum

**Hint 1**

If the $$$k$$$-th bit of $$$a_i$$$ and the $$$k$$$-th bit of $$$a_j$$$ are both $$$0$$$, adding them and subtracting their bitwise XOR yields $$$0+0-0 \oplus 0=0$$$. If one of the two bits are $$$1$$$ and the other is $$$0$$$, we still get $$$1+0-1 \oplus 0=1-1=0$$$. What happens if the bits are both $$$1$$$?

**Hint 2**

If the $$$k$$$-th bit of $$$a_i$$$ and the $$$k$$$-th bit of $$$a_j$$$ are both $$$1$$$, adding them and subtracting their bitwise XOR yields $$$1+1-1 \oplus 1=2-0=2$$$. This tells us that $$$a_i + a_j - a_i \oplus a_j = 2 \cdot (a_i \text{&} a_j)$$$, where $$$\text{&}$$$ denotes the bitwise AND operation.

**Solution**

If we choose two elements $$$a_i$$$ and $$$a_j$$$, we are subtracting $$$a_i + a_j - a_i \oplus a_j$$$ from the original sum of $$$a$$$. We want to maximize $$$a_i + a_j - a_i \oplus a_j$$$. We can observe that $$$a_i + a_j - a_i \oplus a_j$$$ yields $$$2$$$ at a certain bit if the corresponding bits of $$$a_i$$$ and $$$a_j$$$ are $$$1$$$, otherwise it's $$$0$$$. This means $$$a_i + a_j - a_i \oplus a_j = 2 \cdot (a_i \text{&} a_j)$$$, where $$$\text{&}$$$ denotes the bitwise AND operation. To maximize $$$2 \cdot (a_i \text{&} a_j)$$$, we iterate through each bit, starting from higher powers of $$$2$$$, keeping track of if there are at least $$$2$$$ elements of $$$a$$$ that have a $$$1$$$ at the current bit that are also optimal for the previous bits. We subtract this maximum value from our initial sum to get our final answer. This solution requires iterating through at most $$$n$$$ elements for each of the $$$\lceil \log_2(10^9) \rceil=30$$$ bits, giving a time complexity of $$$O(n)$$$.

**Code (C++)**

```
#include <bits/stdc++.h>
using namespace std;
int main() {
int n; cin >> n;
vector<int> a(n);
vector<bool> good(n);
long long sum=0;
for (int i=0; i<n; i++) {
cin >> a[i];
good[i]=true;
sum+=a[i];
}
for (int shift=29; shift>=0; shift--) {
int count=0;
for (int i=0; i<n; i++) {
if (good[i] && (a[i]>>shift)%2) {
count++;
}
}
if (count>=2) {
for (int i=0; i<n; i++) {
if (good[i] && (a[i]>>shift)%2==0) {
good[i]=false;
}
}
}
}
long long ans=(2>>30)-1;
int count=0;
for (int i=0; i<n; i++) {
if (good[i]) {
ans&=a[i];
count++;
}
if (count==2) {
break;
}
}
cout << (sum-ans*2);
}
```

**Code (Python)**

```
t = int(input())
while t:
n = int(input())
a = [*map(int, input().split())]
bset = [{x for x in a if x&(1<<b)} for b in range(32)]
res = {*a}
for st in bset[::-1]:
if len(res&st) > 1:
res &= st
x, y = [*res][:2]
res = sum(a) - x - y + (x^y)
vcnt = {x:0 for x in a}
for x in a:
if vcnt[x] == 1: res = min(res, sum(a)-x-x+(x^x))
vcnt[x] += 1
print(res)
t -= 1
```

# Novice F: XOR Game

**Hint 1**

Recursion is too slow... What else can we do?

**Hint 2**

Maybe Grid DP?

**Hint 3**

What do you need in order to reach $$$(i, j)$$$?

**Solution**

Let's consider a dynamic programming (DP) solution. Have $$$ \text{dp}[i][j]$$$ represent the minimum number of flips to reach row $$$i$$$ and column $$$j$$$.

For the transitions, we have two cases (please note that this will depend based on your implementation. I find that implementing this way is the neatest):

- Transition from $$$ \text{dp}[i][j-1]$$$ to $$$ \text{dp}[i][j]$$$ (Go right)
- Transition from $$$ \text{dp}[i][j]$$$ to $$$ \text{dp}[i+1][j]$$$ (Step down)

First for case 1, we can notice that if we are in a row $$$i$$$ and column $$$j$$$, we can only go to column $$$k$$$ where all values between columns $$$j$$$ and $$$k$$$ are $$$1$$$s, so we can simply iterate from $$$j = 2 \ldots m$$$ while checking for whether $$$ \text{dp}[i][j]$$$ is reachable from $$$ \text{dp}[i][j-1]$$$ or not. If yes, set $$$ \text{dp}[i][j] = min( \text{dp}[i][j], \text{dp}[i][j-1])$$$.

For case 2, we use a similar idea to case 1, except now, we see whether or not we need to flip the row below. Therefore the respective transitions are $$$ \text{dp}[i+1][j] = min( \text{dp}[i+1][j], \text{dp}[i][j]+1)$$$ and $$$ \text{dp}[i+1][j] = min( \text{dp}[i+1][j], \text{dp}[i][j])$$$.

In the end, our code runs in $$$O(n \cdot m)$$$ time, which in the given constraints is good enough to pass.

**Code**

```
#include <bits/stdc++.h>
using namespace std;
int main() {
int t;
cin >> t;
while(t--) {
int n, m;
cin >> n >> m;
vector<vector<int>> a(n, vector<int>(m));
for(int i = 0; i < n; ++i) {
string s;
cin >> s;
for(int j = 0; j < m; ++j) {
a[i][j] = s[j]-'0';
}
}
vector<vector<int>> dp(n, vector<int>(m, -1));
dp[0][0] = 0;
for(int i = 0; i < n; ++i) {
for(int j = 1; j < m; ++j) {
int cur = dp[i][j], prev = dp[i][j-1];
if ((prev != -1) && (a[i][j] == a[i][j-1]) && ((cur == -1) || (prev < cur))) {
dp[i][j]=prev;
}
}
if(i < n-1) {
for(int j = 0; j < m; ++j) {
int cur = dp[i][j], next = dp[i+1][j];
bool need_flip = (a[i+1][j] == 0);
if ((cur != -1) && ((next == -1) || ((cur+need_flip) < next))) {
dp[i+1][j] = cur+need_flip;
}
}
}
}
cout << dp[n-1][m-1] << "\n";
}
return 0;
}
```

# Novice G / Advanced B: Monkey Arrays

**Hint**

For each possible starting index of a subarray, which ending indices will produce a *good* subarray?

**Solution**

If we fix the starting index of a subarray as $$$l$$$, for the subarray $$$a[l,l+1,\ldots,r-1,r]$$$ to be *good*, the ending index $$$r$$$ must be at or after the first index $$$i \ge l$$$ such that $$$a[i]=x$$$. $$$r$$$ must also be at or after the first index $$$i \ge l$$$ such that $$$a[i]=y$$$. Finally, $$$r$$$ must also be before the first index $$$i \ge l$$$ such that $$$a[i]>x$$$, $$$a[i]<y$$$, or $$$a[i]=k$$$.

This means for each value of $$$l$$$, we can find the smallest and largest possible value for $$$r$$$ that produces a valid subarray. We then simply add the range of valid values for $$$r$$$ that this covers to our answer. We can do this efficiently for each $$$l$$$ by precomputing the first index that has value $$$x$$$, the first index that has value $$$y$$$, and the first index that has value $$$>x$$$ or $$$<y$$$ or $$$=k$$$ after each index by looping over the array backwards, giving us an $$$O(n)$$$ solution.

**Code**

```
#include <bits/stdc++.h>
using namespace std;
int main() {
int t; cin >> t;
while (t--) {
int n, x, y, k; cin >> n >> x >> y >> k;
int arr[n];
for (int i = 0; i < n; i++) cin >> arr[i];
int nextbad[n], nextx[n], nexty[n];
int nbad = n, nx = n, ny = n;
for (int i = n-1; i >= 0; i--) {
if (arr[i] < y || arr[i] > x || arr[i] == k) nbad = i;
else if (arr[i] == y) ny = i;
else if (arr[i] == x) nx = i;
nextbad[i] = nbad;
nextx[i] = nx;
nexty[i] = ny;
}
long long ans = 0;
for (int i = 0; i < n; i++) {
int l = max(nextx[i], nexty[i]);
int r = nextbad[i];
ans += (long long)max(r-l, 0);
}
cout << ans << endl;
}
}
```

# Novice H: Digit Removal

**Hint 1**

Consider a dynamic programming approach where we have states corresponding to the number of digits deleted so far and how far we have gotten in $$$a$$$.

**Hint 2**

Consider an approach where we store the number of ways to have deleted exactly $$$i$$$ digits from the number consisting of the first $$$j$$$ digits of $$$a$$$, such that the resulting number is equal to the number consisting of the first $$$j-i$$$ digits of $$$b$$$.

**Solution**

Let us initialize an array $$$\text{ways}$$$ where $$$\text{ways}[i]$$$ stores the number of ways to have deleted $$$i$$$ digits from the current prefix of $$$a$$$ such that the resulting number is equal to the corresponding prefix of $$$b$$$. We can maintain this as we iterate over the digits of $$$a$$$ from left to right. As we iterate through the digits, for each possible amount of already deleted digits, if $$$a$$$ can be made less than $$$b$$$, we add the number of ways to have gotten to this point multiplied by the number of ways to choose the remaining digits to be deleted. We also update $$$\text{ways}$$$ by adding possibilities for what happens if we can keep the prefixes of $$$a$$$ and $$$b$$$ equal by either deleting the current digit or not. Iterating over the digits gives us an $$$O(n)$$$ time complexity.

**Code**

```
#include <bits/stdc++.h>
using namespace std;
int main() {
int tc; cin >> tc;
while (tc--) {
string a, b; cin >> a >> b;
int n=a.size();
long long ans=0;
vector<long long> ways(5, 0);
ways[0]=1;
for (int i=0; i<n; i++) {
for (int j=4; j>=0; j--) {
if (ways[j]>0) {
if (b[i-j]>a[i]) {
if (4-j<=n-i-1) {
long long count=ways[j];
for (int x=1; x<=4-j; x++) {
count*=n-i-x;
}
for (int x=2; x<=4-j; x++) {
count/=x;
}
ans+=count;
}
}
if (j!=4) {
ways[j+1]+=ways[j];
}
if (b[i-j]!=a[i]) {
ways[j]=0;
}
}
}
}
cout << ans << "\n";
}
}
```

# Novice I / Advanced C: Monkey Math Tree

**Hint 1**

A tree (or forest) with $$$n$$$ nodes and $$$m$$$ edges has $$$n - m$$$ connected components.

**Hint 2**

Linearity of expectation gives that $$$\text{EV}[cc] = \text{EV}[n - m] = \text{EV}[n] - \text{EV}[m]$$$ where $$$\text{EV}[x]$$$ denotes the expected value of $$$x$$$.

**Solution**

Read the two hints.

$$$\text{EV}[cc] = \text{EV}[n - m] = \text{EV}[n] - \text{EV}[m]$$$

By linearity of expectation again, we can consider the expected number of nodes that remain as the sum of the $$$\text{EV}$$$ for each individual node. For a given node $$$i$$$, it's $$$\text{EV}$$$ is $$$1 \cdot \frac{1}{i}$$$. Therefore, $$$\text{EV}[n] = \sum_{i = 1}^{n} \frac{1}{i}$$$.

Similarly, we can also consider $$$\text{EV}[m]$$$ as the $$$\text{EV}$$$ of each edge summed up. For the edge that connects nodes $$$i$$$ and $$$i+1$$$, the probability that it stays after all the removals is $$$\frac{1}{i} \cdot \frac{1}{i+1}$$$ as both of its endpoints must stay. This means the $$$\text{EV}$$$ for that edge is $$$1 \cdot \frac{1}{i} \cdot \frac{1}{i+1}$$$. Therefore, $$$\text{EV}[m] = \sum_{i = 1}^{n-1} \frac{1}{i(i+1)}$$$.

Subtracting $$$\text{EV}[m]$$$ from $$$\text{EV}[n]$$$ will give us our final answer. It is too slow to compute $$$\text{EV}[n]$$$ and $$$\text{EV}[m]$$$ per test case, so precomputation and prefix sums is recommended. The final complexity is $$$O(n \log \text{MOD})$$$ or $$$O(n)$$$ as either binary exponentiation or modular inverse precomputation is needed to find $$$\frac{1}{i}$$$ for all $$$i \leq n$$$.

**Code**

```
#include <bits/stdc++.h>
using namespace std;
const int MAXN = int(1e6) + 1, MOD = 1e9 + 7;
int vv[MAXN], f[MAXN];
void init() {
vv[1] = 1;
for (int i = 2; i < MAXN; i++) {
//vv[i] stores the inverse of i mod MOD
vv[i] = (long long) vv[MOD % i] * (MOD - MOD / i) % MOD;
}
for (int i = 1; i < MAXN; i++) {
//f[i] stores the answer for i
f[i] = (f[i - 1] + vv[i] - (long long) vv[i] * vv[i - 1] % MOD) % MOD;
if (f[i] < 0) {
f[i] += MOD;
}
}
}
int main() {
ios::sync_with_stdio(0);
cin.tie(0);
init();
int T;
cin >> T;
while (T--) {
int N;
cin >> N;
cout << f[N] << '\n';
}
}
```

# Novice J / Advanced D: Kawaii the Rinbot

**Hint 1**

The text file has $$$10600$$$ lines, and the submission limit is $$$50000$$$, giving you around $$$4 - 5$$$ characters per title.

**Hint 2**

How many unique values can you represent with a string of length $$$4$$$?

**Hint 3**

Is there an easy $$$1$$$ to $$$1$$$ matching of titles to strings of length $$$4$$$ that can be decoded easily as well?

**Solution**

Polynomial hash each string modulo $$$95^4$$$, this gives an approximately random matching of titles to strings. $$$95^4 \approx 10^8 \approx 10600^2$$$. Even with Birthday Paradox, the expected number of collisions is very small. You can add an extra if statement in the case of a collision (or try to find another base).

This gives an approximately $$$1$$$ to $$$1$$$ matching of titles to strings of length $$$4$$$, but how do we decode them? Concatenate all the strings in order of title by the text file. Then whenever there is a query, we can determine at which index in the big string our input string's hash occurs at. To account for the test cases, we can use a map to store the mapping of strings of length $$$4$$$ to index.

This gives a solution that uses $$$ < 50000$$$ characters (the model is $$$47000$$$ characters) and has an $$$O(\log{\text{# of title}})$$$ time per test case.

# Novice K / Advanced E: Waymo orzorzorz

**Hint 1**

You should always copy first, then paste.

**Hint 2**

Consider fixing the number of copies and pastes.

**Hint 3**

The number of combinations of copies and pastes is small.

**Solution**

First observe that you always type $$$x$$$ single orz's, and then copy-paste to multiply $$$x$$$ by $$$y$$$ such that $$$x \cdot y \ge N$$$. Consider fixing the number of copy and paste events: clearly, a copy should be accompanied by a paste, so there should be at most $$$\log_2N$$$ copies. Then, given that you have $$$c$$$ copies, you have at most $$$c \cdot N^{1/c}$$$ pastes (you do a series of a copy, then $$$N^{1/c}$$$ pastes, then repeat).

Then, the solution is as follows: Fix $$$c$$$. If $$$c\ge 2$$$, you simply run iterate over the number of pastes. If $$$c=1$$$, then $$$x=\left\lceil{\frac{N}{y}}\right\rceil$$$, and it is well known that there are $$$\mathcal{O}(\sqrt N)$$$ distinct $$$x$$$'s, so you can brute force. And the $$$c=0$$$ case is trivial.

The complexity is dominated by the small $$$c$$$ terms, and the larger $$$c$$$ terms run in $$$\mathcal{O}(\log N)$$$, so the solution runs in $$$\mathcal{O}(\sqrt N + \log^2 N)$$$.

**Code**

```
// Yam's code
#include<bits/stdc++.h>
using namespace std;
using ll = long long;
#define all(x) (x).begin(), (x).end()
const int B = 3, L = 41;
void solve() {
ll a, b, c, n;
cin >> a >> b >> c >> n;
vector<ll> v;
ll x = 1;
while (x < n) {
v.push_back(x);
ll c = (n + x - 1) / x;
x = (n - 1) / (c - 1) + 1;
}
ll ans = n * a;
for (ll o : v) {
ll need = (n + o - 1) / o;
for (int p = 1; p < B; p++) {
ll lo = p, hi = 1e9;
while (lo < hi) {
ll mid = (lo + hi) / 2;
ll mn = mid / p + 1, rem = mid % p;
ll prod = 1;
for (int i = 0; i < p; i++) {
prod *= mn + (i < rem);
if (prod >= need)
break;
}
if (prod >= need)
hi = mid;
else
lo = mid + 1;
}
ans = min(ans, a * o + b * p + c * lo);
}
}
for (int p = B; p < L; p++) {
ll prod = 1, cnt = 0;
ll cur = 1, rem = p;
while (prod <= n) {
ll o = (n + prod - 1) / prod;
ans = min(ans, a * o + b * p + c * cnt);
prod += prod / cur;
cnt++;
rem--;
if (rem == 0)
rem = p, cur++;
}
ans = min(ans, a + b * p + c * cnt);
}
cout << ans << '\n';
}
int main() {
cin.tie(0)->sync_with_stdio(false);
int t;
cin >> t;
while (t--)
solve();
return 0;
}
```

# Advanced F: Stage 4

**Hint 1**

We will have to maintain a set of possible values $$$x$$$ at each step $$$i$$$.

**Hint 2**

What is the upper bound on the biggest element we have to store in the set?

**Solution**

At each index $$$i$$$, let the set $$$S_i$$$ contain all possible reachable values $$$x$$$ after processing the prefix $$$p_1, p_2, \ldots, p_i$$$. Initially, $$$S_0$$$ only contains $$$x_0$$$, the initial given value of $$$x$$$.

The transition to get $$$S_i$$$ is

Then the answer for the query $$$i$$$ is simply the number of elements $$$x \mid S_i$$$ that satisfy $$$x \leq q_i$$$.

Note that even though the queries $$$q_i$$$ only request numbers up to $$$5 \cdot 10^6$$$, we may have to store even larger numbers due to the xor operation being able to "unset" large bits in $$$x$$$, bringing it back to less than $$$q_i$$$. The biggest possible bit we will be able to "unset" is $$$2^{d(499)} = 2^{22}$$$. This means that any value $$$x \geq 2^{23}$$$ is irrelevant because it will be impossible to "unset" the biggest bit of $$$x$$$, so these numbers will always remain above the queries of $$$5 \cdot 10^6$$$. Therefore we can set the upper bound on the values of $$$S_i$$$ at $$$MX = 2^{23} - 1$$$.

With this, we can transition to each set $$$S_i$$$ and answer queries $$$q_i$$$ in $$$O(MX)$$$, giving us a time complexity of $$$O(n \cdot MX)$$$, which is enough to pass the first two subtasks.

To optimize this, we can notice that the xor operation will only toggle one bit in a number: turning this bit on if it is currently off, and turning it off if it is currently on. This allows us to rewrite the transitions as

Now that we reduced the operations to addition and subtraction to numbers on a set, does this look familiar? Bitset in author's solution!!

Adding and subtracting to numbers is equivalent to shifting a bitset, and to only add/subtract to numbers with/without the bit $$$2^{d(p_i)}$$$, we can precompute two bitsets, one for the numbers that this bit on, and one for numbers with this bit off. Using the bitwise AND operation to keep only the numbers with/without the bit $$$2^{d(p_i)}$$$ on in the current bitset, we can then shift these extracted bitsets independently to do perform the "xor" operation. Finally, to count the number of elements in the set $$$x \leq q_i$$$, we can perform another bitwise AND operation with the first $$$q_i + 1$$$ bits on (include $$$0$$$), which can be obtained by shifting a fully set bitset right an appropriate amount of times.

This optimizes the transitions and queries to $$$O(\frac{n \cdot MX}{64})$$$ Precomputation of bitsets can be optimized, but a straightfoward implementation gives a time complexity of $$$O(MX \cdot log_2 MX)$$$, which is fast enough, giving a total time complexity of $$$O(\frac{n \cdot MX}{64} + MX \cdot log_2 MX)$$$. Alternatively, precomputation can be avoided altogether with manual bitsets, running in $$$O(\frac{n \cdot MX}{64})$$$.

**Code**

```
#include<bits/stdc++.h>
using namespace std;
using ll = long long;
#define all(x) (x).begin(), (x).end()
const int N = 1 << 23, L = 23;
bitset<N> on[L], off[L], all, b;
int main() {
cin.tie(0)->sync_with_stdio(false);
all.set();
for (int i = 0; i < L; i++) {
for (int j = 0; j < N; j++) {
if (j >> i & 1)
on[i].set(j);
else
off[i].set(j);
}
}
int n, x;
cin >> n >> x;
vector<int> p(n), q(n);
for (int &i : p)
cin >> i;
for (int &i : q)
cin >> i;
b[x] = 1;
for (int i = 0; i < n; i++) {
int d = 0;
for (int j = p[i]; j > 0; j /= 10)
d += j % 10;
b = (b << p[i]) | ((b & off[d]) << (1 << d)) | ((b & on[d]) >> (1 << d));
cout << (b & (all >> (N - q[i] - 1))).count() << '\n';
}
return 0;
}
```

# Advanced G: Ifrit Tile

**Solution**

Notice that we essentially need to check whether the path given in the query overlaps with any of the paths of slugs. If they do overlap, our query increases by the value of the slug path.

An easier way to represent the overlapping of $$$2$$$ paths is to consider the lowest common ancestor (LCA) of each path $$$l_1$$$ and $$$l_2$$$. If the paths overlap either $$$l_1$$$ will be on path $$$2$$$ or $$$l_2$$$ will be on path $$$1$$$.

By checking this for each path we can solve the problem in $$$O(n \log n + qm \log n)$$$, which is enough to get up to the second subtask.

We can find the Euler tour of the tree. Let $$$st[u]$$$ and $$$en[u]$$$ denote the starting and ending indexes of node $$$u$$$ in the Euler tour.

Let's consider both cases separately.

**Case 1: LCA of the query is on the slug path**

Let $$$l$$$ be the LCA of a slug path with endpoints at $$$s$$$ and $$$t$$$.

We can create a Fenwick tree that stores $$$-2 \cdot \text{val}$$$ in $$$st[l]$$$ for each slug path and $$$\text{val}$$$ in both $$$st[s]$$$ and $$$st[t]$$$.

Let $$$c$$$ be the LCA of the query with endpoints at $$$u$$$ and $$$v$$$.

When we want to find the sum of values for slug paths where the LCA is on the slug path, we can query the sum of the subtree of the LCA of the query denoted by $$$[st[c], en[c]]$$$.

If neither $$$c$$$ nor $$$l$$$ are ancestors of the other then slug contributions will be irrelevant.

If $$$c$$$ is above or equal to $$$l$$$ then the values in the LCA and path endpoints will cancel out and provide no contribution (the equal case is fine because we will count this in case 2).

If $$$c$$$ is below $$$l$$$ but does not overlap with the path then the slug contributions will be irrelevant.

If $$$c$$$ is below $$$l$$$ and does overlap with a slug path, the slug path will contribute value from either the value located in $$$st[s]$$$ or $$$st[t]$$$.

**Case 2: LCA of the slug path is on the query path**

Like before, let $$$l$$$ be the LCA of a slug path with endpoints at $$$s$$$ and $$$t$$$ and let $$$c$$$ be the LCA of the query with endpoints at $$$u$$$ and $$$v$$$.

We can create another Fenwick tree that stores $$$\text{val}$$$ in $$$st[l]$$$ and $$$-\text{val}$$$ in $$$en[l]$$$.

Then when making a query we can add the results from the three queries, $$$[st[c] + 1, st[v]]$$$, $$$[st[c] + 1, st[u]]$$$, and $$$[st[c], st[c]]$$$ to get the sum along the path.

Adding the values from both cases gives the answer for a query.

The time complexity now is $$$O(n \log n + m \log n + q \log n)$$$.

**Code**

```
//maomao and hanekawa will carry me to red
#pragma GCC optimize("O3,unroll-loops")
//#pragma GCC target("avx2,bmi,bmi2,lzcnt,popcnt")
#include <iostream>
#include <iomanip>
#include <cmath>
#include <utility>
#include <cassert>
#include <algorithm>
#include <vector>
#include <array>
#include <cstring>
#include <functional>
#include <numeric>
#include <set>
#include <queue>
#include <map>
#include <chrono>
#include <random>
#define sz(x) ((int)(x.size()))
#define all(x) x.begin(), x.end()
#define pb push_back
#define eb emplace_back
#define kill(x, s) {if(x){ cout << s << "\n"; return ; }}
#ifndef LOCAL
#define cerr while(0) cerr
#endif
using ll = long long;
using lb = long double;
const lb eps = 1e-9;
const ll mod = 1e9 + 7, ll_max = 1e18;
//const ll mod = (1 << (23)) * 119 +1, ll_max = 1e18;
const int MX = 5e5 +10, int_max = 0x3f3f3f3f;
struct {
template<class T>
operator T() {
T x; std::cin >> x; return x;
}
} in;
using namespace std;
struct hld{
vector<int> dep, sz, par, head, tin, out, tour;
vector<vector<int>> adj;
int n, ind;
hld(){}
hld(vector<pair<int, int>> edges, int rt = 1){
n = sz(edges) + 1;
ind = 0;
dep = sz = par = head = tin = out = tour = vector<int>(n+1, 0);
adj = vector<vector<int>>(n+1);
for(auto [a, b] : edges){
adj[a].push_back(b);
adj[b].push_back(a);
}
ind = 1;
dfs(rt, 0);
head[rt] = rt;
dfs2(rt, 0);
}
void dfs(int x, int p){
sz[x] = 1;
dep[x] = dep[p] + 1;
par[x] = p;
for(auto &i : adj[x]){
if(i == p) continue;
dfs(i, x);
sz[x] += sz[i];
if(adj[x][0] == p || sz[i] > sz[adj[x][0]]) swap(adj[x][0], i);
}
}
void dfs2(int x, int p){
tour[ind] = x;
tin[x] = ind++;
for(auto &i : adj[x]){
if(i == p) continue;
head[i] = (i == adj[x][0] ? head[x] : i);
dfs2(i, x);
}
out[x] = ind;
}
int k_up(int u, int k){
if(dep[u] <= k) return -1;
while(k > dep[u] - dep[head[u]]){
k -= dep[u] - dep[head[u]] + 1;
u = par[head[u]];
}
return tour[tin[u] - k];
}
int lca(int a, int b){
while(head[a] != head[b]){
if(dep[head[a]] > dep[head[b]]) swap(a, b);
b = par[head[b]];
}
if(dep[a] > dep[b]) swap(a, b);
return a;
}
} tr;
ll BIT[2][MX];
int n, m, q;
array<int, 3> paths[MX];
int occ[MX];
void update(int op, int k, ll v){
for(; k<=n; k+=k&-k) BIT[op][k] += v;
}
ll sum(int op, int k){
return (k == 0) ? (0) : (BIT[op][k] + sum(op, k-(k&-k)));
}
//to consider if slug (a, b) affects query (u, v)
//let l be the of (a, b) and p be the lca of (u, v)
//if dep[l] < dep[p] then p is in the path from a to b iff they intersect
//if dep[p] < dep[l] then l is in the path from u to v iff they intersect
//so when adding a new path (or removing it) we will
//add 1 to all the nodes from a to b on ds1
//add 1 lca(a, b) on ds2
//whenever querying we will
//query the value of the node p on ds1
//sum u to v on ds2
//point add range sum
//range add point sum
//no multitest since multitest is cringe
void add(int j){
occ[j] *= -1;
auto [a, b, c] = paths[j];
c *= occ[j];
int l = tr.lca(a, b);
update(0, tr.tin[a], c);
update(0, tr.tin[b], c);
update(0, tr.tin[l], -2ll*c);
update(1, tr.tin[l], c);
update(1, tr.out[l], -c);
}
void solve(){
n = in, m = in, q = in;
vector<pair<int, int>> edges;
for(int i = 1; i<n; i++){
int a = in, b = in;
edges.pb({a, b});
}
tr = hld(edges);
for(int i = 1; i<=m; i++){
int a = in, b = in, c = in;
int d = in;
paths[i] = {a, b, c};
occ[i] = -1;
if(d) add(i);
}
for(int i = 1; i<=q; i++){
int op = in;
cerr << "query " << i << "\n";
if(op == 1 || op == 2){
int j = in;
add(j);
}else{
int u = in, v = in;
int p = tr.lca(u, v);
ll ans1 = sum(0, tr.out[p]-1) — sum(0, tr.tin[p]-1);
ll ans2 = sum(1, tr.tin[u]) + sum(1, tr.tin[v]) — sum(1, tr.tin[p]) — sum(1, tr.tin[tr.par[p]]);
ll ans = ans1 + ans2;
cerr << ans1 << " " << ans2 << "\n";
cout << ans << "\n";
}
}
}
signed main(){
cin.tie(0) -> sync_with_stdio(0);
int T = 1;
//cin >> T;
for(int i = 1; i<=T; i++){
//cout << "Case #" << i << ": ";
solve();
}
return 0;
}
```

# Advanced H: Thomas Sometimes Hides His Feelings in C++

**Hint 1**

Renai circulation.

**Hint 2**

Taken as a whole, $$$p_i$$$ (as defined in the statement) is always a permutation. Any permutation can be decomposed into cycles. In our case, the cycle will cycle between male and female tropes.

**Hint 3**

We can maintain which tropes we have taken as a bitmask for a given cycle.

**Hint 4**

We can use the lowest bit in the bitmask to maintain the first node in the cycle, decreasing the number of states.

**Hint 5**

How do we merge cycles fast?

**Solution**

For simplicity, let's consider the input as a graph of $$$n$$$ nodes and $$$a_{ij}$$$ denotes the edge from node $$$i$$$ to node $$$j$$$.

For a singular cycle, we can run a bitmask dp where our state is $$$\text{dp}[\text{mask}][u][0/1]$$$ where $$$\text{mask}$$$ stores a bitmask of the nodes our cycle has visited, $$$u$$$ is the current node in our cycle (and the first node in the cycle is the lowest bit in $$$\text{mask}$$$, and $$$0/1$$$ stores whether or not we started our cycle with a male or a female. Our transitions are to either extend our cycle from node $$$u$$$ to any node $$$v$$$ with index $$$ > \text{lowbit}(\text{mask})$$$ or to close the cycle by transitioning back to node $$$\text{lowbit}(\text{mask})$$$.

We can naively merge cycles in $$$3^n$$$, which should pass for $$$n \leq 16$$$.

For the full solution, we can change our DP state to $$$\text{dp}[\text{mask}][u][0/1]$$$ where $$$\text{mask}$$$ stores a bitmask of the nodes every cycle so far has visited, $$$u$$$ is the current node in our current cycle (and the first node in the **current** cycle is the lowest bit in $$$\text{mask}$$$, and $$$0/1$$$ stores whether or not we started our cycle with a male or a female.

Our transitions are now to extend our current cycle to some node with index $$$ > \text{lowbit}(\text{mask})$$$, close our cycle, or in the case when the cycle is closed, start a new cycle at some node with index $$$ < \text{lowbit}(\text{mask})$$$.

This is $$$O(n \cdot 2^n)$$$ states and $$$O(n^2 2^n)$$$ time complexity. It is also possible to achieve this complexity using subset convolution, but the high constant may be difficult to pass. Precomputing the inverses for $$$a_{ij}$$$ is required to make all transitions $$$O(1)$$$.

**Code**

```
#include <bits/stdc++.h>
using namespace std;
using i64 = long long;
using ar2 = array<int, 2>;
const int MOD = 998244353;
int mul(int a, int b) {
return (i64) a * b % MOD;
}
int inv(int a) {
return a == 1 ? 1 : mul(inv(MOD % a), MOD - MOD / a);
}
int addself(int &a, int b) {
if ((a += b) >= MOD) {
a -= MOD;
}
return a;
}
int solve(int n, vector<vector<int>> a) {
vector v(n, vector<ar2>(n));
for (int i = 0; i < n; i++) {
for (int j = 0; j < n; j++) {
if (a[i][j] == -1) {
v[i][j] = {0, 0};
} else {
v[i][j] = {a[i][j], inv(a[i][j])};
}
}
}
vector dp(1 << n, vector<ar2>(n));
vector<int> f(1 << n);
f[0] = 1;
for (int b = 0; b < 1 << n; b++) {
int p = 0, c = __builtin_popcount(b) & 1;
while (p < n && !(b >> p & 1)) {
p++;
}
for (int i = 0; i < n; i++) {
if ((b >> i & 1) == 0) {
continue;
}
for (int t = 0; t < 2; t++) {
const int &z = dp[b][i][t];
if (!z) {
continue;
}
if (!c) {
addself(f[b], mul(z, v[i][p][t]));
}
for (int j = p + 1; j < n; j++) {
if (b >> j & 1) {
continue;
}
addself(dp[b | (1 << j)][j][t ^ 1], mul(z, v[i][j][t]));
}
}
}
if (f[b]) {
for (int i = 0; i < p; i++) {
addself(dp[b | (1 << i)][i][0], f[b]);
addself(dp[b | (1 << i)][i][1], f[b]);
}
}
}
return f[(1 << n) - 1];
}
int main() {
ios::sync_with_stdio(0);
cin.tie(0);
int t;
cin >> t;
while (t--) {
int n;
cin >> n;
vector a(n, vector<int>(n));
for (auto &v : a) {
for (auto &x : v) {
cin >> x;
}
}
cout << solve(n, a) << '\n';
}
}
```

# Advanced I: Flappy Deer

**Hint 1**

The problem reduces to simulating $$$\text{dp}[i][j] = \min(\text{dp}[i + 1][j - 1], \text{dp}[i + 1][j], \text{dp}[i + 1][j + 1])$$$

**Hint 2**

Consider segments of $$$\text{dp}[i][j]$$$ has the same value and same $$$i$$$. What happens after one $$$\text{dp}[i][j] = \min(\text{dp}[i + 1][j - 1], \text{dp}[i + 1][j], \text{dp}[i + 1][j + 1])$$$ transition?

**Hint 3**

If a segment is greater than an adjacent segment, it will extend into it.

**Hint 4**

Whats the best way to store segments?

**Solution**

Read the hints. We run the problem backwards, with $$$\text{dp}[i][j]$$$ being the maximum number crackers reachable by starting at $$$(i, j)$$$. Then, we need to support $$$\text{dp}[i][j] = \min(\text{dp}[i + 1][j - 1], \text{dp}[i + 1][j], \text{dp}[i + 1][j + 1])$$$ transitions, setting a range of $$$\text{dp}[i][j]$$$ to $$$0$$$ (encountering a water bottle), and increasing $$$\text{dp}[i][j]$$$ (encountering a cracker). Lets consider the slopes of the dp, ie. $$$\text{dp}[i][j] - \text{dp}[i][j - 1]$$$. We can see that after each transition, any positive slope at $$$j$$$ moves to $$$j - 1$$$. Any negative slope at $$$j$$$ moves to $$$j + 1$$$. This is because larger segments "eat" into smaller segments. Then, the problem just becomes maintaining these slopes. This can be done using treaps, which each slope's $$$j$$$ value as the key. To simulate more than one transition at once, we can use a lazy tag and check for any positive/negative slopes that overlap and deal with it.

**Code**

```
#include <bits/stdc++.h>
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tree_policy.hpp>
using namespace __gnu_pbds;
using namespace std;
#define pb push_back
#define ff first
#define ss second
typedef long long ll;
typedef long double ld;
typedef pair<int, int> pii;
typedef pair<ll, ll> pll;
typedef pair<ld, ld> pld;
const int INF = 1e9;
const ll LLINF = 1e18;
const int MOD = 1e9 + 7;
inline ll ceil0(ll a, ll b) {
return a / b + ((a ^ b) > 0 && a % b);
}
void setIO() {
ios_base::sync_with_stdio(0); cin.tie(0);
}
struct node {
int pos, tag, mnpos, mxpos;
pair<int, pii> mndif;
int sum, val;
int weight;
node(){}
node(int pos_, int val_){
pos = pos_;
val = val_;
sum = val;
mxpos = -INF;
mnpos = INF;
mndif = {INF, {INF, INF}};
if(val < 0) mxpos = pos;
if(val > 0) mnpos = pos;
tag = 0;
weight = rand();
}
};
int sz = 1;
node treap[5000005];
int left0[5000005];
int right0[5000005];
int newnode(int pos, int val){
treap[sz] = node(pos, val);
return sz++;
}
pair<int, pii> comb(int a, int b){
if(a == -INF || b == INF) return {INF, {INF, INF}};
return {b — a, {a, b}};
}
void pull(int x){
treap[x].sum = treap[x].val;
treap[x].mxpos = -INF;
treap[x].mnpos = INF;
treap[x].mndif = {INF, {INF, INF}};
if(treap[x].val < 0) treap[x].mxpos = treap[x].pos;
if(treap[x].val > 0) treap[x].mnpos = treap[x].pos;
if(left0[x]){
treap[x].mndif = min(treap[x].mndif, treap[left0[x]].mndif);
treap[x].mndif = min(treap[x].mndif, comb(treap[left0[x]].mxpos, treap[x].mnpos));
treap[x].mnpos = min(treap[x].mnpos, treap[left0[x]].mnpos);
treap[x].mxpos = max(treap[x].mxpos, treap[left0[x]].mxpos);
treap[x].sum += treap[left0[x]].sum;
}
if(right0[x]){
treap[x].mndif = min(treap[x].mndif, treap[right0[x]].mndif);
treap[x].mndif = min(treap[x].mndif, comb(treap[x].mxpos, treap[right0[x]].mnpos));
treap[x].mnpos = min(treap[x].mnpos, treap[right0[x]].mnpos);
treap[x].mxpos = max(treap[x].mxpos, treap[right0[x]].mxpos);
treap[x].sum += treap[right0[x]].sum;
}
}
int move(node& x, int shift){
int ret = x.pos;
if(x.val > 0) ret -= shift;
if(x.val < 0) ret += shift;
return ret;
}
void push(int x){
if(!treap[x].tag) return;
if(left0[x]){
treap[left0[x]].pos = move(treap[left0[x]], treap[x].tag);
treap[left0[x]].tag += treap[x].tag;
if(treap[left0[x]].mnpos != INF) treap[left0[x]].mnpos -= treap[x].tag;
if(treap[left0[x]].mxpos != -INF) treap[left0[x]].mxpos += treap[x].tag;
if(treap[left0[x]].mndif.ff != INF){
treap[left0[x]].mndif.ff -= 2*treap[x].tag;
treap[left0[x]].mndif.ss.ff += treap[x].tag;
treap[left0[x]].mndif.ss.ss -= treap[x].tag;
}
}
if(right0[x]){
treap[right0[x]].pos = move(treap[right0[x]], treap[x].tag);
treap[right0[x]].tag += treap[x].tag;
if(treap[right0[x]].mnpos != INF) treap[right0[x]].mnpos -= treap[x].tag;
if(treap[right0[x]].mxpos != -INF) treap[right0[x]].mxpos += treap[x].tag;
if(treap[right0[x]].mndif.ff != INF){
treap[right0[x]].mndif.ff -= 2*treap[x].tag;
treap[right0[x]].mndif.ss.ff += treap[x].tag;
treap[right0[x]].mndif.ss.ss -= treap[x].tag;
}
}
treap[x].tag = 0;
}
int merge(int a, int b){
if(!a) return b;
if(!b) return a;
if(treap[a].weight < treap[b].weight){
push(a);
right0[a] = merge(right0[a], b);
pull(a);
return a;
} else {
push(b);
left0[b] = merge(a, left0[b]);
pull(b);
return b;
}
}
//splits rt's tree into [0, k) [k, INF)
pair<int, int> split(int x, int k){
if(!x) return pair<int, int>{0, 0};
push(x);
pair<int, int> ret;
if(treap[x].pos < k){
ret = split(right0[x], k);
right0[x] = ret.first;
ret.first = x;
} else {
ret = split(left0[x], k);
left0[x] = ret.second;
ret.second = x;
}
pull(x);
return ret;
}
int rt = 0;
void erase(int x){
pair<int, int> a = split(rt, x);
pair<int, int> b = split(a.second, x + 1);
rt = merge(a.first, b.second);
}
//position, value
void insert(int a, int b){
if(!rt){
rt = newnode(a, b);
return;
}
pair<int, int> nw = split(rt, a);
rt = merge(nw.first, merge(newnode(a, b), nw.second));
}
//value
int query(int x){
pair<int, int> a = split(rt, x);
pair<int, int> b = split(a.second, x + 1);
int ret = (b.first ? treap[b.first].val : 0);
rt = merge(a.first, merge(b.first, b.second));
return ret;
}
//sum
int pre(int x){
pair<int, int> a = split(rt, x + 1);
int ret = (a.first ? treap[a.first].sum : 0);
rt = merge(a.first, a.second);
return ret;
}
void ins(int ind, int v){
int cur = query(ind);
if(cur != 0) erase(ind);
if(cur + v != 0) insert(ind, cur + v);
}
int main(){
setIO();
int n, m, q;
cin >> n >> m >> q;
map<int, vector<pair<int, pii>>, greater<int>> op;
for(int i = 0; i < n; i++){
int a, b, c;
cin >> a >> b >> c;
op[a].pb({0, {b, c}});
}
for(int i = 0; i < m; i++){
int a, b;
cin >> a >> b;
op[a].pb({-1, {b, b}});
}
for(int i = 0; i < q; i++){
int a, b;
cin >> a >> b;
op[a].pb({1, {b, i}});
}
ll ans[q];
int prv = INF;
for(auto i : op){
int dif = prv — i.ff;
//remove overlap
while(rt && treap[rt].mndif.ff <= 2*dif){
pair<int, pii> x = treap[rt].mndif;
ll a = query(x.ss.ff), b = query(x.ss.ss);
ll sub = min(-a, b);
erase(x.ss.ff);
erase(x.ss.ss);
a += sub, b -= sub;
if(a != 0) insert(x.ss.ff, a);
if(b != 0) insert(x.ss.ss, b);
}
//shift everything
if(rt){
treap[rt].pos = move(treap[rt], dif);
treap[rt].tag += dif;
if(treap[rt].mnpos != INF) treap[rt].mnpos -= dif;
if(treap[rt].mxpos != -INF) treap[rt].mxpos += dif;
if(treap[rt].mndif.ff != INF){
treap[rt].mndif.ff -= 2*dif;
treap[rt].mndif.ss.ff += dif;
treap[rt].mndif.ss.ss -= dif;
}
}
sort(i.ss.begin(), i.ss.end());
for(auto j : i.ss){
if(j.ff == -1){
ins(j.ss.ff, 1);
ins(j.ss.ff + 1, -1);
} else if(j.ff == 0){
int l = pre(j.ss.ff — 1);
int r = pre(j.ss.ss + 1);
pii a = split(rt, j.ss.ff);
pii b = split(a.ss, j.ss.ss + 2);
rt = merge(a.ff, b.ss);
ins(j.ss.ff, -l);
ins(j.ss.ss + 1, r);
} else {
ans[j.ss.ss] = pre(j.ss.ff);
}
}
prv = i.ff;
}
for(int i = 0; i < q; i++){
cout << ans[i] << "\n";
}
}
```

# Advanced J: Grid Product

**Hint 1**

Dealing with the products of overlapping rows and columns is hard, so reformulate in more clever and combinatorial way.

**Hint 2**

Use Product Trick

**Hint 3**

Consider the small bounds. What kind of complexity can this support?

**Hint 4**

Bitmask DP?

**Solution**

With Product Trick, $$$F(A)$$$ can be reinterpreted in the following way: Let each cell have $$$A_{i, j}$$$ uncolored balls, what is the number of ways to color one ball in each row red and in each column blue (a ball can be red and blue)?

We first notice that $$$NM \leq 300$$$, which means that $$$\text{min}(N, M)$$$ is bounded by $$$\lfloor \sqrt{NM} \rfloor \leq 17$$$. This motivates using bitmask DP. Rotate the grid $$$90$$$ degrees if $$$N < M$$$. Let's process the grid row by row, column by column. Let $$$\text{mask}$$$ be a bitmask of length $$$M$$$, with each bit at index $$$i$$$ denoting if the column $$$i$$$ already has a blue ball.

We will use the technique of DP on Broken Profile. Let's define $$$\text{dp}[i][j][k][\text{mask}]$$$, which equals the number of ways to color the balls over all valid $$$A_{i, j}$$$ from $$$(1, 1)$$$ to $$$(i, j)$$$ that satisfies the following conditions:

- All columns with bits on in $$$\text{mask}$$$ have a blue ball.
- All rows from $$$1$$$ to $$$i - 1$$$ have a red ball.
- Row $$$i$$$ has a red ball if $$$k = 1$$$.
- All other rows have no red balls and all other columns have no blue balls.

When transitioning from $$$(i, j - 1)$$$ to $$$(i, j)$$$, we choose how to color the balls in cell $$$(i, j)$$$. If $$$k = 0$$$, we can color a ball in this cell red and make $$$k = 1$$$. If $$$\text{mask}$$$ has bit $$$j$$$ off, we can color a ball in this cell blue and turn the bit on.

Let $$$\displaystyle{g(x, y) = \sum_{i = 0}^{x} i^y}$$$, which is the number of ways to color $$$y$$$ balls (balls can be colored multiple times) given $$$i$$$ uncolored balls for all $$$0 \leq i \leq x$$$. This can be computed in $$$O(1)$$$ using sum of $$$k$$$-th power formulas. Using this, the transitions are:

$$$\displaystyle{\text{dp}[i][j][k][\text{mask}] = \sum{\text{dp}[i][j - 1][l][\text{pmask}] \cdot g(B_{i, j}, k - l + [\text{mask} \neq \text{pmask}])}}$$$

Transitioning between rows just requires taking $$$k = 1$$$ states from $$$(i - 1, M)$$$ to $$$k = 0$$$ states on row $$$i$$$.

To avoid a potential MLE, notice that transitions only depend on the previous cell, so the states $$$i$$$ and $$$j$$$ can be cut.

Time Complexity: $$$O\left(NM \cdot 2^{\text{min}(N, M)}\right)$$$

**Code**

```
#include "bits/stdc++.h"
using namespace std;
typedef long long ll;
constexpr int mod = 998244353;
const int N = 305;
const int M = 17;
const int d2 = (mod + 1) / 2, d6 = (mod + 1) / 6;
int n, m, k, qq;
int a[N][N], f[1 << M], g[1 << M];
void solve() {
cin >> n >> m;
for (int i = 0; i < n; i++) {
for (int j = 0; j < m; j++) {
int v;
cin >> v;
if (n > m) a[i][j] = v;
else a[j][i] = v;
}
}
if (n <= m) swap(n, m);
f[0] = 1, k = 1 << m;
for (int i = 0; i < n; i++) {
for (int j = 0; j < m; j++) {
ll v = a[i][j];
int m0 = (v + 1) % mod, m1 = v * (v + 1) % mod * d2 % mod;
int m2 = v * (v + 1) % mod * (2 * v + 1) % mod * d6 % mod;
for (int mask = k - 1; mask >= 0; mask--) {
int x = f[mask], y = g[mask];
ll ff = (ll)x * m0, gg = (ll)x * m1 + (ll)y * m0;
if ((mask >> j) & 1) {
int xx = f[mask ^ (1 << j)], yy = g[mask ^ (1 << j)];
ff += (ll)xx * m1;
gg += (ll)xx * m2 + (ll)yy * m1;
}
f[mask] = ff % mod, g[mask] = gg % mod;
}
}
for (int mask = 0; mask < k; mask++) {
f[mask] = g[mask], g[mask] = 0;
}
}
cout << f[k - 1] << "\n";
}
int32_t main() {
std::ios::sync_with_stdio(false);
cin.tie(NULL);
solve();
}
```

# Advanced K: The Astral Express

**Hint 1**

Consider a function mapping the velocity going into the left/right side to the velocity going out.

**Hint 2**

Notice that this function will either increase your velocity by $$$1$$$ or $$$2$$$.

**Hint 3**

Draw the bipartite graph between velocities on the left and velocities on the right.

**Solution**

Looking at the bipartite graph, we can observe there are only two possible chains of velocities the Astral Express can take. The problem then comes down to maintaining these two chains between updates. Notice that the merge and split operations only affect certain regions of the chain. Thus, we can update those manually and recompute those regions. To maintain the chains as well as suffix queries on the chains, we can use a treap.

**Code**

```
#include <bits/stdc++.h>
using namespace std;
#define ff first
#define ss second
typedef long long ll;
typedef pair<int, int> pii;
void setIO() {
ios_base::sync_with_stdio(0); cin.tie(0);
}
struct node {
int l, r, type, weight, par;
array<int, 2> sz;
array<ll, 2> tag;
ll val, sum;
node(){}
node(int val_, int type_){
weight = rand();
if(val_ >= 0){
type = type_;
val = sum = val_;
sz[type] = 1;
tag[0] = tag[1] = sz[!type] = l = r = par = 0;
} else {
type = -1;
tag[0] = tag[1] = sz[0] = sz[1] = 0;
l = r = par = 0;
}
}
} treap[400005];
inline int lc(int ind){ return treap[ind].l; }
inline int rc(int ind){ return treap[ind].r; }
inline int sz(int ind){ return treap[ind].sz[0] + treap[ind].sz[1]; }
void pull(int ind){
treap[ind].sz[0] = !treap[ind].type + treap[lc(ind)].sz[0] + treap[rc(ind)].sz[0];
treap[ind].sz[1] = treap[ind].type + treap[lc(ind)].sz[1] + treap[rc(ind)].sz[1];
treap[ind].sum = treap[ind].val + treap[lc(ind)].sum + treap[rc(ind)].sum;
}
void push(int ind){
if(lc(ind)){
treap[lc(ind)].sum += treap[ind].tag[0]*treap[lc(ind)].sz[0] + treap[ind].tag[1]*treap[lc(ind)].sz[1];
treap[lc(ind)].val += treap[ind].tag[treap[lc(ind)].type];
treap[lc(ind)].tag[0] += treap[ind].tag[0];
treap[lc(ind)].tag[1] += treap[ind].tag[1];
}
if(rc(ind)){
treap[rc(ind)].sum += treap[ind].tag[0]*treap[rc(ind)].sz[0] + treap[ind].tag[1]*treap[rc(ind)].sz[1];
treap[rc(ind)].val += treap[ind].tag[treap[rc(ind)].type];
treap[rc(ind)].tag[0] += treap[ind].tag[0];
treap[rc(ind)].tag[1] += treap[ind].tag[1];
}
treap[ind].tag[0] = treap[ind].tag[1] = 0;
}
pii split(int rt, int k){
if(!rt) return {0, 0};
push(rt);
pii ret;
if(sz(lc(rt)) < k){
ret = split(rc(rt), k — sz(lc(rt)) — 1);
treap[rt].r = ret.ff;
treap[ret.ff].par = rt;
ret.ff = rt;
treap[ret.ss].par = 0;
} else {
ret = split(lc(rt), k);
treap[rt].l = ret.ss;
treap[ret.ss].par = rt;
ret.ss = rt;
treap[ret.ff].par = 0;
}
pull(rt);
return ret;
}
int merge(int a, int b){
if(!a || !b) return a ^ b;
if(treap[a].weight < treap[b].weight){
push(a);
int x = merge(rc(a), b);
treap[x].par = a;
treap[a].r = x;
pull(a);
return a;
} else {
push(b);
int x = merge(a, lc(b));
treap[x].par = b;
treap[b].l = x;
pull(b);
return b;
}
}
int get_rt(int x){
while(treap[x].par) x = treap[x].par;
return x;
}
// updates everything after x, inclusive
void update(int &rt, int x, int v, int t){
pii a = split(rt, x);
if(a.ss){
//cout << "e " << treap[a.ss].sz[t] << endl;
if(t == treap[a.ss].type) treap[a.ss].val += v;
treap[a.ss].sum += v*treap[a.ss].sz[t];
treap[a.ss].tag[t] += v;
}
rt = merge(a.ff, a.ss);
}
// queries everything after x, inclusive
ll query(int rt, int x){
pii a = split(rt, x);
ll ret = treap[a.ss].sum;
merge(a.ff, a.ss);
return ret;
}
pii info(int x){
int ind = sz(lc(x));
while(treap[x].par){
int nxt = treap[x].par;
if(treap[nxt].r == x) ind += sz(lc(nxt)) + 1;
x = nxt;
}
return {x, ind};
}
ll get_val(int x, int t){
ll ret = treap[x].val;
x = treap[x].par;
while(x){
ret += treap[x].tag[t];
x = treap[x].par;
}
return ret;
}
int n;
//merges ind with ind + 1
void type2(int ind){
pii a = info(ind);
pii b = info(ind + 1);
int t = (ind <= n ? 0 : 1);
if(a.ff == b.ff){ // mirrors a 2
update(b.ff, b.ss, -1, t);
if(ind + 2 — t*n <= n){ // update the other chain
pii c = info(ind + 2);
update(c.ff, c.ss, -1, t);
}
pii aa = split(a.ff, a.ss + 1);
pii bb = split(aa.ss, 2);
merge(aa.ff, bb.ss);
split(bb.ff, 1);
} else { // mirrors two 1s
update(a.ff, a.ss + 1, -1, t);
update(b.ff, b.ss, -1, t);
pii aa = split(a.ff, a.ss + 1);
pii bb = split(b.ff, b.ss + 1);
merge(aa.ff, bb.ss);
merge(bb.ff, aa.ss);
}
}
//splits ind and ind + 1
void type3(int ind){
int t = (ind <= n ? 0 : 1);
if(!treap[ind + 1].par && !treap[ind + 1].l && !treap[ind + 1].r){ // mirrors a 2
treap[ind + 1].sum = treap[ind + 1].val = get_val(ind, t) + 1;
if(ind + 2 — t*n <= n){ // update the other chain
pii c = info(ind + 2);
update(c.ff, c.ss, 1, t);
}
pii x = info(ind);
update(x.ff, x.ss + 1, 1, t); // update this chain
pii a = split(x.ff, x.ss + 1);
if(t){
treap[ind — n + 1].sum = treap[ind — n + 1].val = get_val(ind — n, !t);
merge(merge(merge(a.ff, ind — n + 1), ind + 1), a.ss);
} else {
treap[ind + n + 1].sum = treap[ind + n + 1].val = get_val(ind + n, !t);
merge(merge(merge(a.ff, n + ind + 1), ind + 1), a.ss);
}
} else { // mirrors two 1s
pii a = info(ind);
pii b = info(ind + 1);
update(a.ff, a.ss + 1, 1, t);
update(b.ff, b.ss, 1, t);
pii aa = split(a.ff, a.ss + 1);
pii bb = split(b.ff, b.ss + 1);
merge(aa.ff, bb.ss);
merge(bb.ff, aa.ss);
}
}
int pos, neg;
ll type1(int v){
if(v >= n) return neg + 1;
v++;
if(!treap[n + v].par && !treap[n + v].l && !treap[n + v].r) return -1; // infinite cycle
pii x = info(n + v);
bool fin = x.ss%2 == (sz(x.ff) + 1)%2;
ll ret = 2*query(x.ff, x.ss) + !fin;
if(fin) ret += pos;
else ret += neg;
return ret;
}
int seg[1600005];
void flip(int x, int l = 1, int r = 2*n, int cur = 0){
if(l == r){
seg[cur] ^= 1;
return;
}
int mid = (l + r)/2;
if(x <= mid) flip(x, l, mid, cur*2 + 1);
else flip(x, mid + 1, r, cur*2 + 2);
seg[cur] = seg[cur*2 + 1] + seg[cur*2 + 2];
}
int walk(int v, int l = 1, int r = 2*n, int cur = 0){
if(l == r) return l;
int mid = (l + r)/2;
if(seg[cur*2 + 1] >= v) return walk(v, l, mid, cur*2 + 1);
return walk(v — seg[cur*2 + 1], mid + 1, r, cur*2 + 2);
}
int main(){
setIO();
int q;
cin >> n >> q;
pos = neg = n;
for(int i = 1; i <= n; i++){
treap[i] = node(i — 1, 0);
treap[n + i] = node(i, 1);
}
treap[0] = node(-1, 0);
for(int i = 1; i < n; i++){
merge(get_rt(i), get_rt(n + i + 1));
merge(get_rt(n + i), get_rt(i + 1));
}
for(int i = 1; i <= 2*n; i++) flip(i);
for(int i = 0; i < q; i++){
int a, b;
cin >> a >> b;
if(a == 1){
cout << type1(b) << endl;
} else if(a == 2){
int ind = walk(b);
if(ind > n) type2(ind), neg--;
else type2(n — ind), pos--;
flip(ind + 1);
} else {
int ind = walk(b);
if(ind > n) type3(ind), neg++;
else type3(n — ind), pos++;
flip(ind + 1);
}
}
return 0;
}
```

**Alternative Solution**

Notice that every time the Astral Express crosses the center, its velocity increases by either $$$1$$$ or $$$2$$$. We can model each transition across the center as a matrix transition with states being $$$(\text{velocity}, \text{distance travelled between the last transition and current transition}, \text{total distance travelled})$$$. Using some casework on the velocity changes, these transitions can be maintained using a matrix segment tree or square root decomposition.

**Problem History**

This problem was originally set as H from CodeTON Round 8 (Div. 1 + Div. 2, Rated, Prizes!), before it was replaced by a different one. Shoutout to Benq, rainboy, and Geothermal for solving the problem in testing and discovering the alternate solution (no testers solved with treaps).

Problems great, W contest, wish I had 2 more minutes of time :/

bruh two treap problems

Actually, Advanced K has a non treap solution which I have just added.

Is there solution code for the non-treap solution?

Here.

thanks lunchbox orz

I hate newbies

Problem history for Advanced K has 2 LGM and 1 expert, but that expert has the name of rainboy

I am finding it difficult to understand the solution of F as I am not too comfortable with Iterative DP, so, I tried running bruteforce with my code and the given solution code. Then I stumbled across this two testcases.

TestCase 1:Trouble that I am facing:The given solution code gives answer as 1. But I couldn't understand why the solution should be 1,as we need to toggle two times to reach to the bottom right cell by applying operation on Row 1 and Row 3. So, I came up with the conclusion that, Toggling of the first Top-left cell doesn't count as the problem statement said, U r starting from the top-left cell. But then I am stumbled across testcase 2.

TestCase 2:New Trouble that I faced:The given solution code gives answer as -1. But the conclusion that I came up from the first testcase that, Toggling of the first Top-left cell doesn't count makes the answer of this testcase as 3 by applying operation on each of the three rows.

I would love to hear from anyone who could help me to solve my problem. Thanks for taking the time to read.

Your conclusion that the toggling of the top-left cell doesn't matter is wrong. The actual reason why the answer to your first test case is 1 is that the test cases are never supposed to have a 0 in the top-left cell, so the given solution code gives a weird answer.

Thanks for replying and now I got that.

xug = HEHEHEHAW

Explanation for 105283D - Parallel Arrays is not fully precise — the claim that the minimum unused value among a[1]..a[i-1], b[1]..b[i-1] is equal to min(a[i-1], b[i-1]) + 1 is false which is confirmed by the sample. If that would be true then the minimum among 4th elements would be 4 (which is min(3, 6) + 1) and not 7 (which is actually the minimum unused element).

Thanks for noticing that, the editorial has been updated.

can anyone explain ADVANCE E problem in detail with examples?

Like you, I find it really hard to understand E's editorials, which is why I want to try writing one myself.

Firstly some basic things: You must always type "orz" first before copying anything.

After all pastes, no copies are required. Moreover, two or more consecutive copies are not necessary.

We can see pastes like,

`typed orz for now`

$$$\times$$$`pasted times + 1`

For example, if you write 23 times, copy, and paste 3 times, you will end up with 92 times orz.Also, we can say something like

`typed orz for now`

$$$\times p_1 \times p_2 \times \dots \times p_m$$$, since we know pastes in between consecutive copies and do not require typing. Since we know there are never two or more consecutive copies, we can say that at most $$$\log_2 N$$$ copies are required each time becausewe multiply it by at least 2.Furthermore, $$$p_1, p_2, p_3, \dots, p_m$$$ must be

as similar as possible(max of them — min of them smaller or equal to 1)Solution:

Let $$$x$$$ be

`typed orz for now`

and let $$$y$$$ be $$$p_1 \times p_2 \times \dots \times p_m$$$. Assuming $$$x$$$ is at least 1, we find that $$$x \times y \geqslant N$$$.It is obvious that $$$x$$$ must equal $$$N$$$ if copying time is 0. Thus, $$$A \times N$$$ is the answer.

We only need to look at $$$2 \sqrt N$$$ values if we only used copying once, which means we multiply only one value. Because, $$$x$$$ or $$$y$$$ must be smaller than $$$\sqrt N$$$.

Else we know used copying at most 30 times; if we fixed that value, we can brute force to pasting times (because in worst case,

`copying time`

is $$$2$$$ and $$$N$$$ is $$$10^9$$$, we only need smaller $$$2 * 4 * 10^4$$$ because we typed orz 1 time, copied and pasted $$$4 * 10^4$$$ times then copy again, and paste $$$4 * 10^4$$$ times so we typed $$$16 * 10^8$$$). So we can do that like $$$\mathcal{O}(30 * 30 * 10^5)$$$. It comes from we need to try copying 30 value, and pasting value $$$10^5$$$, and distribute 30 times too for $$$p$$$ array. (Also seems, editorial code uses a little bit different pattern so they don't need to distribute.)I tried to clarify it as much as I can, if you don't understand something you can ask me.

", we only need smaller 2∗4∗104 because we typed orz 1 time, copied and pasted 4∗104 times then copy again, and paste 4∗104 times so we typed 16∗108 " in this why you typed orz 1 time, also how are you going to calculate the total cost, in terms of A,B,C

From $$$2$$$ to $$$31$$$, we are applying

`copying time`

by brute force. From $$$1$$$ to $$$8*10^4$$$, we are brute forcing`pasting time`

. (The loops are`for`

loops.)We could compute $$$y$$$ in the manner of $$$\mathcal{O}(30)$$$ if we fixed

`copying time`

to $$$a$$$ and`pasting time`

to $$$b$$$. At least $$$b/a$$$ is received by all, and $$$b \pmod a$$$ receives an additional $$$1$$$. Then just multiply them by`pasted times + 1`

, which is their multiplying factor, to get $$$c$$$.At least $$$\lceil N/c \rceil$$$

`typed orz`

is required.Update the response after that.

For example:

Let $$$A=1$$$, $$$B=2$$$, $$$C=3$$$, and $$$N=2300$$$.

Fixing $$$a=3$$$ and $$$b=7$$$ will result in $$$c=(3+1)*(2+1)*(2+1)=36$$$. The required

`typed orz`

is equal to $$$\lceil N/c \rceil=\lceil 2300/36 \rceil = 64$$$.So update answer: $$$A*64 + B*3 + C*7=91$$$.

We can show this process like:

Typed {orz} $$$64$$$ times.

After that, copy and paste three times. So we have $$$64*4=256$$$.

I copied twice and pasted twice. So we have $$$256*3=768$$$.

I copied once more and pasted twice. So we have $$$768*3=2304$$$.

Additionally, since it exceeds $$$2300$$$, the order is correct.

Can someone explain problem E