[Tutorial] On Applications of CDQ Divide & Conquer

№	Пользователь	Рейтинг
1	Benq	3792
2	VivaciousAubergine	3647
3	Kevin114514	3603
4	jiangly	3583
5	turmax	3559
6	tourist	3541
7	strapple	3515
8	ksun48	3461
9	dXqwq	3436
10	Otomachi_Una	3413

№	Пользователь	Вклад
1	Qingyu	157
2	adamant	153
3	Um_nik	147
4	Proof_by_QED	146
5	Dominater069	145
6	errorgorn	141
7	cry	139
8	YuukiS	135
9	TheScrasse	134
10	chromate00	133

This tutorial is nominated for the blog posts competition from cadmiumky. Thanks to him for this initiative!

Also thanks to Ghos007, dima210121012101, dmitryAdams and ChatGPT for proofreading this post and giving useful feedback.

CDQ Divide & Conquer is an interesting algorithmic technique that nobody talks about (at least in Russian CP community). However, it is quite powerful, as it often provides unexpectedly clean solutions to seemingly tedious problems. In this tutorial, I will discuss different contexts where this method is useful. I also want to collect what I have seen about this technique on the Internet in one place and add my own insights. Credit to robert1003's blog where I first learned this method.

The main idea of this technique is to calculate the influence/contribution of one half to the other while dividing. Let’s see how this can be useful.

3D Queries

Problem. You are given $$$n$$$ points $$$(x_i, y_i, z_i)$$$ and $$$q$$$ queries $$$(a_i, b_i, c_i)$$$. For each query $$$i$$$, you need to count the number of points $$$j$$$ such that $$$x_j \lt a_i$$$, $$$y_j \lt b_i$$$ and $$$z_j \lt c_i$$$.

The 2D version of this problem is well known and can be solved using sweepline with Fenwick tree, but the 3D case seems tedious. With CDQ, however, it becomes much simpler.

As in a typical sweepline problem, define events $$$(x, y, z, \text{type})$$$, where $$$\text{type} = 1$$$ means "add a point $$$(x, y, z)$$$", $$$\text{type} = 2$$$ means "count the number of points with all coordinates smaller than $$$(x, y, z)$$$". Sort the events by $$$x$$$. If two events have the same $$$x$$$ value, sort them by type in descending order so that type 2 comes before type 1. This is needed to ensure that points with $$$x_j = a_i$$$ are not counted.

Then write a function $$$\text{solve3d}(l, r)$$$ that solves the problem as if only the events in the interval $$$[l, r)$$$ existed. The idea is simple:

If $$$r - l = 1$$$, return.
Solve the problem recursively for the left half: $$$\text{solve3d}(l, \text{mid})$$$
Compute the contribution of the left part to the right — that is, how update queries from the left affect calculation queries from the right. Details of this computation are explained below.
Solve the problem recursively for the right half: $$$\text{solve3d}(\text{mid}, r)$$$

The nice thing about partitioning into halves is that, for each "add point" $$$(x_i, y_i, z_i)$$$ event from the left part, we have $$$x_i \lt a_j$$$ for every "calculate" event $$$(a_j, b_j, c_j)$$$ in the right part because of the sorting. So for one dimension, the inequality holds automatically. The problem now reduces to this:

Problem (reduced). You are given $$$n$$$ points $$$(y_i, z_i)$$$ from the left half and $$$q$$$ queries $$$(b_i, c_i)$$$ from the right half. For each query $$$i$$$, count the number of points $$$j$$$ such that $$$y_j \lt b_i$$$ and $$$z_j \lt c_i$$$.

This is the standard 2D problem (for convenience, assume it's implemented in a $$$\text{solve2d}$$$ function). Once we computed the full contribution of the left half, we treat the halves as independent. The time complexity is $$$\mathcal{O}(n \log^2 n)$$$, since we perform an $$$\mathcal{O}(n \log n)$$$ sweepline on each of $$$\log n$$$ layers of D&C.

You can see the implementation of this algorithm in the next section.

Note that the algorithm above does not rely on the fact that each point contributes exactly $$$1$$$ to the answer. It would still work if points had arbitrary weights. Also you can query sums on arbitrary 3D ranges by doing inclusion-exclusion like in prefix sums.

Tasks for practice

APIO 2019 Street Lamps

JOISC 2019 Examination

JOI Final Round 2020 Fire

ROI Regional Stage 2026 Sliding Windows

Arbitrary-dimensional queries

The recursive algorithm above just drops one dimension. To handle a fourth dimension, for example, you only need to write a $$$\text{solve4d()}$$$ function that calls $$$\text{solve3d()}$$$ to compute the contribution.

This method generalizes to any number of dimensions, giving a time complexity of $$$\mathcal{O}(n \log^{d - 1} n)$$$ for $$$d$$$ dimensions.

Implementation

struct cdq_solver {
  struct query {
    array<int, 4> point;
    long long weight;
  };
 
  struct fenwick {
    int size;
    vector<long long> bit;
 
    fenwick(int n) : size(n), bit(n + 1) {}
 
    void update(int i, long long value) {
      for (i++; i <= size; i += i & -i) {
        bit[i] += value;
      }
    }
 
    long long query(int r) {
      long long answer = 0;
      for (; r > 0; r -= r & -r) {
        answer += bit[r];
      }
      return answer;
    }
  };
 
  vector<query> queries;
  vector<long long> result;
 
  void add_point(int a, int b, int c, int d, long long weight) {
    queries.push_back({{a, b, c, d}, weight});
  }
 
  void add_query(int a, int b, int c, int d) {
    queries.push_back({{a, b, c, d}, INF});
  }
 
  bool comp(int qa, int qb, int start) {
    for (int i = start; i < 4; i++) {
      if (queries[qa].point[i] != queries[qb].point[i]) {
        return queries[qa].point[i] < queries[qb].point[i];
      }
    }
    return queries[qa].weight > queries[qb].weight;
  }
 
  void solve2d(vector<int> query_ids) {
    sort(query_ids.begin(), query_ids.end(), [&](int x, int y) {
      return comp(x, y, 2);
    });
    vector<int> diff;
    for (int i : query_ids) {
      diff.push_back(queries[i].point[3]);
    }
    sort(diff.begin(), diff.end());
    diff.resize(unique(diff.begin(), diff.end()) - diff.begin());
    fenwick bit(diff.size());
    for (int i : query_ids) {
      int y = lower_bound(diff.begin(), diff.end(), queries[i].point[3]) - diff.begin();
      if (queries[i].weight == INF) {
        result[i] += bit.query(y + 1);
      } else {
        bit.update(y, queries[i].weight);
      }
    }
  }
 
  void solve3d(vector<int> query_ids) {
    if (query_ids.size() <= 1) {
      return;
    }
    sort(query_ids.begin(), query_ids.end(), [&](int x, int y) {
      return comp(x, y, 1);
    });
    int mid = (int)query_ids.size() / 2;
    solve3d(vector<int>(query_ids.begin(), query_ids.begin() + mid));
    solve3d(vector<int>(query_ids.begin() + mid, query_ids.end()));
    vector<int> query_mid;
    for (int i = 0; i < mid; i++) {
      if (queries[query_ids[i]].weight != INF) {
        query_mid.push_back(query_ids[i]);   
      }
    }
    for (int i = mid; i < (int)query_ids.size(); i++) {
      if (queries[query_ids[i]].weight == INF) {
        query_mid.push_back(query_ids[i]);
      }
    }
    solve2d(query_mid);
  }
 
  void solve4d(vector<int> query_ids) {
    if (query_ids.size() <= 1) {
      return;
    }
    sort(query_ids.begin(), query_ids.end(), [&](int x, int y) {
      return comp(x, y, 0);      
    });
    int mid = (int)query_ids.size() / 2;
    solve4d(vector<int>(query_ids.begin(), query_ids.begin() + mid));
    solve4d(vector<int>(query_ids.begin() + mid, query_ids.end()));
    vector<int> query_mid;
    for (int i = 0; i < mid; i++) {
      if (queries[query_ids[i]].weight != INF) {
        query_mid.push_back(query_ids[i]);   
      }
    }
    for (int i = mid; i < (int)query_ids.size(); i++) {
      if (queries[query_ids[i]].weight == INF) {
        query_mid.push_back(query_ids[i]);
      }
    }
    solve3d(query_mid);
  }
 
  vector<long long> solve() {
    vector<int> query_ids(queries.size());
    iota(query_ids.begin(), query_ids.end(), 0);
    result.resize(queries.size());
    solve4d(query_ids);
    vector<long long> answer;
    for (int i = 0; i < (int)query_ids.size(); i++) {
      if (queries[query_ids[i]].weight == INF) {
        answer.push_back(result[i]);
      }
    }
    return answer;
  }
};

Tasks for practice

You can try to apply 4D CDQ in IZhO 2020 Nasty Donchick, good luck :)

I didn't manage to fit 40 4-dimensional CDQs in the limit though even with heavy constant factor optimizations.

Update. Here is the explanation of my approach.

Generalization for min/max

By the way, this method works not only for sums but also for min/max. The only limitation is that you can compute min/max only over prefix ranges, not over general ones directly, because the operation is not invertible. There is a workaround though: just add more dimensions!

For example, suppose we want the minimum on $$$[x_l, x_r] \times [y_l, y_r] \times [z_l, z_r]$$$. Then we want points $$$(x, y, z)$$$ that satisfy $$$x_l \le x \le x_r$$$ and $$$y_l \le y \le y_r$$$ and $$$z_l \le z \le z_r$$$.

For the $$$z$$$-dimension, there's no such problem — we can update any segment in segment tree not just prefixes. For $$$x$$$ and $$$y$$$, however, we need to "double" them. That is, consider a 5D point $$$p = (x, x, y, y, z)$$$ and require $$$p_1 \ge x_l$$$, $$$p_2 \le x_r$$$, $$$p_3 \ge y_l$$$, $$$p_4 \le y_r$$$ and $$$z_l \le p_5 \le z_r$$$. Suffix queries can be handled similarly to prefix queries, for example, by multiplying the relevant coordinates by $$$-1$$$. The time complexity is $$$\mathcal{O}(n \log^4 n)$$$ because of 5-dimensional CDQ 💀

MEX on segment with modifications (offline)

Problem. You are given an array $$$a$$$ of $$$n$$$ integers. Also there are $$$q$$$ queries:

Type 1. Set $$$a_i := x$$$.
Type 2. Find $$$\text{MEX}(a_l, a_{l + 1}, \cdots, a_{r - 1}, a_r)$$$.

The idea is to maintain intervals where each value $$$x$$$ is excluded.

Formally, let the positions of $$$x$$$ be $$$p_1, p_2, \cdots, p_k$$$. Then, for each $$$i$$$, we create an interval $$$(p_i, p_{i + 1})$$$ with weight $$$x$$$. For convenience, we also add $$$(-1, p_1)$$$ and $$$(p_k, n)$$$. If $$$x$$$ does not appear in the array at all, we add a single interval $$$(-1, n)$$$. Now a query of type 2 can be answered by taking the minimum weight of all intervals fully covering $$$[l, r]$$$. Intuitively, if some occurrence of $$$x$$$ was before the left border $$$l$$$, and the next occurrence is after $$$r$$$, this means that $$$x$$$ does not appear in this segment.

Without modifications, this reduces the problem to "minimum in prefix subrectangle" queries: treat intervals as points and queries as subrectangles. The time complexity is $$$\mathcal{O}(n \log n)$$$.

To handle modifications, we need to update the set of intervals after each change. To do this, we store positions of each value $$$x$$$ in a std::vector of std::set’s. When processing a modification at position $$$i$$$, we remove $$$i$$$ from the set corresponding to the old value of $$$a_i$$$. Before the removal, $$$i$$$ splits one existing interval into two, so we delete the two intervals that had $$$i$$$ as an endpoint and add the merged interval formed by its neighbors. Then we set $$$a_i := x$$$ and insert $$$i$$$ into the set for the new value. This insertion splits one existing interval into two, so we delete the merged interval and add the two new intervals created by inserting $$$i$$$.

The main difficulty is that now the set of intervals is dynamic, i.e., intervals may be added or removed as the algorithm runs. The idea is to maintain the "time of life" of each object. That is, each interval is now described by four integers: $$$(l, r, T_l, T_r)$$$. This means that the interval was added at query with index $$$T_l$$$ (or $$$-1$$$ if it existed from the beginning) and was removed in query with index $$$T_r$$$ (or $$$q$$$ if it was never removed). Note that we describe a query by a triple $$$(l, r, T)$$$ now. The problem reduces to the following:

Problem (reduced). You are given $$$n$$$ points $$$(l_i, r_i)$$$ with weights $$$w_i$$$. Each point $$$i$$$ appears at time $$$lt_i$$$ and disappears at time $$$rt_i$$$. Also there are $$$q$$$ queries $$$(lq_i, rq_i)$$$. For each query, find the minimum weight of all points $$$j$$$ such that:

$$$l_j \lt lq_i$$$
$$$r_j \gt rq_i$$$
$$$j$$$ is "alive" at time $$$i$$$. Formally, $$$lt_j \le i \le rt_j$$$

At first glance, this looks like a four-dimensional problem, since each point is described by four parameters (and weight). However, there is a way to avoid this.

Run CDQ on events sorted by $$$l_i$$$ / $$$lq_i$$$. When computing the contributions, the inequality on left borders is satisfied automatically, so we can drop this dimension. Then run a sweepline over time. For each point, create two events: an add event at time $$$lt_i$$$ and a remove event at time $$$rt_i$$$. Also create one event for each query. When processing an add event for point $$$i$$$, insert weight $$$w_i$$$ at position $$$r_i$$$ in a segment tree. When processing a remove event, replace this value with $$$+\infty$$$. For each query $$$i$$$, query the minimum on the suffix $$$(rq_i, +\infty)$$$ of the segment tree.

For correctness, right borders must be unique. Otherwise, removing one point could incorrectly delete another point with the same $$$r$$$. This is easy to fix by compressing $$$r_i$$$ values and assigning unique indices (by sorting and using their order). The time complexity is $$$\mathcal{O}(n \log^2 n)$$$ because of three-dimensional queries.

Implementation

#include <bits/stdc++.h>
using namespace std;

struct query {
  int l, r, lt, rt, value;
  int pos;
};

struct modification {
  int l, r, i, t;
};

int Min(int a, int b) {
  return a < b ? a : b;
}

struct segment_tree {
  int n;
  vector<int> tree;
  vector<int> updates;

  segment_tree() {}

  segment_tree(int _n) : n(_n), tree(_n * 2, INT_MAX) {}

  void modify(int i, int v) {
    for (updates.push_back(i += n), tree[i] = v; i > 1; i >>= 1) {
      updates.push_back(i >> 1);
      tree[i >> 1] = Min(tree[i], tree[i ^ 1]);
    }
  }

  int query(int l, int r) {
    int ans = INT_MAX;
    for (l += n, r += n; l < r; l >>= 1, r >>= 1) {
      if (l & 1) {
        ans = Min(ans, tree[l++]);
      }
      if (r & 1) {
        ans = Min(ans, tree[--r]);
      }
    }
    return ans;
  }

  void rollback() {
    for (int x : updates) {
      tree[x] = INT_MAX;
    }
    updates.clear();
  }
};

segment_tree ST;

void CDQ(vector<query>& queries, vector<int>& answers, int L, int R) {
  if (R - L == 1) {
    return;
  }
  int M = (L + R) / 2;
  CDQ(queries, answers, L, M);
  CDQ(queries, answers, M, R);
  vector<array<int, 4>> events;
  for (int i = L; i < M; i++) {
    if (queries[i].pos == -1) {
      events.push_back({queries[i].lt, -1, queries[i].r, queries[i].value});
      events.push_back({queries[i].rt, 1, queries[i].r, queries[i].value});
    }
  }
  for (int i = M; i < R; i++) {
    if (queries[i].pos != -1) {
      events.push_back({queries[i].lt, 0, queries[i].r, queries[i].pos});
    }
  }
  sort(events.begin(), events.end());
  for (auto [x, type, r, value] : events) {
    if (type == -1) {
      ST.modify(r, value);
    } else if (type == 0) {
      answers[value] = min(answers[value], ST.query(r, ST.n));
    } else {
      ST.modify(r, INT_MAX);
    }
  }
  ST.rollback();
}

int main() {
  cin.tie(nullptr)->sync_with_stdio(false);
  int n, q;
  cin >> n >> q;
  vector<int> a(n);
  for (int &v : a) {
    cin >> v;
  }
  vector<set<int>> pos(n + 1);
  for (int i = 0; i < n; i++) {
    pos[a[i]].insert(i);
  }
  vector<modification> mods;
  int T = 0;
  for (int i = 0; i <= n; i++) {
    int last = -1;
    pos[i].insert(-1);
    pos[i].insert(n);
    for (auto it = next(pos[i].begin()); it != pos[i].end(); it++) {
      int j = *it;
      mods.push_back({last, j, i, T});
      last = j;
    }
    mods.push_back({last, n, i, T});
  }
  vector<query> queries;
  int answer_c = 0;
  while (q--) {
    T++;
    char type;
    cin >> type;
    if (type == '?') {
      int l, r;
      cin >> l >> r;
      l--, r--;
      queries.push_back({l - 1, r + 1, T, T, -1, answer_c++});
    } else {
      int i, x;
      cin >> i >> x;
      i--;
      auto it = pos[a[i]].lower_bound(i);
      mods.push_back({*prev(it), *it, a[i], T});
      mods.push_back({*it, *next(it), a[i], T});
      mods.push_back({*prev(it), *next(it), a[i], T});
      pos[a[i]].erase(i);
      a[i] = x;
      pos[a[i]].insert(i);
      it = pos[a[i]].lower_bound(i);
      mods.push_back({*prev(it), *next(it), a[i], T});
      mods.push_back({*prev(it), *it, a[i], T});
      mods.push_back({*it, *next(it), a[i], T});
    }
  }
  sort(mods.begin(), mods.end(), [](modification a, modification b) {
    return a.l != b.l ? a.l < b.l : (a.r != b.r ? a.r < b.r : (a.i != b.i ? a.i < b.i : a.t < b.t));
  });
  T++;
  for (int i = 0; i < (int)mods.size(); i++) {
    int j = i;
    vector<int> times;
    while (j < (int)mods.size() && mods[i].l == mods[j].l && mods[i].r == mods[j].r && mods[i].i == mods[j].i) {
      times.push_back(mods[j++].t);
    }
    times.push_back(T);
    for (int k = 0; k < (int)times.size() - 1; k += 2) {
      queries.push_back({mods[i].l, mods[i].r, times[k], times[k + 1], mods[i].i, -1});
    }
    i = j - 1;
  }
  sort(queries.begin(), queries.end(), [](query a, query b) {
    return a.r != b.r ? a.r < b.r : a.pos > b.pos;
  });
  int last = -1;
  for (int i = 0; i < (int)queries.size(); i++) {
    int j = i;
    while (j < (int)queries.size() && queries[i].r == queries[j].r) {
      last = (queries[j].r = max(last + 1, queries[j].r));
      j++;
    }
    i = j - 1;
  }
  ST = segment_tree(last + 1);
  sort(queries.begin(), queries.end(), [](query a, query b) {
    return a.l != b.l ? a.l < b.l : (a.r != b.r ? a.r > b.r : (a.lt != b.lt ? a.lt < b.lt : a.rt > b.rt));
  });
  vector<int> answers(answer_c, INT_MAX);
  CDQ(queries, answers, 0, (int)queries.size());
  for (int value : answers) {
    cout << value << "\n";
  }
}

Tasks for practice

Long tour of Moscow Open Olympiad 2018-2019, problem "Dima and Array"

Online FFT

Problem. You are given arrays $$$a$$$, $$$b$$$ and $$$r$$$, all of length $$$n$$$. You need to calculate a recurrent array $$$dp_k = a_k \cdot \sum_{i + j = k}{(dp_i \cdot r_j)} + b_k$$$ also of length $$$n$$$.

The problem is that this recurrence depends on its previous terms so we can't solve it by just multiplying polynomials. Once again, we can apply CDQ on this DP.

Suppose we are computing the contribution of DP's in the range $$$[L, M)$$$ to the range $$$[M, R)$$$ (and we have already computed the DP values in the range $$$[L; M)$$$). For now, let's ignore $$$a_k$$$'s and $$$b_k$$$'s for a moment — we will apply the transform $$$dp_k := a_k \cdot dp_k + b_k$$$ inside the $$$\text{solve}(k, k + 1)$$$ call later. At this stage, we treat unprocessed $$$dp_k$$$ as $$$\sum_{i + j = k}{dp_i \cdot r_j}$$$. For our current ranges $$$[L; M)$$$ and $$$[M; R)$$$, what values of $$$i$$$ and $$$j$$$ are actually relevant for computing the influence? Exactly those that form cross contributions: $$$i \in [L; M)$$$ and $$$j \in [0; R - L)$$$. The second bound holds because the smallest contributing index is $$$i = L$$$ and the largest receiving index is $$$k = R - 1$$$, so the maximum possible difference is $$$(R - 1) - L \lt R - L$$$.

We can then use FFT to multiply $$$DP[L:M]$$$ with $$$r[0:R - L]$$$ (here ":" denotes slice as in Python). Then add the results of multiplications to the DP's from the right half. The time complexity is $$$\mathcal{O}(n \log^2 n)$$$ because we convolve arrays of total size $$$\mathcal{O}(n)$$$ in $$$\mathcal{O}(n \log n)$$$ time on each of $$$\mathcal{O(\log n)}$$$ recursion layers.

Implementation for a similar problem (without array a)

#include <bits/stdc++.h>
using namespace std;

const int MOD = 998244353;
vector<int> c, r, dp;

int power(int x, int y) {
  int result = 1;
  for (int base = x; y > 0; y /= 2, base = 1LL * base * base % MOD) {
    if (y & 1) {
      result = 1LL * result * base % MOD;
    }
  }
  return result;
}

void FFT(vector<int>& A, bool inverse = false) {
  int N = 1;
  while (N < (int)A.size()) {
    N *= 2;
  }
  A.resize(N);
  vector<int> rev_index(N);
  for (int len = 1; len < N; len *= 2) {
    for (int i = 0; i < len; i++) {
      rev_index[i] *= 2;
      rev_index[i + len] = rev_index[i] + 1;
    }
  }
  for (int i = 0; i < N; i++) {
    if (i < rev_index[i]) {
      swap(A[i], A[rev_index[i]]);
    }
  }
  for (int len = 1; len < N; len *= 2) {
    int omega = power(3, (MOD - 1) / (2 * len));
    for (int i = 0; i < N; i += len * 2) {
      int mult = 1;
      for (int j = 0; j < len; j++, mult = 1LL * mult * omega % MOD) {
        int x = A[i + j], y = 1LL * A[i + j + len] * mult % MOD;
        A[i + j] = x + y - (x + y >= MOD ? MOD : 0);
        A[i + j + len] = x - y + (x - y < 0 ? MOD : 0);
      }
    }
  }
  if (inverse) {
    reverse(A.begin() + 1, A.end());
    int inv = power(N, MOD - 2);
    for (int i = 0; i < N; i++) {
      A[i] = 1LL * A[i] * inv % MOD;
    }
  }
}

void CDQ(int L, int R) {
  if (R - L == 1) {
    dp[L] += c[L];
    if (dp[L] >= MOD) {
      dp[L] -= MOD;
    }
    return;
  }
  int mid = (L + R) / 2;
  CDQ(L, mid);
  vector<int> DP = vector<int>(dp.begin() + L, dp.begin() + mid);
  vector<int> A = vector<int>(r.begin(), r.begin() + R - L);
  DP.resize(2 * (R - L)), A.resize(2 * (R - L));
  FFT(DP);
  FFT(A);
  for (int i = 0; i < (int)DP.size(); i++) {
    DP[i] = 1LL * DP[i] * A[i] % MOD;
  }
  FFT(DP, true);
  for (int i = mid - L; i < 2 * (R - L) && L + i < R; i++) {
    dp[L + i] += DP[i];
    if (dp[L + i] >= MOD) {
      dp[L + i] -= MOD;
    }
  }
  CDQ(mid, R);
}

int main() {
  cin.tie(nullptr)->sync_with_stdio(false);
  int N;
  cin >> N;
  c.resize(N), r.resize(N);
  dp.resize(N);
  for (int &value : c) {
    cin >> value;
  }
  for (int &value : r) {
    cin >> value;
  }
  CDQ(0, N);
  for (int i = 0; i < N; i++) {
    cout << dp[i] << " \n"[i + 1 == N];
  }
}

Tasks for practice

AtCoder Beginner Contest 213, Problem H

JOISC 2023 Festival

CDQ + SOS DP

This is similar to the previous section. Let's look at the following DP:

Problem. You are given arrays $$$a$$$, $$$b$$$ and $$$r$$$, all of length $$$2^n$$$. You need to calculate recurrence $$$dp_i = a_i \cdot \sum_{j \subset i}{dp_j} + b_j$$$. That is, for each mask we take the sum of DPs of all submasks, multiply by given constant and add another constant.

The idea is simple: run CDQ D&C on masks. Again, we want to compute the influence of $$$[L; M)$$$ on $$$[M; R)$$$. The binary representations of numbers in the first half look like [LCP]0[anything] and in the second half [LCP]1[anything]. Here, LCP means longest common prefix. Let its length be $$$k$$$. When computing the influence, note that the submask condition automatically holds for the first $$$k + 1$$$ bits. On the remaining bits, it becomes a standard SOS DP on an array of size $$$2^{n - k - 1}$$$.

So the method is: recursively compute the left half, run SOS DP to propagate its contribution, then recursively compute the right half. At a leaf, apply $$$dp_i := a_i \cdot dp_i + b_i$$$. The time complexity is $$$\mathcal{O}(2^n \cdot n^2)$$$, because we do $$$O(2^n \cdot n)$$$ work per level and there are $$$\log_2(2^n) = n$$$ levels.

Update. It turns out that there is a faster solution with time complexity $$$\mathcal{O}(2^n \cdot n)$$$. Thanks to Misuki for describing this approach in the comments. In short, the following code works:

Code

vector<int> dp; // answer
vector<int> sos; // sos[i] = sum over all submasks of i (including i)

void CDQ(int lcp, int bit) {
  if (bit == -1) {
    dp[lcp] = a[lcp] * sos[lcp] + b[lcp];
    sos[lcp] += dp[lcp];
    return;
  }
  int mid = lcp + (1 << bit);
  for (int i = lcp; i < mid; i++) {
    sos[i + (1 << bit)] -= sos[i]; // cancel the contribution of current bit
  }
  CDQ(lcp, bit - 1);
  for (int i = lcp; i < mid; i++) {
    sos[i + (1 << bit)] += sos[i]; // add the contribution of current bit
  }
  CDQ(lcp | (1 << bit), bit - 1);
}

It is claimed that $$$\text{sos}[i]$$$ is computed correctly when we are at position $$$i$$$. To prove this, consider all possible LCPs of $$$i$$$ with its submasks. One can show that we only add contributions from positions that differ from $$$i$$$ exactly at the first bit after the LCP.

The key idea is that by canceling and then re-adding the contribution, we are effectively adding the delta between the new version and the old version. This delta contains only those submasks whose contribution has changed at this recursion step. In particular, only submasks with the given LCP are affected, because all other submasks remain unchanged and therefore cancel out. As a result, no incorrect contributions are introduced, and $$$\text{sos}[i]$$$ is correct when processing $$$i$$$.

Tasks for practice

USACO 2023 Feb Platinum, Problem Setting.

AtCoder Regular Contest 168, Problem E (thanks to Misuki!)

CDQ on Graph Trick

The following 1D problem is well known. I assume you already know how to solve it:

Problem (1D). You are given a directed graph on $$$n$$$ vertices. Initially, there are $$$m$$$ edges $$$(u_i, v_i)$$$. Then you need to process queries. Each query gives three integers $$$v$$$, $$$l$$$ and $$$r$$$ and asks you to add edges $$$(v, l), (v, l + 1), \cdots, (v, r - 1), (v, r)$$$. After all queries, find the shortest path between $$$s$$$ and $$$t$$$.

This can be solved in $$$\mathcal{O}(n \log n)$$$ using segment tree on graph trick. But there is an even harder problem that is still solvable using this trick:

Problem (2D). You are given a directed graph on $$$n$$$ vertices and an array $$$a$$$. Initially there are $$$m$$$ edges $$$(u_i, v_i)$$$. Then you need to process queries. Each query gives four integers $$$v$$$, $$$l$$$, $$$r$$$ and $$$x$$$ and asks you to add edges $$$(v, i)$$$ for each $$$i$$$ such that $$$l \le i \le r$$$ and $$$a_i \lt x$$$. After all queries, find the shortest path between $$$s$$$ and $$$t$$$.

Let's use persistent segment tree on graph trick to solve this problem. The idea is to do a sweepline where we sort by $$$a_i$$$ / $$$x$$$. We can treat $$$a_i$$$ as the "time" when $$$i$$$-th element appears (activates). Then, there are two types of events:

Activate $$$i$$$-th position
Add an edge from given node $$$v$$$ to all already activated nodes in range $$$[l; r]$$$

Initially there are no virtual nodes (as in implicit segment tree). For the queries of the first type, add the path from the root to $$$i$$$ in the segment tree. If some node from this path already exists, add a link from the new version to the old one. For the queries of the second type, as in the 1D problem, add edges from $$$v$$$ to the latest versions of nodes of segment tree that form range $$$[l; r]$$$. The copying of versions is needed to ensure that we don't break old second type queries by adding a new position that was not activated at that moment. The time complexity is $$$\mathcal{O}(n \log n)$$$.

Moreover, we can solve even the 3D version:

Problem (3D). You are given a directed graph on $$$n$$$ vertices and arrays $$$a$$$ and $$$b$$$. Initially there are $$$m$$$ edges $$$(u_i, v_i)$$$. Then you need to process queries. Each query gives you five integers $$$v$$$, $$$l$$$, $$$r$$$, $$$x$$$, $$$y$$$, and and asks you to add edges $$$(v, i)$$$ for each $$$i$$$ such that $$$l \le i \le r$$$, $$$a_i \lt x$$$ and $$$b_i \lt y$$$. After all queries, find the shortest path between $$$s$$$ and $$$t$$$.

The solution is to do CDQ on $$$b_i$$$, build a persistent segment tree on the nodes from the left half and use the 2D solution with queries from the right half (that have greater $$$y$$$'s than $$$b_i$$$'s from the left). The time complexity is $$$\mathcal{O}(n \log^2 n)$$$.

Of course, this algorithm generalizes to arbitrary number of dimensions $$$d$$$ by using $$$(d - 1)$$$-dimensional CDQ inside of $$$d$$$-dimensional CDQ.

Tasks for practice

I am not aware of any problems in which you can use the 3D algorithm. Here are some problems where you can use the 2D method:

JOISC 2020 Treatment Project

JOISC 2020 Capital City

Комментарии (15)

Написать комментарий?

tch1cherin

2 месяца назад, скрыть # |

Auto comment: topic has been updated by tch1cherin (previous revision, new revision, compare).

→ Ответить

DNR

does CDQ stand for Chinese Divide and conQuer?

2 месяца назад, скрыть # ^ |

-18

Chinese Divide & Qonquer

+16

It stands for Danqi Chen

Lobotomy_Corporation_Fan

Absolute cinema

Misuki

+26

Thanks for introducing the application of CDQ in various task, I learned a lot from this blog!

And about the CDQ + SOS part, actually it's possible to solve it in just $$$O(n2^n)$$$, the idea is to maintain the $$$z(dp)$$$ "dynamically" while calculating $$$dp$$$. ($$$z(dp)$$$ stands for the array produced by doing SOS on array $$$dp$$$, i.e. $$$z(dp)_i = \sum\limits_{j \subseteq i}dp_j$$$).

In other word, we want to solve the following problem:

Given an $$$0$$$-indexed array $$$dp$$$ of length $$$2^n$$$, initially filled with $$$0$$$ except $$$dp_0$$$.
For $$$i = 1, 2, \ldots, 2^n - 1$$$ in this order, do the following operation:

query $$$\sum\limits_{j \subset i}dp_j$$$

add $$$f(\sum\limits_{j \subset i}dp_j)$$$ to $$$dp_i$$$

To solve it, we maintain $$$z(dp)$$$ dynamically during the recursion call of CDQ. Since $$$dp_0$$$ is the only non-zero element of the initial array, $$$z(dp)$$$ should filled with $$$dp_0$$$ initially.

And when we have a fixed "LCP" of length $$$L$$$ and ready to call $$$\text{LCP+0+...}$$$, we do an inverse SOS on subarray $$$z(dp)[\text{LCP+0...0}, \text{LCP+1...1}]$$$ to invert the effect of $$$(N-1-L)$$$-th bit, i.e. do $$$z(dp)_{(i | 2^{N - L - 1})} \mathrel{{-}{=}} z(dp)_i$$$ for all $$$i$$$ match with $$$\text{LCP+0+...}$$$

Then recurse into $$$\text{LCP+0+...}$$$, add some elements to $$$dp$$$, and when the call to $$$\text{LCP+0+...}$$$ is done, we do an SOS on the same subarray to add the effect of $$$(N-1-L)$$$-th bit back.

Then whenever we reach a leaf node $$$l$$$ of the recursion, $$$z(dp)_l = \sum\limits_{i \subset l}dp_i$$$ would hold so we can answer the above query in $$$O(1)$$$.

And the total complexity would be the sum of size of all subtree of the recursion tree, which is $$$O(n2^n)$$$.

This is my implementation of this technique on a similar problem

dinohaur

Nice blog!

About 1D graph problem you can do it in $$$O(n\alpha(n)+m+q)$$$ (if edge weights are $$$1$$$). Just do the BFS and when you want to see which indices in the interval $$$[l, r]$$$ are not marked, mark them. You can do it fast with DSU where each group is an interval (for each group maintain their rightmost position).

Similarly, for 2D graph problem you can do it in $$$O(n log n)$$$ with $$$O(n)$$$ memory by doing BFS and checking unmarked elements with segment tree. In segment tree you maintain $$$a_i$$$ if not marked and $$$\infty$$$ if marked. Checking if there exists unmarked element with value $$$a_u \lt a_{cur}$$$ which you can do with query_min operation.

yoshi_avx

This made me realize how little I know about CDQ... Mind blown

Also... Can you explain the "4D CDQ" solution for the problem? I'm interested

Spoiler

The problem basically asks us to count triples $$$(i, j, k)$$$ such that the multiset of elements $$$a_i, a_{i + 1}, \cdots, a_{j - 1}, a_j$$$ is equal to the multiset of $$$a_{j + 1}, a_{j + 2}, \cdots, a_{k - 1}, a_k$$$.

In this kind of problem, it's quite natural to try computing something like $$$\text{prv}_i$$$ and $$$\text{nxt}_i$$$ — previous and next occurrences of $$$a_i$$$ respectively — and then write down some inequalities in terms of them. The key observation is that we can reformulate the original condition as the following inequalities:

$$$ \begin{cases} \max\limits_{p \in [i; j]} \text{nxt}_p \le k \\ \min\limits_{p \in (j; k]} \text{prv}_p \ge i \end{cases} $$$

Fix $$$x$$$ and $$$y$$$ as the positions of the maximum and minimum in the inequalities above, respectively, taking the first occurrence if there are multiple maximums or minimums in the segment. Then we want to count the number of ways to choose $$$i$$$, $$$j$$$, $$$k$$$ for these $$$x$$$ and $$$y$$$. Of course, the choice of $$$x$$$ and $$$y$$$ imposes some restrictions.

Let $$$\text{ln}_i$$$, $$$\text{rn}_i$$$ be the nearest positions $$$j$$$ to the left and right where $$$\text{nxt}_j \ge \text{nxt}_i$$$ and $$$\text{nxt}_j \gt \text{nxt}_i$$$, respectively. Similarly, define $$$\text{lp}_i$$$, $$$\text{rp}_i$$$ for the $$$\text{prv}$$$. Then the following conditions should hold:

$$$ \begin{cases} i, j \in (\text{ln}_x, \text{rn}_x) \\ j + 1, k \in (\text{lp}_y, \text{rp}_y) \end{cases} $$$

It is also known that $$$j \in [i; k)$$$.

Let's intersect all the inequalities now. $$$j$$$ should be in the range from $$$\max(x, \text{lp}_y)$$$ to $$$\min(\text{rn}_x, y)$$$, $$$i$$$ should be in the range from $$$\text{ln}_x$$$ to $$$\text{prv}_y$$$ ($$$\text{prv}_y$$$ is the minimum value that we fixed), $$$k$$$ should be in the range from $$$\text{nxt}_x$$$ to $$$\text{rp}_y$$$. Multiply all the lengths and we get the number of ways to choose $$$i$$$, $$$j$$$, $$$k$$$. Here is a sample naive implementation of the approach above that works in $$$\mathcal{O}(n^2)$$$:

#pragma GCC optimize("Ofast,unroll-loops")
#include <bits/stdc++.h>
#pragma GCC target("avx2,tune=native")
using namespace std;

int main() {
  cin.tie(nullptr)->sync_with_stdio(false);
  int N;
  cin >> N;
  vector<int> A(N);
  for (int &value : A) {
    cin >> value;
    value--;
  }
  vector<int> occ(N), prv(N), nxt(N);
  fill(occ.begin(), occ.end(), -1);
  for (int i = 0; i < N; i++) {
    prv[i] = occ[A[i]];
    occ[A[i]] = i;
  }
  fill(occ.begin(), occ.end(), N);
  for (int i = N - 1; i >= 0; i--) {
    nxt[i] = occ[A[i]];
    occ[A[i]] = i;
  }
  vector<int> ln(N), rn(N), lp(N), rp(N);
  stack<int> st_n, st_p;
  for (int i = 0; i < N; i++) {
    while (!st_n.empty() && nxt[i] > nxt[st_n.top()]) {
      st_n.pop();
    }
    while (!st_p.empty() && prv[i] < prv[st_p.top()]) {
      st_p.pop();
    }
    ln[i] = st_n.empty() ? -1 : st_n.top();
    lp[i] = st_p.empty() ? -1  : st_p.top();
    st_n.push(i);
    st_p.push(i);
  }
  st_n = st_p = stack<int>();
  for (int i = N - 1; i >= 0; i--) {
    while (!st_n.empty() && nxt[i] >= nxt[st_n.top()]) {
      st_n.pop();
    }
    while (!st_p.empty() && prv[i] <= prv[st_p.top()]) {
      st_p.pop();
    }
    rn[i] = st_n.empty() ? N : st_n.top();
    rp[i] = st_p.empty() ? N : st_p.top();
    st_n.push(i);
    st_p.push(i); 
  }
  long long answer = 0;
  for (int x = 0; x < N; x++) {
    for (int y = 0; y < N; y++) {
      answer += 1LL * max(0, min(rn[x], y) - max(x, lp[y])) * max(0, prv[y] - ln[x]) * max(0, rp[y] - nxt[x]);
    }
  }
  cout << answer << "\n";
}

The interesting part of the code above is this:


long long answer = 0;
for (int x = 0; x < N; x++) {
  for (int y = 0; y < N; y++) {
    answer += 1LL * max(0, min(rn[x], y) - max(x, lp[y])) * max(0, prv[y] - ln[x]) * max(0, rp[y] - nxt[x]);
  }
}

To apply 4D CDQ, we can just fix one parameter (for example, $$$y$$$) and consider the contribution of all $$$x$$$. We should always have $$$\text{prv}_y \gt \text{ln}_x$$$ and $$$\text{rp}_y \gt \text{nxt}_x$$$. We also do some casework: consider how $$$\min(\text{rn}_x, y)$$$ is expanded (there are two cases, depending on what is smaller); similarly for $$$\max(x, \text{lp}_y)$$$. In total there are 4 cases, in each of them we impose restrictions on $$$\text{rn}_x$$$, $$$x$$$, $$$\text{ln}_x$$$, $$$\text{nxt}_x$$$ (4D). In each of the cases you expand the brackets, isolate the coefficients of $$$\text{nxt}_y$$$, $$$\text{ln}_y$$$, $$$\text{ln}_y \cdot \text{nxt}_y$$$ (multipliers that depend on $$$y$$$) and compute them separately. There are 40 4D CDQs in total in my implementation:

Implementation (with optimized CDQ)

#pragma GCC optimize("Ofast,unroll-loops")
#include <bits/stdc++.h>
#pragma GCC target("avx2,tune=native")
using namespace std;

constexpr long long INF = numeric_limits<long long>::min();

struct cdq_solver {
  struct query {
    array<int, 4> point;
    long long weight;
  };

  struct fenwick {
    int size;
    vector<long long> bit;

    fenwick(int n) : size(n), bit(n + 1) {}

    void update(int i, long long value) {
      for (i++; i <= size; i += i & -i) {
        bit[i] += value;
      }
    }

    long long query(int r) {
      long long answer = 0;
      for (; r > 0; r -= r & -r) {
        answer += bit[r];
      }
      return answer;
    }
  };

  vector<query> queries;
  vector<long long> result;
  int max_n;
  fenwick BIT;

  cdq_solver() = default;

  cdq_solver(int max_n_) : max_n(max_n_), BIT(max_n_ * 2) {}

  void add_point(int a, int b, int c, int d, long long weight) {
    queries.push_back({{a, b, c, d}, weight});
  }

  void add_query(int a, int b, int c, int d) {
    queries.push_back({{a, b, c, d}, INF});
  }

  bool comp(int qa, int qb, int start) {
    for (int i = start; i < 4; i++) {
      if (queries[qa].point[i] != queries[qb].point[i]) {
        return queries[qa].point[i] < queries[qb].point[i];
      }
    }
    return queries[qa].weight > queries[qb].weight;
  }

  inline void solve2d(vector<int>& query_ids) {
    for (int i : query_ids) {
      if (queries[i].weight == INF) {
        result[i] += BIT.query(queries[i].point[3] + max_n + 1);
      } else {
        BIT.update(queries[i].point[3] + max_n, queries[i].weight);
      }
    }
    for (int i : query_ids) {
      if (queries[i].weight != INF) {
        BIT.update(queries[i].point[3] + max_n, -queries[i].weight);
      }
    }
  }

  inline void solve3d(vector<int>& query_ids) {
    int N = (int)query_ids.size();
    vector<int> sorted_ids(N);
    iota(sorted_ids.begin(), sorted_ids.end(), 0);
    for (int len = 1; len < N; len *= 2) {
      for (int i = 0; i + len < N; i += len * 2) {   
        int R = min(i + 2 * len, N);
        inplace_merge(sorted_ids.begin() + i, sorted_ids.begin() + i + len,
                      sorted_ids.begin() + R, [&](int x, int y) {
          return comp(query_ids[x], query_ids[y], 2);
        });
        vector<int> new_query_ids;
        for (int j = i; j < R; j++) {
          int fx = (sorted_ids[j] < i + len);
          int fy = (queries[query_ids[sorted_ids[j]]].weight == INF);
          if (fx ^ fy) {
            new_query_ids.push_back(query_ids[sorted_ids[j]]);
          }
        }
        if (new_query_ids.size() <= 8) {
          for (int x = 0; x < (int)new_query_ids.size(); x++) {
            if (queries[new_query_ids[x]].weight == INF) {
              for (int y = 0; y < x; y++) {
                if (queries[new_query_ids[y]].weight != INF) {
                  bool good = true;
                  for (int z = 2; z < 4; z++) {
                    if (queries[new_query_ids[y]].point[z] > queries[new_query_ids[x]].point[z]) {
                      good = false;
                    }
                  }
                  if (good) {
                    result[new_query_ids[x]] += queries[new_query_ids[y]].weight;
                  }
                }
              }
            }
          }
        } else {
          solve2d(new_query_ids);
        }
      }
    }
  }

  void solve4d() {
    int N = (int)queries.size();
    vector<int> query_ids(N);
    iota(query_ids.begin(), query_ids.end(), 0);
    sort(query_ids.begin(), query_ids.end(), [&](int x, int y) {
      return comp(x, y, 0);
    });
    vector<int> sorted_ids(N);
    iota(sorted_ids.begin(), sorted_ids.end(), 0);
    for (int len = 1; len < N; len *= 2) {
      for (int i = 0; i + len < N; i += len * 2) {   
        int R = min(i + 2 * len, N);
        inplace_merge(sorted_ids.begin() + i, sorted_ids.begin() + i + len,
                      sorted_ids.begin() + R, [&](int x, int y) {
          return comp(query_ids[x], query_ids[y], 1);
        });
        vector<int> new_query_ids;
        for (int j = i; j < R; j++) {
          int fx = (sorted_ids[j] < i + len);
          int fy = (queries[query_ids[sorted_ids[j]]].weight == INF);
          if (fx ^ fy) {
            new_query_ids.push_back(query_ids[sorted_ids[j]]);
          }
        }
        if (new_query_ids.size() <= 32) {
          for (int x = 0; x < (int)new_query_ids.size(); x++) {
            if (queries[new_query_ids[x]].weight == INF) {
              for (int y = 0; y < x; y++) {
                if (queries[new_query_ids[y]].weight != INF) {
                  bool good = true;
                  for (int z = 1; z < 4; z++) {
                    if (queries[new_query_ids[y]].point[z] > queries[new_query_ids[x]].point[z]) {
                      good = false;
                    }
                  }
                  if (good) {
                    result[new_query_ids[x]] += queries[new_query_ids[y]].weight;
                  }
                }
              }
            }
          }
        } else {
          solve3d(new_query_ids);
        }
      }
    }
  }

  vector<long long> solve() {
    result.resize(queries.size());
    solve4d();
    vector<long long> answer;
    for (int i = 0; i < (int)queries.size(); i++) {
      if (queries[i].weight == INF) {
        answer.push_back(result[i]);
      }
    }
    return answer;
  }
};

int main() {
  cin.tie(nullptr)->sync_with_stdio(false);
  int N;
  cin >> N;
  vector<int> A(N);
  for (int &value : A) {
    cin >> value;
    value--;
  }
  vector<int> occ(N), prv(N), nxt(N);
  fill(occ.begin(), occ.end(), -1);
  for (int i = 0; i < N; i++) {
    prv[i] = occ[A[i]];
    occ[A[i]] = i;
  }
  fill(occ.begin(), occ.end(), N);
  for (int i = N - 1; i >= 0; i--) {
    nxt[i] = occ[A[i]];
    occ[A[i]] = i;
  }
  vector<int> ln(N), rn(N), lp(N), rp(N);
  stack<int> st_n, st_p;
  for (int i = 0; i < N; i++) {
    while (!st_n.empty() && nxt[i] > nxt[st_n.top()]) {
      st_n.pop();
    }
    while (!st_p.empty() && prv[i] < prv[st_p.top()]) {
      st_p.pop();
    }
    ln[i] = st_n.empty() ? -1 : st_n.top();
    lp[i] = st_p.empty() ? -1  : st_p.top();
    st_n.push(i);
    st_p.push(i);
  }
  st_n = st_p = stack<int>();
  for (int i = N - 1; i >= 0; i--) {
    while (!st_n.empty() && nxt[i] >= nxt[st_n.top()]) {
      st_n.pop();
    }
    while (!st_p.empty() && prv[i] <= prv[st_p.top()]) {
      st_p.pop();
    }
    rn[i] = st_n.empty() ? N : st_n.top();
    rp[i] = st_p.empty() ? N : st_p.top();
    st_n.push(i);
    st_p.push(i); 
  }
  long long answer = 0;
  {
    cdq_solver CDQ(N + 8), CDQ_ln(N + 8), CDQ_nxt(N + 8), CDQ_ln_nxt(N + 8);
    for (int i = 0; i < N; i++) { 
      CDQ.add_point(ln[i], nxt[i], rn[i], -i, rn[i] - i);
      CDQ_ln.add_point(ln[i], nxt[i], rn[i], -i, -1LL * ln[i] * (rn[i] - i));
      CDQ_nxt.add_point(ln[i], nxt[i], rn[i], -i, -1LL * nxt[i] * (rn[i] - i));
      CDQ_ln_nxt.add_point(ln[i], nxt[i], rn[i], -i, 1LL * ln[i] * nxt[i] * (rn[i] - i));
    }
    for (int i = 0; i < N; i++) {
      CDQ.add_query(prv[i], rp[i], i - 1, -lp[i]);
      CDQ_ln.add_query(prv[i], rp[i], i - 1, -lp[i]);
      CDQ_nxt.add_query(prv[i], rp[i], i - 1, -lp[i]);
      CDQ_ln_nxt.add_query(prv[i], rp[i], i - 1, -lp[i]);
    }
    auto res = CDQ.solve();
    auto res_ln = CDQ_ln.solve();
    auto res_nxt = CDQ_nxt.solve();
    auto res_ln_nxt = CDQ_ln_nxt.solve();
    for (int i = 0; i < N; i++) {
      answer += res[i] * prv[i] * rp[i];
      answer += res_ln[i] * rp[i];
      answer += res_nxt[i] * prv[i];
      answer += res_ln_nxt[i]; 
    }
  }
  {
    for (int r = 0; r < 2; r++) {
      cdq_solver CDQ(N + 8), CDQ_ln(N + 8), CDQ_nxt(N + 8), CDQ_ln_nxt(N + 8);
      for (int i = 0; i < N; i++) {
        int coeff = r == 0 ? rn[i] : -1;
        CDQ.add_point(ln[i], nxt[i], rn[i], i, coeff);
        CDQ_ln.add_point(ln[i], nxt[i], rn[i], i, -1LL * ln[i] * coeff);
        CDQ_nxt.add_point(ln[i], nxt[i], rn[i], i, -1LL * nxt[i] * coeff);
        CDQ_ln_nxt.add_point(ln[i], nxt[i], rn[i], i, 1LL * ln[i] * nxt[i] * coeff);
      }
      for (int i = 0; i < N; i++) {
        CDQ.add_query(prv[i], rp[i], i - 1, lp[i] - 1);
        CDQ_ln.add_query(prv[i], rp[i], i - 1, lp[i] - 1);
        CDQ_nxt.add_query(prv[i], rp[i], i - 1, lp[i] - 1);
        CDQ_ln_nxt.add_query(prv[i], rp[i], i - 1, lp[i] - 1);
      }
      auto res = CDQ.solve();
      auto res_ln = CDQ_ln.solve();
      auto res_nxt = CDQ_nxt.solve();
      auto res_ln_nxt = CDQ_ln_nxt.solve();
      for (int i = 0; i < N; i++) {
        int coeff = r == 0 ? 1 : lp[i];
        answer += res[i] * prv[i] * rp[i] * coeff;
        answer += res_ln[i] * rp[i] * coeff;
        answer += res_nxt[i] * prv[i] * coeff;
        answer += res_ln_nxt[i] * coeff; 
      }
    }
    for (int r = 0; r < 2; r++) {
      cdq_solver CDQ(N + 8), CDQ_ln(N + 8), CDQ_nxt(N + 8), CDQ_ln_nxt(N + 8);
      for (int i = 0; i < N; i++) {
        int coeff = r == 0 ? rn[i] : -1;
        CDQ.add_point(ln[i], nxt[i], rn[i], i, coeff);
        CDQ_ln.add_point(ln[i], nxt[i], rn[i], i, -1LL * ln[i] * coeff);
        CDQ_nxt.add_point(ln[i], nxt[i], rn[i], i, -1LL * nxt[i] * coeff);
        CDQ_ln_nxt.add_point(ln[i], nxt[i], rn[i], i, 1LL * ln[i] * nxt[i] * coeff);
      }
      for (int i = 0; i < N; i++) {
        CDQ.add_query(prv[i], rp[i], lp[i] - 1, lp[i] - 1);
        CDQ_ln.add_query(prv[i], rp[i], lp[i] - 1, lp[i] - 1);
        CDQ_nxt.add_query(prv[i], rp[i], lp[i] - 1, lp[i] - 1);
        CDQ_ln_nxt.add_query(prv[i], rp[i], lp[i] - 1, lp[i] - 1);
      }
      auto res = CDQ.solve();
      auto res_ln = CDQ_ln.solve();
      auto res_nxt = CDQ_nxt.solve();
      auto res_ln_nxt = CDQ_ln_nxt.solve();
      for (int i = 0; i < N; i++) {
        int coeff = r == 0 ? -1 : -lp[i];
        answer += res[i] * prv[i] * rp[i] * coeff;
        answer += res_ln[i] * rp[i] * coeff;
        answer += res_nxt[i] * prv[i] * coeff;
        answer += res_ln_nxt[i] * coeff; 
      }
    }
  }
  {
    for (int r = 0; r < 2; r++) {
      cdq_solver CDQ(N + 8), CDQ_ln(N + 8), CDQ_nxt(N + 8), CDQ_ln_nxt(N + 8);
      for (int i = 0; i < N; i++) {
        int coeff = r == 0 ? 1 : -i;
        CDQ.add_point(ln[i], nxt[i], -rn[i], -i, coeff);
        CDQ_ln.add_point(ln[i], nxt[i], -rn[i], -i, -1LL * ln[i] * coeff);
        CDQ_nxt.add_point(ln[i], nxt[i], -rn[i], -i, -1LL * nxt[i] * coeff);
        CDQ_ln_nxt.add_point(ln[i], nxt[i], -rn[i], -i, 1LL * ln[i] * nxt[i] * coeff);
      }
      for (int i = 0; i < N; i++) {
        CDQ.add_query(prv[i], rp[i], -i, -lp[i]);
        CDQ_ln.add_query(prv[i], rp[i], -i, -lp[i]);
        CDQ_nxt.add_query(prv[i], rp[i], -i, -lp[i]);
        CDQ_ln_nxt.add_query(prv[i], rp[i], -i, -lp[i]);
      }
      auto res = CDQ.solve();
      auto res_ln = CDQ_ln.solve();
      auto res_nxt = CDQ_nxt.solve();
      auto res_ln_nxt = CDQ_ln_nxt.solve();
      for (int i = 0; i < N; i++) {
        int coeff = r == 0 ? i : 1;
        answer += res[i] * prv[i] * rp[i] * coeff;
        answer += res_ln[i] * rp[i] * coeff;
        answer += res_nxt[i] * prv[i] * coeff;
        answer += res_ln_nxt[i] * coeff; 
      }
    }
    for (int r = 0; r < 2; r++) {
      cdq_solver CDQ(N + 8), CDQ_ln(N + 8), CDQ_nxt(N + 8), CDQ_ln_nxt(N + 8);
      for (int i = 0; i < N; i++) {
        int coeff = r == 0 ? 1 : -i;
        CDQ.add_point(ln[i], nxt[i], -rn[i], -i, coeff);
        CDQ_ln.add_point(ln[i], nxt[i], -rn[i], -i, -1LL * ln[i] * coeff);
        CDQ_nxt.add_point(ln[i], nxt[i], -rn[i], -i, -1LL * nxt[i] * coeff);
        CDQ_ln_nxt.add_point(ln[i], nxt[i], -rn[i], -i, 1LL * ln[i] * nxt[i] * coeff);
      }
      for (int i = 0; i < N; i++) {
        CDQ.add_query(prv[i], rp[i], -i, -i);
        CDQ_ln.add_query(prv[i], rp[i], -i, -i);
        CDQ_nxt.add_query(prv[i], rp[i], -i, -i);
        CDQ_ln_nxt.add_query(prv[i], rp[i], -i, -i);
      }
      auto res = CDQ.solve();
      auto res_ln = CDQ_ln.solve();
      auto res_nxt = CDQ_nxt.solve();
      auto res_ln_nxt = CDQ_ln_nxt.solve();
      for (int i = 0; i < N; i++) {
        int coeff = r == 0 ? -i : -1;
        answer += res[i] * prv[i] * rp[i] * coeff;
        answer += res_ln[i] * rp[i] * coeff;
        answer += res_nxt[i] * prv[i] * coeff;
        answer += res_ln_nxt[i] * coeff; 
      }
    }
  }
  {
    cdq_solver CDQ(N + 8), CDQ_ln(N + 8), CDQ_nxt(N + 8), CDQ_ln_nxt(N + 8);
    for (int i = 0; i < N; i++) { 
      CDQ.add_point(ln[i], nxt[i], -rn[i], i, 1);
      CDQ_ln.add_point(ln[i], nxt[i], -rn[i], i, -ln[i]);
      CDQ_nxt.add_point(ln[i], nxt[i], -rn[i], i, -nxt[i]);
      CDQ_ln_nxt.add_point(ln[i], nxt[i], -rn[i], i, 1LL * ln[i] * nxt[i]);
    }
    for (int i = 0; i < N; i++) {
      CDQ.add_query(prv[i], rp[i], -i, lp[i] - 1);
      CDQ_ln.add_query(prv[i], rp[i], -i, lp[i] - 1);
      CDQ_nxt.add_query(prv[i], rp[i], -i, lp[i] - 1);
      CDQ_ln_nxt.add_query(prv[i], rp[i], -i, lp[i] - 1);
    }
    auto res = CDQ.solve();
    auto res_ln = CDQ_ln.solve(); 
    auto res_nxt = CDQ_nxt.solve();
    auto res_ln_nxt = CDQ_ln_nxt.solve();
    for (int i = 0; i < N; i++) {
      answer += res[i] * prv[i] * rp[i] * (i - lp[i]);
      answer += res_ln[i] * rp[i] * (i - lp[i]);
      answer += res_nxt[i] * prv[i] * (i - lp[i]);
      answer += res_ln_nxt[i] * (i - lp[i]);
    }
  }
  cout << answer << "\n";
}

← Rev. 6 →

The two inequalities above are equal to $$$D(i,k)=D(j+1,k)=D(i,j)$$$ with D being the number of distinct elements.

At first, I thought 40 looks suspiciously like a log, but I realized it's the number of cases...

I was stuck on the first step because I didn't know about the positions x and y.

(Comparison: An optimized $$$O(n \log(n))$$$ solution runs in ~200ms).

← Rev. 2 →

Is there a section for e.g. CDQ for min/max dp recurrences? I haven't seen it mentioned in the blog.

For k-dimension sum with updates, I believe one can just add a "time" dimension.

I guess I don’t have much to say about this topic. I wanted to add a section about CHT + CDQ, but decided not to because, in my opinion, there are no new ideas beyond what’s already in the blog. Still, I believe there could be interesting problems involving min/max DP recurrences. If you know any such problems, please share them and I’ll probably add a section about them.
This felt too obvious and natural to me, so I didn’t mention it. But I agree that I probably should have.

In short, I wanted this blog to focus more on novel applications, or at least ones I haven’t seen elsewhere.

svorogaze

bump

Блог пользователя tch1cherin

3D Queries

Tasks for practice

Arbitrary-dimensional queries

Tasks for practice

Generalization for min/max

MEX on segment with modifications (offline)

Tasks for practice

Online FFT

Tasks for practice

CDQ + SOS DP

Tasks for practice

CDQ on Graph Trick

Tasks for practice