MEX sequence problem (my own problem) — Try to solve!

Problem: MEX Sequence
time limit per test: 2 seconds
memory limit per test: 512 megabytes
You are given an array $$$a$$$ of length $$$n$$$. Define an infinite sequence $$$s$$$ as follows:
For $$$1 \leq i \leq n: s_i = a_i$$$
For $$$i \gt n$$$: $$$s_i$$$ = $$$\operatorname{mex}(s_{i-1}, s_{i-2}, ..., s_{i-n})$$$
You are given $$$q$$$ queries. In each query, you are given an integer $$$k$$$, and you have to determine $$$s_k$$$.
Input
Each test contains multiple test cases. The first line contains a single integer $$$t$$$ $$$(1 \leq t \leq 10^4)$$$, the number of test cases. The description of the test cases follows.
The first line of each test case contains two integers $$$n$$$ and $$$q$$$ $$$(1 \leq n, q \leq 2 \cdot 10^5)$$$, the size of the array and the number of queries, respectively.
The second line of each test case contains $$$n$$$ integers $$$a_1, a_2, ..., a_n$$$ $$$(0 \leq a_i \leq n)$$$ — the elements of the array $$$a$$$.
The third line of each test case contains $$$q$$$ integers $$$k_1, k_2, ..., k_q$$$ $$$(1 \leq k_i \leq 10^{18})$$$, where $$$k_i$$$ is the value of $$$k$$$ for the $$$i$$$-th query.
It is guaranteed that neither the sum of $$$n$$$ nor the sum of $$$q$$$ over all test cases exceeds $$$2 \cdot 10^5$$$.
Output
For each test case, output a line with $$$q$$$ integers, where the $$$i$$$-th integer is the answer to the $$$i$$$-th query.

Please try to solve the question first before seeing the solution unless you're new!
If you get stuck, I recommend solving this problem from CSES first and then coming back to this one.
Feel free to rate the problem from 1 to 10!
Update: Shoutout to BLOBVISGOD for being the first solver of the problem!
Update: The editorial is here!

Solution

Let's first do a dry run to see how the sequence behaves. Let $$$a = [1, 2, 0, 2, 2]$$$, so $$$n = 5$$$. Then $$$s = [1, 2, 0, 2, 2, 3, 1, 4, 0, 5, 2, 3, 1, 4, 0, 5, 2, ...]$$$. Two things stand out. From $$$s_{n+1}$$$ onwards, the sequence is periodic with period $$$n + 1$$$. Also, one period $$$s_{n+1}, ..., s_{2n+1}$$$ is a permutation of $$$0, 1, 2, ..., n$$$ (here $$$3, 1, 4, 0, 5, 2$$$). Here's a way to prove both facts.
First, every element of the sequence lies between $$$0$$$ and $$$n$$$. The first $$$n$$$ elements do by the constraints. Every later element is the mex of $$$n$$$ non-negative integers, which is at most $$$n$$$, because $$$n$$$ values cannot cover all $$$n + 1$$$ integers $$$0, 1, ..., n$$$. Non-negativity is immediate.
Finally, we have to prove that all elements in $$$s_{n+1}, s_{n+2}, ..., s_{2n+1}$$$ are distinct.
For the sake of contradiction, assume there are indices $$$i$$$ and $$$j$$$ with $$$n + 1 \leq i \lt j \leq 2n + 1$$$ and $$$s_i = s_j$$$. By definition, $$$s_j = \operatorname{mex}(s_{j-1}, s_{j-2}, ..., s_{j-n})$$$. Since $$$j - i \leq (2n + 1) - (n + 1) = n$$$, we have $$$j - n \leq i \leq j - 1$$$, so $$$s_i$$$ is one of the values in that window. The mex of a collection is never a value in the collection, so $$$s_j \neq s_i$$$, which is a contradiction. Hence $$$s_{n+1}, s_{n+2}, ..., s_{2n+1}$$$ are $$$n + 1$$$ distinct integers between $$$0$$$ and $$$n$$$ inclusive, i.e. a permutation of $$$0, 1, 2, ..., n$$$.

Now we prove that the sequence is periodic from term $$$n+1$$$ onwards with period $$$n+1$$$, i.e. $$$s_i = s_{i-(n+1)}$$$ for all $$$i \geq 2n + 2$$$. Consider $$$s_{2n+2} = \operatorname{mex}(s_{2n+1}, s_{2n}, ..., s_{n+2})$$$. These $$$n$$$ values are distinct. Because $$$s_{n+1}, ..., s_{2n+1}$$$ is a permutation of $$$0, ..., n$$$, they are exactly $$$\{0, 1, ..., n\} \setminus \{s_{n+1}\}$$$. So their mex is $$$s_{n+1}$$$, and $$$s_{2n+2} = s_{n+1}$$$. Consequently, $$$s_{n+2}, s_{n+3}, ..., s_{2n+2}$$$ contains the same values as $$$s_{n+1}, ..., s_{2n+1}$$$, so it is again a permutation of $$$0, ..., n$$$. The same argument then gives $$$s_{2n+3} = s_{n+2}$$$. By induction, any $$$n + 1$$$ consecutive terms starting at index $$$n + 1$$$ or later form a permutation of $$$0, ..., n$$$. Also, every term from $$$s_{2n+2}$$$ onwards equals the term $$$n + 1$$$ positions before it. Hence, the sequence is periodic from term $$$n+1$$$ onwards.

Now we need to compute $$$s_{n+1}, s_{n+2}, ..., s_{2n+1}$$$. Brute force takes the last $$$n$$$ elements and computes their mex directly. That costs $$$O(n)$$$ per term and $$$O(n^2)$$$ overall, which is too slow. Instead, notice that this is a sliding window mex problem on an array of size $$$2n + 1$$$ with window size $$$n$$$. We'll use a set called missing to keep track of the values in $$$[0, n]$$$ that are absent from the current window.
When a new term enters the window, we erase it from missing (if it was there). What about the term that leaves the window? For this we need a frequency array. Let $$$freq_i$$$ be the number of occurrences of $$$i$$$ in the current window, for $$$0 \leq i \leq n$$$. We increment the frequency of the term entering the window and decrement the frequency of the term leaving it. If the frequency of the leaving term drops to $$$0$$$, that value no longer occurs in the window, so we insert it back into missing. At each step, the next term is the smallest missing value, i.e. *missing.begin(), since a set is always kept sorted. Each step costs $$$O(\log n)$$$, so computing $$$s_{n+1}, s_{n+2}, ..., s_{2n+1}$$$ takes $$$O(n \log n)$$$ time in total.

Denote $$$perm_i = s_{n+i}$$$ for all $$$1 \leq i \leq n + 1$$$. Now, what is the value of $$$s_k$$$ for a given $$$k$$$? If $$$k \leq n$$$, the answer is simply $$$a_k$$$. Otherwise, $$$s_k$$$ sits at position $$$((k - n - 1) \bmod (n + 1)) + 1$$$ of the period. Since $$$k - n - 1 \equiv k \pmod{n + 1}$$$, this position equals $$$(k \bmod (n + 1)) + 1$$$. For example, if $$$k = n + 2$$$, the position is $$$2$$$. Hence, the answer is $$$perm_{(k \bmod (n + 1)) + 1}$$$. Each query takes $$$O(1)$$$ time, so the overall time complexity is $$$O(n \log n + q)$$$ per test case.

Code

```cpp
#include <bits/stdc++.h>
using namespace std;
void solve() {
    int n, q;
    cin >> n >> q;
    vector<int> a(n + 1);  // 1-indexed input array
    for(int i = 1; i <= n; i++) cin >> a[i];
    // freq[v] = occurrences of v in the current window of n terms;
    // missing = values in [0, n] that do not occur in the window.
    vector<int> freq(n + 1, 0);
    set<int> missing;
    for(int i = 1; i <= n; i++) freq[a[i]]++;
    for(int i = 0; i <= n; i++) {
        if(freq[i] == 0) missing.insert(i);
    }
    // perm[i] = s_{n+i} for 1 <= i <= n + 1 (one full period).
    vector<int> perm(n + 2);
    for(int i = 1; i <= n + 1; i++) {
        perm[i] = *missing.begin();  // mex = smallest missing value
        // s_{n+i} enters the window.
        missing.erase(perm[i]);
        freq[perm[i]]++;
        // s_i = a_i leaves the window (only a_1..a_n ever need removing).
        if(i <= n) {
            freq[a[i]]--;
            if(freq[a[i]] == 0) missing.insert(a[i]);
        }
    }
    for(int i = 0; i < q; i++) {
        long long k;
        cin >> k;
        if(k <= n) cout << a[k];
        else {
            // For k > n: s_k = perm[(k mod (n + 1)) + 1].
            int idx = (k % (n + 1)) + 1;
            cout << perm[idx];
        }
        if(i == q - 1) cout << endl;
        else cout << ' ';
    }
}
int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);
    cout.tie(nullptr);
    int t;
    cin >> t;
    while(t--) solve();
    return 0;
}
```

```cpp
#include "bits/stdc++.h"
using namespace std;
#define rep(i,a,b) for(int i=(a); i<(b); ++i)
typedef long long ll;
typedef vector<int> vi;
void solve(){
    int n,q;
    cin >> n >> q;
    vi a(n), cnt(n+1), perm(n+1);
    for(auto& c : a) cin >> c;
    set<int> missing;
    rep(i,0,n) if (a[i]<=n) cnt[a[i]]++;
    rep(i,0,n+1) if (cnt[i]<1) missing.insert(i);
    rep(i,0,n+1){
        perm[i] = *begin(missing);
        missing.erase(perm[i]);
        cnt[perm[i]]++;
        if (i<n) {
            cnt[a[i]]--;
            if (cnt[a[i]]<1) missing.insert(a[i]);
        }
    }
    while(q--){
        ll x;
        cin >> x;
        --x;
        if (x<n) cout << a[x] << '\n';
        else cout << perm[(x-n)%(n+1)] << '\n';
    }
}
int main(){
    cin.tie(NULL),cin.sync_with_stdio(false);
    int t;
    cin >> t;
    while(t--) solve();
}
```

```cpp
/* Coded by harshcooljn */
#include <bits/stdc++.h>
#include <iostream>
#include <stdint.h>
#include <ios>
#include <iomanip>
#include <numeric>
#include <math.h>
#include <vector>
#include <stdlib.h>
#include <queue>
#include <utility>
#include <map>
#include <set>
#include <unordered_set>
#include <unordered_map>
#include <algorithm>
#include <string>
using namespace std;
typedef long long int ll;
typedef long double ld;
typedef unsigned long long int ull;
typedef pair<ll, ll> pll;
#define vll vector<ll>
#define vstr vector<string>
#define vvll vector<vector<ll> >
#define vpll vector<pair<ll,ll> >
#define endl '\n'
#define vin(v,a,n) for(ll i=a;i<n;i++){cin>>v[i];}
#define vinp(v,n) for(ll i=0;i<n;i++){ll x;cin>>x;v.push_back(x);}
#define pvec(v) for(auto &e: v){cout << e << " ";}cout<<endl;
#define pvecp(v) for (ll i=0;i<v.size();i++){cout << v[i].first << "," << v[i].second << " \n"[i==v.size()-1];}
#define parr(a,n) for(ll i=0;i<n;i++){cout << a[i] << "";}
#define parr1(a,n) for(ll i=1;i<=n;i++){cout << a[i] << "";}
#define yes cout<<"YES"<<endl;
#define full(v) v.begin(),v.end()
#define fr(i,a,n) for(ll i=a;i<n;i++)
#define fr_(i,n,a) for (ll i=n;i>=a;i--)
#define no cout<<"NO"<<endl;
#define spc ' '
#define nline cout<<endl;
#define fileIO freopen("input.txt","r",stdin);freopen("output.txt","w",stdout)
#define fastIO ios_base::sync_with_stdio(0);cin.tie(0);cout.tie(0)
void Solve(ll t10) {
    ll n,q;
    cin >> n >> q;
    vll a(n+1);
    set<ll> st;
    fr(i,0,n+1){
        st.insert(i);
    }
    fr(i,0,n){
        cin >> a[i];
        st.erase(a[i]);
    }
    a[n] = *st.begin(); // this is the mex of elements from [0,n-1] (0 - based index)
    while (q--){
        ll k;
        cin >> k;
        k--;
        cout << a[k%(n+1)] << " ";
    }
    cout << endl;
}
int main() {
    fastIO;
    // fileIO;
    ll t;cin >> t;fr(c, 1, t + 1) Solve(c);
    // Solve(1);
    return 0;
}
```

Comments (30)

D1Haterr

12 months ago

If k > 2N, then it's easy to solve with a formula; otherwise, you can just brute-force the first 2N elements before the queries and output a[k].

shivankjha46

12 months ago

No bro, brute-forcing the mex up to $$$2n$$$ elements takes $$$O(n^2)$$$ time, because each computation can take $$$O(n)$$$. You need some data structure to speed this up to $$$O(\log n)$$$ or $$$O(1)$$$ per computation.

BLOBVISGOD

How does your "easy" formula work, and how do you brute force efficiently (n=1e5)?

My solution is as follows. First, observe that s_{i+1},...,s_{i+n} cannot be equal to s_{i} (each of them past index n is the mex of a window that contains s_{i}). Furthermore, for i>n we have s_{i} <= n, since it is the mex of n numbers. Hence, for i>n the sequence is periodic with period n+1, and s_{n+1},...,s_{2n+1} are a permutation of 0,...,n.

To find this permutation, we can use an std::set that stores the numbers from 0,...,n that do not occur in our 'current' window of n elements. We can find the MEX with *set.begin(). To update the set, we add s_{i-n} to it if no other copy of s_{i-n} remains in the window, and we remove s_{i} from it, since after the window moves one place to the right it will contain s_{i}.

I am not sure whether there is an easier way. What is your solution?

Use an ordered set and binary search for the first value i such that order_of_key(i) doesn't equal i; then you know the mex is i. Continue this until all values from 0 to N are filled, and then you will notice that the mex just increases by one each time.

That sounds significantly more annoying to implement, and it is slower (O(n log(n)^2) instead of O(n log(n))).

I think you can optimize it by using a DSU instead of the ordered set. Each element gets an edge to the next element that hasn't appeared yet, and once you add an element i, you just draw an edge between i and find(i+1).

This will be O(N*alpha(n))

You also need to delete numbers, I am not sure how DSU can handle that. Can you maybe implement it?

I thought you take the MEX of everything before i, not just the N numbers before it.

RealAsadullo

what is bro yapping about :skull:

Technically, if you want a "faster" solution, you can use a van Emde Boas tree (https://en.wikipedia.org/wiki/Van_Emde_Boas_tree). Then it is O(n log log n) :)

parthkamal_iitk

9 months ago

Can you share some resources where I can read about that?

temp-for-talk

Hi! Is this your chess profile?

Of course!

Come on! Please give a code solution quickly. I'll only count that. Hopefully you give one in C++ so that I can understand your code since I don't do Python and other languages.

aight, calm down:

Either I understood the task wrong or your code is completely wrong, because it failed on every test case I tried.

Bro the answer is $$$0$$$, sorry.
The sequence is $$$1, 2, 3, 0, 2, 4, 1, 5, 3, 0, 2, 4, 1, 5, 3, 0$$$
The $$$16$$$th element is $$$0$$$.
Code output is also 0.

KingOfYellowAndBlack

He thought you take the mex of all the elements before i, not just the N elements before it.

harshcooljn

Following is my approach: simply find the MEX of the initial array; that will be the element at index n (0-based). These elements keep repeating, so the element at index k is the same as the element at index k % (n+1).

UPD: I assumed that the elements were distinct, my bad.

That does not work if the elements a_1,...,a_n are not unique, i.e. if a_i=a_j for some i != j with i,j <= n.

Yes I assumed they were distinct, my bad

Your code fails for the following test case:

Your output is 2, but the expected output is 5, because the sequence is:

1 2 0 2 2 (3 1 4 0 5 2 repeated)

This means that the answer is term number (k - n) % (n + 1) = 312779 % 6 = 5 of the repeating part 3 1 4 0 5 2 (a remainder of 0 would mean the last term), which is 5. The correct code, written by BLOBVISGOD, outputs 5 for this case.

toberu

I have a solution that works for arbitrarily large $$$A_i$$$. Of course, if $$$A_i$$$ can be as large as $$$10^{100}$$$, we'd need bignum arithmetic, which would be really annoying. If $$$k \le n$$$, we simply output $$$A_k$$$. Otherwise, suppose WLOG that $$$A$$$ is sorted. Then $$$S_{n + 1 + [0, A_1)}$$$ would be $$$[0, A_1)$$$, $$$S_{n + [A_1, A_2 - 1)}$$$ would be $$$[A_1 + 1, A_2)$$$, and so on.

Bro, if $$$a_i \gt n$$$ it just doesn't matter, since the MEX is at most $$$n$$$. If the query value $$$k$$$ is at most $$$n$$$, just output $$$a_k$$$. Otherwise, replace all values greater than $$$n$$$ with $$$n$$$ and calculate the MEX normally, as in the solution.

Also, you lose generality, because each term depends on the last $$$n$$$ terms, which are not necessarily sorted. So the WLOG part is also wrong, in my opinion.

Oh sorry, I didn't read the statement carefully. It says $$$S_i$$$ is the MEX of the last $$$n$$$ terms, not of all previous terms; otherwise my solution would be correct. Sorry about that, man. In that case, we can just maintain a set of the missing values from $$$0$$$ to $$$n$$$ and take the smallest one for each term from $$$n + 1$$$ to $$$2n + 1$$$.

avengers2405

You are 12 years old?!

Yes, that is right!

More specifically, I was born on 17 February 2013.

Cool. Keep going, bro, and please don't get distracted in the future. Good luck!

Blog of user shivankjha46