Recently, I got a request asking me to write down my thought process while solving questions. So, here is the promised blog.
I would like to thank IceKnight1093, qwexd, Sana, Everule and trash-can for proofreading and suggesting edits to the blog. Special thanks to satyam343 for discussing most of the blog with me.
1. Overview
The blog contains my solutions to $$$7$$$ problems in a wide range of ratings, starting from $$$1200$$$ all the way up to $$$2700$$$. Each problem has a step-by-step solution, and you will notice that there are no large jumps in logic; everything follows naturally. I do not claim that this is always possible for every problem, but I solve the majority of CF problems in this manner.
There are certainly other high rated people who have completely different methods of solving. However, this is about what works for me. Some meta-ideas are extracted from parts of each solution and listed at its end in a "What can we learn?" section. I hope the blog is useful for you.
2. Some General Question Solving Strategies
Here is a non-exhaustive list of techniques that are useful to keep in mind. Most experienced participants have internalized these already, but perhaps they will help you.
Figuring out the Nature of an Optimal Solution. One of the most important things in problems imo.
Solving subtasks of the original problem and then trying to extend/generalize your solution.
Fixing a parameter and then trying to maximise the result with respect to that fixed parameter.
Finding necessary and sufficient conditions. Sometimes, your necessary conditions themselves become sufficient, or vice versa.
Identifying Lower and Upper bounds, and constructing them.
Reducing a problem into smaller subproblems, commonly used with Dynamic Programming or Inductive Proofs.
Considering the Decision Version of the Problem instead. Especially useful in counting problems, where you need to count the number of good subsequences, for example.
Formalizing the Problem
3. Questions and Solutions Section
Question 1. 1979C - Earning on Bets
Step 1. First, let's formalize the statement.
Suppose $$$a_i$$$ is the amount you bet on the $$$i$$$-th outcome. let $$$S = \sum a_i$$$.
You are asked whether there exists an array $$$a$$$ such that $$$a_i \ge 0$$$ and $$$(a_i \cdot k_i) - S \gt 0$$$ for all $$$i$$$.
Step 2. Assuming the solution is possible, we want $$$a_i \cdot k_i \gt S$$$ to hold.
Forget about $$$a_i$$$ needing to be an integer, and let's fix $$$min(a_i \cdot k_i) = c$$$; we want to minimize $$$S$$$.
Naturally, the optimal choice is to have each $$$a_i = \frac{c}{k_i}$$$, so $$$S$$$ can be computed as $$$\sum \frac{c}{k_i}$$$. This solution works when $$$S \lt c$$$, i.e. $$$\sum \frac{1}{k_i} \lt 1$$$. It is not too hard to see that when $$$\sum \frac{1}{k_i} \ge 1$$$, no solution exists. A final detail: we can choose $$$c = LCM(k_i)$$$, which guarantees that all $$$a_i$$$ are integers as required by the problem.
For completeness, here is a simple proof of the impossibility:
The rewritten condition is $$$a_i \gt \frac{S}{k_i}$$$. Adding these inequalities over all $$$i$$$ gives $$$S \gt S \cdot (\sum \frac{1}{k_i})$$$, which is our desired result.
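To see this end to end, here is a short sketch in Python (the function name and interface are my own, not from the problem; it returns a valid list of bets, or `None` if no solution exists):

```python
from math import lcm

def earning_on_bets(k):
    # Bet c / k_i on each outcome, where c = LCM(k_1, ..., k_n).
    # This makes every payout a_i * k_i equal exactly c, and the
    # strategy wins iff the total amount bet S is strictly below c.
    c = lcm(*k)
    a = [c // ki for ki in k]
    if sum(a) >= c:          # equivalent to sum(1 / k_i) >= 1
        return None          # impossible
    return a

# k = [3, 2, 7]: c = 42, bets are [14, 21, 6], S = 41 < 42, so possible.
```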
What can we learn?
- Take a small amount of time to formalize the conditions. What is the problem exactly asking for?
- Don't be afraid of a little bit of algebra.
- Rephrasing the problem slightly might lead you to an obvious solution, for example just fixing $$$min(a_i \cdot k_i)$$$ as above, and then asking for minimum $$$S$$$.
- How do we motivate ourselves to fix $$$min(a_i \cdot k_i)$$$ though? Well, look at the equation. It is $$$(a_i \cdot k_i) \gt S$$$, which implies $$$min(a_i \cdot k_i) \gt S$$$. We can fix either $$$S$$$ or $$$min(a_i \cdot k_i)$$$ and then maximize or minimize the other quantity respectively, to have the largest chance of satisfying the given inequality.
Question 2. 1987D - World is Mine
General Idea. In a lot of asymmetric game questions, it is useful to consider Alice's and Bob's strategies separately.
For example, here Alice's strategy is pretty trivial. Once you have identified a simple strategy, it also becomes easier to analyze the opponent's strategy.
Step 1. At each step, Alice simply takes the element with the smallest value of $$$a_i$$$ which doesn't break her sorted condition.
This is pretty obvious as it doesn't worsen any of her future choices. You can formally prove it with an Exchange Argument. Note that while I pass it off as something trivial, if you do not understand Exchange Arguments, or do not immediately see how to prove this with one, I highly recommend reading up on them. It will help you understand which greedy algorithms are logical and which are baseless.
Step 2. Let's now analyze Bob's moves. Bob will want to take all occurrences of some element $$$x$$$ himself before Alice reaches it, so as to deny Alice that element $$$x$$$.
Here is a very important pitfall. You might think: can't Alice play against Bob's strategy specifically? No; as we discussed before, Alice's strategy is fixed.
For the unconvinced, here is some reasoning you can use: if she tried to take $$$x$$$ herself when she sees Bob taking them, she would be unable to take some integer $$$y \lt x$$$. However, you should be convinced of the claim without needing such reasoning; Alice's optimal move at each step depends only on the current setup and not on Bob's previous moves.
Let's remove all elements with $$$freq_i = 0$$$ for convenience, and renumber the remaining elements into $$$1...k$$$. Now, consider the case when Bob wants to deny the sequence $$$x_1, x_2, ..., x_m$$$ from Alice. What is Bob's strategy and when is it possible? Both are very important questions.
Step 3. Reminder that we have fixed the sequence $$$x_1, x_2, ..., x_m$$$ that Bob wants to remove, and we will be discussing the optimal strategy with respect to it.
Claim 1 : Bob will remove the elements in sorted order, i.e. $$$x_i \lt x_{i + 1}$$$.
This is not hard to see since Alice will approach $$$x_1$$$ first, then $$$x_2$$$, and so on. Thus, if Bob wants to be able to delete these elements, his best chance is removing them in the same order.
Now, we know Bob's strategy too. Let's look at when he can successfully deny Alice all those elements.
For the first element, $$$freq_{x_1} \lt x_1$$$ must hold, otherwise Alice reaches $$$x_1$$$ too early (reminder: we have renumbered the elements present in the array to $$$1...k$$$ to make these formulae easier).
Similarly, for the second element $$$freq_{x_2} + freq_{x_1} \lt x_2 - 1$$$ must hold, and in general $$$freq_{x_1} + freq_{x_2} + ... + freq_{x_i} \lt (x_i - i + 1)$$$ must hold. This is pretty obvious once we know Alice and Bob's strategy.
Step 4. OK, so we know which sequences are removable. Now, the dynamic programming becomes sort of obvious.
Let's look at what information we need to store. Consider the subproblem up to $$$i - 1$$$, where we want to check if we can add $$$i$$$ to Bob's sequence $$$x$$$. We need $$$2$$$ pieces of information:
- $$$\sum freq_{x}$$$ till now
- $$$|x|$$$ till now
Naively, this is still $$$O(N^3)$$$ states, but we can move one of the $$$2$$$ additional parameters into the $$$dp$$$ value. For example, $$$dp_{(i, j)} = $$$ the minimum value of $$$\sum freq_{x}$$$ considering the first $$$i$$$ elements with $$$|x| = j$$$. (You can do it the other way too: $$$dp_{(i, j)} =$$$ the maximum value of $$$|x|$$$ under the constraint that $$$\sum freq_{x} = j$$$.)
If you understood the question, transitions are pretty simple to come up with and are left as an exercise.
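If you want to check your answer to the exercise, here is one way to fill in the transitions, as a sketch (the interface is mine; it returns the number of distinct values Alice ends up eating, which, if I recall the problem correctly, is the required answer; verified only on small hand-checked cases):

```python
def alice_cakes(a):
    # Renumber the distinct values to 1..k and collect their frequencies.
    vals = sorted(set(a))
    k = len(vals)
    freq = [a.count(v) for v in vals]  # freq[i-1] = frequency of renumbered value i
    INF = float('inf')
    # dp[j] = minimum sum of frequencies of a denied set of size j
    # among the first i renumbered values (for the current i).
    dp = [0] + [INF] * k
    for i in range(1, k + 1):
        for j in range(i, 0, -1):  # descending j, so dp[j-1] is still the "i-1" state
            cand = dp[j - 1] + freq[i - 1]
            # Bob can add value i as the j-th denied element iff
            # freq_{x_1} + ... + freq_{x_j} < x_j - j + 1, i.e. cand <= i - j.
            if cand <= i - j:
                dp[j] = min(dp[j], cand)
    best = max(j for j in range(k + 1) if dp[j] < INF)
    return k - best  # Alice eats the distinct values Bob fails to deny

# [1, 4, 2, 3]: Bob can deny {2, 4}, so Alice eats 2 distinct values.
```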
What can we learn?
- The general idea of separately considering Alice's and Bob's strategies is pretty useful.
- When we are asked to find an optimal "possible" sequence, it helps to change the problem into a decision problem first. Writing a dp afterwards becomes much easier.
- Break the problem into several small parts: for example, here I did $$$4$$$ main things: Alice's strategy, Bob's strategy, the decision problem for a sequence $$$x$$$, and finally the dynamic programming. If you guess one of the steps, be careful: an error in a single step can invalidate the entire solution. It helps to carefully analyze each step instead. Making small motivated steps like these helps immensely.
- In dynamic programming problems, the method I use isn't to randomly guess the correct states, but rather to look at what information is sufficient to calculate the answer.
- Considering suboptimal solutions and then speeding them up is very important. Here, specifically, I used a dp-specific trick going from $$$dp_{(i, j, k)} = $$$ "is state $$$(i, j, k)$$$ possible?" to $$$dp_{(i, j)} = $$$ the min/max $$$k$$$ for a possible state.
Similar Problems. All these problems are authored by me.
Question 3. 1977C - Nikita and LCM
Step 1. We want to find the length of the longest special subsequence of $$$a$$$. A very natural thought is to first check if it is $$$n$$$, which is the whole array itself.
Let's sort the array. Now, we can easily verify that the only case when the answer is not $$$n$$$ is when $$$a_i | a_n$$$ ($$$a_i$$$ divides $$$a_n$$$) for all $$$i$$$. Otherwise, the LCM exceeds $$$a_n$$$ and hence doesn't belong to the array.
This powerful idea highly restricts the nature of "interesting" testcases.
Step 2. So, assume the answer is not $$$n$$$ now.
Let's look at the divisors of $$$a_n$$$. The LCM of any subsequence of $$$a$$$ must naturally be a divisor of $$$a_n$$$. We know that the number of divisors isn't too large even for a number of the order $$$10^9$$$ (we can prove it is definitely $$$\le 2 \cdot \sqrt{10^9}$$$, but in practice it is much lower; the maximum for numbers up to $$$10^9$$$ is $$$1344$$$). Since the number of divisors isn't large, we can try them all.
Let's try to fix the LCM of the subsequence to be $$$x$$$ and then calculate the longest subsequence. Only try $$$x$$$ that doesn't belong to $$$a$$$, as required by the problem. So, we want to find the longest subsequence $$$b$$$ of $$$a$$$ such that $$$LCM(b) = x$$$.
Obviously, we can simply take all those $$$a_i$$$ which are divisors of $$$x$$$. This guarantees that LCM will be a divisor of $$$x$$$, and this is the longest subsequence since we cannot take non-divisors of $$$x$$$. If even this subsequence fails to have LCM $$$x$$$, it is impossible to obtain it for any subsequence.
Bruteforcing on the LCM solves the problem in $$$O(n \cdot d(A))$$$, where $$$d(A)$$$ denotes the maximum number of divisors of any number $$$\le A$$$ $$$(= 10^9)$$$.
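Putting the steps together, a sketch of the full solution might look like this (names are mine; following the problem's convention, the LCM of the empty subsequence is $$$0$$$, which never appears in the array, so the answer is at least $$$0$$$):

```python
from math import lcm

def divisors(v):
    # All divisors of v in O(sqrt(v)).
    ds, i = [], 1
    while i * i <= v:
        if v % i == 0:
            ds.append(i)
            if i != v // i:
                ds.append(v // i)
        i += 1
    return ds

def longest_special(a):
    # Length of the longest subsequence of a whose LCM does not appear in a.
    a = sorted(a)
    mx = a[-1]
    if any(mx % x != 0 for x in a):
        return len(a)              # LCM(a) > mx, hence not in a: take everything
    present = set(a)
    best = 0
    for x in divisors(mx):
        if x in present:
            continue               # an LCM equal to x would belong to a
        picked = [v for v in a if x % v == 0]   # all divisors of x in a
        cur = 0
        for v in picked:
            cur = lcm(cur, v) if cur else v
        if picked and cur == x:
            best = max(best, len(picked))
    return best
```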
What can we learn?
- Eliminate obvious cases (for example, when the answer is $$$n$$$ above). These might even simplify the problem by constraining how the non-obvious tests look.
- Fixing a parameter and then calculating the answer for that value of the parameter is a useful technique.
Question 4. 1889B - Doremy's Connecting Plan
Step 1. Define $$$B_i$$$ as the sum of $$$a_j$$$ where $$$j$$$ belongs to the same component as $$$i$$$.
The given equation is $$$(B_i + B_j) \ge (i \cdot j \cdot c)$$$
Considering all edges seems hard. So, we should try to first consider some special edges. What edges should we try first? Well naturally, an idea is to try those edges with a small value of $$$i \cdot j \cdot c$$$.
Step 2. Here, you will need to make the claim that you only need edges of the form $$$(1, i)$$$. More specifically, if it is possible to make an edge between $$$(i, j)$$$, it is also possible to make an edge between either $$$(1, i)$$$ or $$$(1, j)$$$. The intuition is there in the previous step (we want to consider edges with the least value of $$$i \cdot j \cdot c$$$), and you must be able to make this small jump.
Proving it is quite simple :
If $$$B_i \ge (i \cdot c)$$$, or $$$B_j \ge (j \cdot c)$$$, we are done.
Otherwise,
$$$(B_i + B_j) \lt (i + j) \cdot c \le (i \cdot j \cdot c)$$$ for all $$$i, j \gt 1$$$.
Hence, it would be impossible to make an edge $$$(i, j)$$$ if you cannot make edges $$$(1, i)$$$ and $$$(1, j)$$$.
Step 3. Since we are only considering edges $$$(1, i)$$$, we essentially want to find an ordering of the $$$n - 1$$$ edges $$$(1, i)$$$ for $$$i \ge 2$$$ such that we are able to add the edges in that order.
The condition becomes $$$(B_1 + a_i) \ge (i \cdot c)$$$ at the time of adding edge $$$(1, i)$$$, i.e. $$$B_1 \ge (i \cdot c) - a_i$$$.
Let's define a new quantity $$$d_i = (i \cdot c) - a_i$$$. Then, if edge $$$(1, i)$$$ is possible right now and $$$d_j \lt d_i$$$, edge $$$(1, j)$$$ is also possible.
This gives us the insight to add edges in increasing order of $$$d_i$$$, since $$$B_1$$$ only ever increases. Thus, sort the edges $$$(1, i)$$$ by $$$d_i$$$, maintain $$$B_1$$$, and check whether each edge is possible.
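The resulting greedy is only a few lines. A sketch (my own naming; `a` is a 0-indexed list where `a[i - 1]` is the value of node $$$i$$$, and the function decides whether the graph can become connected):

```python
def can_connect(a, c):
    n = len(a)
    B1 = a[0]                                # component sum containing node 1
    # Sort candidate edges (1, i) by d_i = i*c - a_i; B1 only grows,
    # so if any order works, this order works.
    order = sorted(range(2, n + 1), key=lambda i: i * c - a[i - 1])
    for i in order:
        if B1 + a[i - 1] < i * c:            # condition B_1 + a_i >= i*c fails
            return False
        B1 += a[i - 1]                       # node i's component merges into 1's
    return True
```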
What can we learn?
- The most important thing is the motivation behind claims like "we only need edges $$$(1, i)$$$". It might seem like a big jump in logic, but I do not believe it is. It feels very natural to me: we want some inequality to hold, so we should try to minimize one of the quantities independently of the other. Since we have to make at least one edge $$$(x, i)$$$ for every $$$i$$$, we should consider the case when $$$(x \cdot i \cdot c)$$$ is minimized, i.e. $$$x = 1$$$. Proving the claim is quite easy after we make it.
- Try to use arguments about the nature of the optimal solution. For example, we used it twice here: first to see that we only need edges $$$(1, i)$$$, then to see that we should consider edges in increasing order of $$$d_i = (i \cdot c) - a_i$$$. $$$d_i$$$ is not a randomly plucked quantity; we manipulated some equations and found that $$$B_1 \ge d_i$$$ is the condition for adding edge $$$(1, i)$$$. (This is called Separation of Variables, by the way: you split the inequality into $$$2$$$ parts, independent of each other, so that only the LHS or the RHS depends on $$$i$$$ while the other part does not.)
- Again, formalize (here, simplify) the statement, and don't be afraid to do a little algebra. Those are essential skills.
Question 5. 2002D1 - DFS Checker (Easy Version)
Step 1. First, let us understand when a permutation is a valid DFS order.
The permutation must begin with $$$1$$$ obviously. After that, the next element must be $$$2$$$ or $$$3$$$. Assuming it is $$$2$$$, we will go on to visit the subtree of $$$2$$$ before coming back up and then going to visit subtree of $$$3$$$.
In general, after visiting a non-leaf node $$$x$$$, we will visit the entire subtree of one of the children of $$$x$$$, and after visiting the entire subtree of its child, we will go on to visit the other child of $$$x$$$.
This gives us the following condition:
- Condition $$$x$$$: Let $$$p_1$$$ be the position of a non-leaf node $$$x$$$, and $$$p_2$$$, $$$p_3$$$ be the positions of the children of $$$x$$$ with $$$p_2 \lt p_3$$$. Let $$$s$$$ be the subtree size of the child at position $$$p_2$$$ (in the easy version's perfect binary tree, both children have equal subtree sizes). Then, $$$p_2 = p_1 + 1$$$ and $$$p_3 = p_2 + s$$$ must hold.
It is not too hard to see that if the above condition holds for all $$$x$$$, the permutation is a valid DFS order.
Step 2. We can now solve the problem in $$$O(N \cdot Q)$$$, but we need something faster.
The important note is that our permutation doesn't change significantly between queries. Only $$$2$$$ elements get swapped, so not too many things should be affected.
Let's define $$$bad_x = 1$$$ if Condition $$$x$$$ is false, and $$$bad_x = 0$$$ otherwise. Then, the answer is Yes iff $$$\sum bad_x = 0$$$.
When we swap $$$p_x = b$$$ and $$$p_y = c$$$, only $$$bad_{b}$$$, $$$bad_{c}$$$, $$$bad_{a_b}$$$, $$$bad_{a_c}$$$ get affected, as $$$bad_x$$$ only depends on the relative positions of $$$x$$$ and its children ($$$a_i$$$ denotes the parent of $$$i$$$).
Thus, we only need to recalculate these values, and this can be easily done in $$$O(1)$$$ time.
Additional Note. This solution generalizes to D2, albeit with many more implementation details.
Also, be careful while implementing this solution. I made a small bug during the contest, so instead of wasting more time I skipped straight to implementing D2, which I had already mindsolved while implementing D1.
The bug: you need to consider only the distinct elements among $$$(b, c, a_b, a_c)$$$. Otherwise, you might add/subtract some $$$bad_x$$$ twice and thus get an incorrect result.
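To make the bookkeeping concrete, here is a sketch for the easy version (perfect binary tree, so the parent of $$$v$$$ is $$$\lfloor v/2 \rfloor$$$); the interface and the explicit root-position check are my own choices:

```python
def dfs_checker(n, p, queries):
    # p is the initial permutation (length n); queries are (x, y) pairs of
    # 1-based positions to swap. Returns a list of "YES"/"NO" answers.
    size = [0] * (n + 1)                      # subtree sizes
    for v in range(n, 0, -1):
        size[v] = 1 + (size[2 * v] if 2 * v <= n else 0) \
                    + (size[2 * v + 1] if 2 * v + 1 <= n else 0)
    pos = [0] * (n + 1)
    for idx, v in enumerate(p):
        pos[v] = idx

    def bad(x):
        # Condition x from above; the root additionally must sit at position 0.
        if x == 1 and pos[1] != 0:
            return 1
        if 2 * x > n:                         # leaf: nothing to check
            return 0
        c1, c2 = 2 * x, 2 * x + 1
        p2, p3 = sorted((pos[c1], pos[c2]))
        return 0 if (p2 == pos[x] + 1 and p3 == p2 + size[c1]) else 1

    badness = [bad(x) for x in range(1, n + 1)]
    total = sum(badness)
    answers = []
    for x, y in queries:
        u, v = p[x - 1], p[y - 1]
        affected = {u, v, u // 2, v // 2} - {0}   # distinct nodes only!
        for w in affected:
            total -= badness[w - 1]
        p[x - 1], p[y - 1] = v, u
        pos[u], pos[v] = pos[v], pos[u]
        for w in affected:
            badness[w - 1] = bad(w)
            total += badness[w - 1]
        answers.append("YES" if total == 0 else "NO")
    return answers
```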
What can we learn?
- Try to solve the problem in simpler settings, for example allowing yourself $$$O(N)$$$ time to check the permutation each time. Not all solutions to the easier version generalize, however; only through experience can you tell which ones can be adapted to solve the full problem.
- One tip for extending a solution of a simpler version to queries: consider mathematical conditions instead of "processes". For example, I considered conditions on the positions of $$$x$$$ and its $$$2$$$ children instead of, say, using DFS itself to check the permutation. Process-style solutions are usually not generalizable.
- Once we have the condition, it becomes much easier to solve the queries. Usually, it is a matter of being "Chinese" enough, and knowing sufficient tricks.
- The concept of an update not changing much (here, only $$$O(1)$$$ items) is very prevalent in a lot of query problems.
Question 6. 1930E - 2..3...4.... Wonderful! Wonderful!
Step 1. We first convert the problem to its decision variant.
Given $$$n$$$ and $$$k$$$ and a subsequence $$$s$$$ of $$$[1, 2, ... n]$$$, we need to tell if it's achievable.
In each operation, we choose an element $$$x$$$ in $$$s$$$ (call it the 'pivot'), and then $$$k$$$ elements smaller than $$$x$$$ and $$$k$$$ elements larger than $$$x$$$ and delete the $$$2 \cdot k$$$ elements.
I made a mistake in contest because I forgot that elements not part of the final sequence may have acted as pivots too.
Let's write down a few necessary conditions:
$$$n - |s|$$$ must be divisible by $$$2 \cdot k$$$, since we delete $$$2 \cdot k$$$ elements each time.
There must exist at least one element of $$$s$$$ which can act as a "pivot" for at least one operation. This is because of the nature of the operation: there must be a last operation, and its pivot is never deleted afterwards, so it is chosen among the elements of $$$s$$$. Minor exception: $$$s = [1, 2, .. n]$$$, because we do not perform any operations in this case. More formally, there must exist $$$x$$$ in $$$s$$$ such that :
- There are at least $$$k$$$ elements smaller than $$$x$$$ and not in $$$s$$$
- There are at least $$$k$$$ elements larger than $$$x$$$ and not in $$$s$$$
Step 2. Consider only sequences that satisfy the above $$$2$$$ conditions. Let's assume that a possible pivot of the last operation is $$$i$$$, that there are $$$x$$$ larger elements to be deleted (i.e. not part of $$$s$$$), and $$$y$$$ smaller elements. Let $$$X$$$ and $$$Y$$$ denote the respective sets of larger and smaller elements which haven't been deleted yet (but need to be deleted).
Because $$$i$$$ can act as a pivot, $$$x, y \ge k$$$. Note that $$$(x + y)$$$ is a multiple of $$$2 \cdot k$$$.
If $$$x + y = 2 \cdot k$$$, we are done, because $$$x = y = k$$$, and we can choose subsequence = pivot $$$i$$$ + all remaining elements to be deleted.
Otherwise, without loss of generality, let's assume $$$x \le y$$$.
Case 1: $$$x \ge 2 \cdot k$$$. Choose subsequence = pivot $$$i$$$ + arbitrary $$$k$$$ elements of $$$X$$$ + arbitrary $$$k$$$ elements of $$$Y$$$. This reduces each of $$$x$$$ and $$$y$$$ by $$$k$$$.
Case 2: $$$x \lt 2 \cdot k$$$. This is my approach and not necessarily the cleanest one.
Case 2.1: $$$(x + y) \ge (6 \cdot k)$$$. This means $$$y \gt 4 \cdot k$$$. Choose any $$$2 \cdot k + 1$$$ elements of $$$Y$$$ (the middle one acts as the pivot) and perform an operation, thus reducing $$$y$$$ by $$$2 \cdot k$$$.
Case 2.2: $$$(x + y) = 4 \cdot k$$$. Take $$$(x - k)$$$ elements of $$$X$$$ and $$$(y - k + 1)$$$ elements of $$$Y$$$ (which adds up to exactly $$$2 \cdot k + 1$$$). Then, both $$$x$$$ and $$$y$$$ become $$$k$$$.
Every pair $$$(x, y)$$$ is reduced to a smaller pair which still satisfies $$$x, y \ge k$$$. Thus, by induction, every subsequence satisfying the $$$2$$$ conditions is achievable, i.e. the necessary conditions are also sufficient.
Step 3. Let $$$T$$$ denote the sequence of elements to be deleted. Note that we only count sequences $$$T$$$ such that $$$|T| \bmod (2 \cdot k) = 0$$$. Let's assume $$$|T| = x$$$.
Counting the number of valid sequences $$$T$$$ directly is hard. So, let's count its complement, the number of invalid $$$T$$$.
$$$T$$$ is good if there exists at least $$$1$$$ element $$$y$$$ such that :
- $$$y$$$ doesn't belong to $$$T$$$
- There exist $$$\ge k$$$ elements in $$$T$$$ which are $$$ \lt y$$$.
- There exist $$$\ge k$$$ elements in $$$T$$$ which are $$$ \gt y$$$.
$$$T$$$ is bad if there exists no such element $$$y$$$.
What does the bad condition imply? The elements from $$$T_k$$$ to $$$T_{x - k + 1}$$$ must form a contiguous block, i.e. every $$$y$$$ satisfying $$$T_k \le y \le T_{x - k + 1}$$$ belongs to $$$T$$$; otherwise such a $$$y$$$ is a possible pivot.
Let $$$d_i = T_{i + 1} - T_i$$$; then the condition implies $$$d_k = 1, d_{k + 1} = 1, \dots, d_{x - k} = 1$$$. Define $$$d_0 = T_1 - 1$$$ and $$$d_x = n - T_x$$$. The remaining constraints are $$$d_0, d_x \ge 0$$$, $$$d_i \ge 1$$$ for $$$1 \le i \le x - 1$$$ (since $$$T$$$ is strictly increasing), and $$$\sum d_i = n - 1$$$; these are all the constraints on the difference array $$$d$$$.
Now, it's a standard Stars and Bars approach to find the number of difference arrays $$$d$$$.
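Carrying the stars and bars through (my own computation, so treat the closed form as something to verify): substituting $$$e_i = d_i - 1$$$ for the $$$2(k-1)$$$ free inner gaps leaves $$$2k$$$ nonnegative variables summing to $$$n - x$$$, giving $$$\binom{n - x + 2k - 1}{2k - 1}$$$ bad sequences of size $$$x$$$. A quick cross-check against brute force:

```python
from math import comb
from itertools import combinations

def count_bad(n, k, x):
    # Closed form from stars and bars (assumes 2k <= x <= n):
    # 2k nonnegative variables summing to n - x.
    return comb(n - x + 2 * k - 1, 2 * k - 1)

def count_bad_brute(n, k, x):
    # T is bad iff no y outside T has >= k elements of T below and above it.
    cnt = 0
    for T in combinations(range(1, n + 1), x):
        s = set(T)
        has_pivot = any(sum(t < y for t in T) >= k and sum(t > y for t in T) >= k
                        for y in range(1, n + 1) if y not in s)
        cnt += not has_pivot
    return cnt
```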
The solution fixes $$$k$$$ and then $$$|T|$$$, but since we only consider $$$|T|$$$ that are multiples of $$$2 \cdot k$$$, the total complexity is $$$O(\sum \frac{n}{2 \cdot i}) = O(n \cdot log(n))$$$.
Question 7. 2003E1 - Turtle and Inversions (Easy Version)
Disclaimer : My solution differs from the editorial completely. You can read the editorial for their approach. I will present mine. Try to extend to the hard version of the problem from this.
Subproblem. Consider the subproblem when the union of the set of intervals is $$$[1, n]$$$, i.e. for every $$$i (1 \le i \le n)$$$, there exists exactly one $$$j$$$ such that $$$l_j \le i \le r_j$$$.
That is, the set of intervals satisfy :
- $$$l_1 = 1$$$
- $$$l_i = r_{i - 1} + 1$$$
- $$$r_m = n$$$
Subproblem Step 1. First, let's rewrite the condition given in the statement.
Let $$$k = max(a_i)$$$, where $$$a_i$$$ is as defined in the problem statement. Then, the permutation is divided into $$$2$$$ parts: the $$$p_i \le k$$$ part and the $$$p_i \gt k$$$ part. Let's build a binary sequence $$$a$$$ (overloading the notation) by assigning a $$$0$$$ to every position with $$$p_i \le k$$$ and a $$$1$$$ to every position with $$$p_i \gt k$$$.
The condition given in the statement essentially means that for every interval $$$(l_i, r_i)$$$, there exists an $$$x_i$$$ such that $$$a_j = 0$$$ for all $$$l_i \le j \le x_i$$$, and $$$a_j = 1$$$ for all $$$x_i \lt j \le r_i$$$.
Suppose we have a binary sequence $$$a$$$ of $$$0$$$s and $$$1$$$s which satisfies the condition. How do we get the maximum inversions in $$$p$$$ corresponding to the sequence $$$a$$$?
Claim 1: The number of inversions in the optimal $$$p$$$ corresponding to $$$a$$$ is $$$\frac{n \cdot (n - 1)}{2}$$$ minus the number of pairs $$$(i, j)$$$ such that $$$i \lt j, a_i = 0, a_j = 1$$$.
Proof 1. This is trivially an upper bound: if $$$i \lt j$$$ with $$$a_i = 0$$$ and $$$a_j = 1$$$, then $$$p_i \le k \lt p_j$$$, so the pair $$$(i, j)$$$ cannot be an inversion.
We now prove that we can construct this. Note that $$$k$$$ is forced to be the number of $$$0$$$s.
- if $$$a_i = 0$$$, assign $$$p_i = \sum\limits_{j \gt i} [a_j = 0] + 1$$$
- if $$$a_i = 1$$$, assign $$$p_i = \sum\limits_{j \gt i} [a_j = 1] + k + 1$$$
You can easily check that every other pair becomes an inversion in this construction.
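The construction from Claim 1 can be sketched directly (names are mine; `build_p` turns a 0/1 list into the permutation, and `inversions` is an $$$O(n^2)$$$ checker):

```python
def build_p(a):
    # Build the permutation realizing the bound of Claim 1 from a 0/1 list a.
    k = a.count(0)                      # k is forced to be the number of 0s
    n = len(a)
    p = [0] * n
    for i in range(n):
        if a[i] == 0:
            # zeros get values 1..k, in decreasing order of position
            p[i] = sum(a[j] == 0 for j in range(i + 1, n)) + 1
        else:
            # ones get values k+1..n, in decreasing order of position
            p[i] = sum(a[j] == 1 for j in range(i + 1, n)) + k + 1
    return p

def inversions(p):
    return sum(p[i] > p[j] for i in range(len(p)) for j in range(i + 1, len(p)))

# For a = [0, 1, 0]: p = [2, 3, 1], achieving 3*2/2 - 1 = 2 inversions
# (the single forbidden pair is (1, 2)).
```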
Subproblem Step 2. With just the previous observation about the maximum number of inversions corresponding to a binary sequence $$$a$$$, we can write a dynamic programming solution, as in the editorial, and we are done. However, please keep reading, as I find my approach cool and it can solve the problem under higher constraints.
Suppose we have fixed the rest of the sequence $$$a$$$ and must decide the partition point within an interval $$$(l_i, r_i)$$$, i.e. we must choose an $$$x$$$ with $$$1 \le x \lt (r_i - l_i + 1)$$$ and make the first $$$x$$$ elements of the interval $$$0$$$'s and the rest $$$1$$$'s.
Let's figure out the cost function $$$f(x)$$$ which represents the number of pairs of indices which cannot be inversions with respect to the specific choice of $$$0$$$ and $$$1$$$.
Let $$$b$$$ denote the number of $$$0$$$s in the prefix $$$[a_1, \dots, a_{l_i - 1}]$$$ and $$$c$$$ the number of $$$1$$$s in the suffix $$$[a_{r_i + 1}, \dots, a_n]$$$ (reminder: we have fixed the entire sequence except the interval $$$(l_i, r_i)$$$). Let $$$d = r_i - l_i + 1$$$.
Then, the cost function is $$$f(x) = b \cdot (d - x) + c \cdot x + x \cdot (d - x)$$$, and we need to find its minimum as $$$x$$$ varies from $$$1$$$ to $$$d - 1$$$.
This is a quadratic in $$$x$$$ with a negative coefficient of $$$x^2$$$, so it has a central maximum and decreases on both sides of it. With some casework on the position of this maximum, we can conclude that over the allowed interval $$$1 \le x \le (d - 1)$$$, the function is minimized at one of the $$$2$$$ endpoints, i.e. at either $$$1$$$ or $$$(d - 1)$$$.
This means that it is optimal to have either $$$a_{l_i} = 0$$$, and $$$a_j = 1$$$ for all $$$(l_i \lt j \le r_i)$$$, OR $$$a_{r_i} = 1$$$ and $$$a_j = 0$$$ for all $$$(l_i \le j \lt r_i)$$$.
Applying this to each interval iteratively, we know that an optimal solution satisfies this criterion for all its intervals.
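The endpoint claim is easy to sanity-check by brute force over small $$$b, c, d$$$ (a throwaway verification script of my own, not part of the solution):

```python
def f(x, b, c, d):
    # Pairs that cannot be inversions for partition point x in this interval.
    return b * (d - x) + c * x + x * (d - x)

def check(bmax=6, cmax=6, dmax=10):
    # A concave quadratic is minimized at an endpoint of any interval,
    # so over 1 <= x <= d - 1 the minimum should be at x = 1 or x = d - 1.
    for b in range(bmax):
        for c in range(cmax):
            for d in range(2, dmax):
                full = min(f(x, b, c, d) for x in range(1, d))
                ends = min(f(1, b, c, d), f(d - 1, b, c, d))
                if full != ends:
                    return False
    return True
```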
Subproblem Step 3. Let's call an interval $$$P$$$-type if it has exactly one $$$0$$$ (i.e. $$$x = 1$$$), and $$$S$$$-type otherwise (exactly one $$$1$$$, i.e. $$$x = d - 1$$$).
Claim 2: A prefix of the intervals will be $$$P$$$-type, and the remaining suffix of intervals will be $$$S$$$-type.
Proof 2. Remember our cost function $$$f(x) = b \cdot (d - x) + c \cdot x + x \cdot (d - x)$$$. Note that $$$x \cdot (d - x)$$$ gives the same value for both $$$x = 1$$$ and $$$x = (d - 1)$$$, so we can drop it from the cost function as a constant, and also subtract the constant $$$b \cdot d$$$.
Thus, the cost function becomes $$$f(x) = b \cdot (d - x) + c \cdot x - b \cdot d = (c - b) \cdot x$$$. Now, it is obvious that we should make an interval $$$P$$$-type only if $$$(c - b) \ge 0$$$, and $$$S$$$-type otherwise.
Further, we can easily note that for intervals $$$I_j$$$ and $$$I_k$$$ where $$$k \gt j$$$, the coefficient $$$c$$$ is smaller for $$$I_k$$$ than it is for $$$I_j$$$, and the coefficient $$$b$$$ is larger for $$$I_k$$$ than it is for $$$I_j$$$. This implies that $$$(c - b)$$$ decreases for $$$I_k$$$ as compared to $$$I_j$$$ since $$$k \gt j$$$.
Thus, we can easily see that a prefix of intervals will have $$$(c - b) \ge 0$$$ and will be $$$P$$$-type. The rest will be $$$S$$$-type.
With this in hand, we can finally solve the problem. Since $$$O(m^2)$$$ works, you can directly bruteforce over the length of the $$$P$$$-type prefix of intervals and calculate the answer for each choice.
Step 4. Returning to the actual problem, there may exist some elements which belong to no intervals. However, they are easy to handle: assign them values maximising the number of inversions. More precisely, check whether assigning them $$$0$$$ or $$$1$$$ gets you more inversions.
Let $$$S$$$ be the set of elements which belong to no interval. The above argument works only because, in the optimal solution, $$$a_x \ge a_y$$$ holds for all $$$x \lt y$$$ with $$$x, y$$$ in $$$S$$$, and every such pair is an inversion. Otherwise, this argument might not have been correct, as the optimal choices might depend on the choices for the other elements of $$$S$$$ themselves.
The above part felt a bit confusing, so I will rewrite exactly what it says:
Attempt at Clarity of Solution. We first fix the $$$a_i$$$ values of those indices $$$i$$$ which belong to an interval; call this set $$$T$$$.
Then, we traverse over the elements of $$$S$$$ (which belong to no interval), and we use the greedy idea of trying both $$$0$$$ or $$$1$$$ and maximising inversions of these elements in $$$S$$$ with respect to $$$T$$$.
It is easy to see that the elements in $$$S$$$ will be assigned $$$1, 1, .... 1, 0, 0, .... 0$$$ and all pairs within $$$S$$$ will be inversions. Thus, this approach works.
Thus, we can still solve this version in $$$O(m^2)$$$.
Additional Note. This can be extended to the hard version of the problem. Further, it can probably be optimized to $$$O(m)$$$ by quickly computing the change in answer when going from a prefix of $$$i$$$ $$$P$$$-type intervals to $$$(i + 1)$$$.
The quadratic function being minimized at the boundaries is a more general observation and can be applied to E2, except that the boundaries are no longer $$$1$$$ and $$$d - 1$$$. I will leave you to figure out how to extend the solution. There are $$$2$$$ hints below.
Hint 1 of Hard Version. Consider the graph with an edge $$$j - k$$$ whenever intervals $$$I_j$$$ and $$$I_k$$$ have non-empty intersection. Work with the connected components of this graph.
Hint 2 of Hard Version. Reduce the problem to this problem:
- The intervals are disjoint (use the previous hint for this)
- Each interval has $$$2$$$ values associated with it, $$$lb_i$$$ and $$$ub_i$$$ such that the valid range of $$$x_i$$$ (the partition point of $$$(l_i, r_i)$$$) must lie in $$$[lb_i, ub_i]$$$.
What can we learn?
- Solving subproblems is very useful. Even this entire problem is essentially a subproblem of the main problem. We can find a solution to the subproblem and then generalize/extend it.
- Reducing the problem into something simpler, for example we went from a permutation to a binary array.
- Finding upper/lower bounds on the answer, and then constructing the bounds, proving they are the answer.
- Analyzing the nature of the optimal solution; we did this at least twice here.
- Algebra, observing stuff about cost functions.
4. About Stress Testing
Something which a lot of low-rated participants do wrong: they don't stress test.
There have even been contests where I stress tested $$$3$$$ problems. I tend to make a lot of errors while writing code, and it is not easy to always catch them by manually reading or debugging. Learning the skill of stress testing is insanely useful (and you don't need any tools for it; I do it all in the same piece of code without taking much time).
This was the first problem (I think) that I stress tested during a contest: 1854A1 - Dual (Easy Version). I was still done within $$$12$$$ minutes, and it took me less than $$$5$$$ minutes to stress test and fix my bug (this is an easy example though, as I didn't have to write a brute). My bug only occurred when all elements of the array were equal and negative; I don't think I would have been able to catch that manually.
1987F1 - Interesting Problem (Easy Version). In my initial submission to this problem, I missed many if-cases; I was very careless. But I managed to stress test and avoid more WAs. I think it took me $$$3$$$ to $$$4$$$ iterations of finding a bug with the stress test, fixing it, and running the stress test again before my code was actually correct.
I highly recommend stress testing. It is simpler than you think: all you need is a brute function and a generator. I usually replace my input function with a generator and use a lambda function for the brute. It takes barely $$$5$$$ minutes to set up for most problems.
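For illustration, here is the skeleton of such a harness (a sketch; `brute`, `fast`, and `gen` are placeholders you replace per problem, filled in here with maximum subarray sum solved by Kadane's algorithm against an $$$O(n^2)$$$ brute):

```python
import random

def brute(a):
    # Placeholder model solution: slow but obviously correct.
    return max(sum(a[i:j]) for i in range(len(a)) for j in range(i + 1, len(a) + 1))

def fast(a):
    # Placeholder candidate solution (Kadane's algorithm).
    best = cur = a[0]
    for x in a[1:]:
        cur = max(x, cur + x)
        best = max(best, cur)
    return best

def gen(rng):
    # Small random tests: tiny sizes and values make failures easy to minimize.
    n = rng.randint(1, 8)
    return [rng.randint(-5, 5) for _ in range(n)]

def stress(iters=1000, seed=0):
    rng = random.Random(seed)
    for _ in range(iters):
        a = gen(rng)
        if fast(a) != brute(a):
            return a          # a failing test to debug by hand
    return None               # no mismatch found
```

The all-equal-and-negative bug mentioned above is exactly the kind of case tiny random arrays like these find within seconds.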