Yandex.Algorithm Online Round 1 Analysis

In the first online round of the Yandex.Algorithm competition, 609 participants submitted at least one solution and 396 submitted at least one correct solution. Problems Non-Squares and Kingdom Division were fairly easy; both resulted in approximately 300 correct submissions. Problem Stacks of Coins was moderately easy and was solved by 145 participants. The three other problems proved to be more difficult. Stickers could be a very challenging string problem, but the small input size allowed for a dynamic programming solution, which 22 contestants got right. Assistants was solved by 13 participants; it required binary search, a greedy argument, and some basic data structures to fit within the time limit. During the contest, nobody was able to solve State Roads, a neat graph-theoretical problem whose solution is simple to code but quite tricky to find.

Congratulations to Kenny_HORROR who was the only one to solve five problems, and to Petr, tourist, and eatmore who solved four problems and with negative penalty times also advanced to the final round.

The problem set of this round was brought to you by the Polish team: Tomasz Idziaszek (Kingdom Division), Jakub Łącki (State Roads), Marcin Pilipczuk (Assistants), and Jakub Radoszewski (Non-Squares, Stacks of Coins). Special thanks go to Professor Costas Iliopoulos for suggesting the idea of the problem Stickers.

Solutions, test data and analysis were prepared by marek.cygan, monsoon, and jakubr.

Problem A (Assistants)

In this problem we need to find the minimum number of days in which n research tasks can be completed. The i-th task can be executed by the professor using p_i consecutive days or can be delegated to an assistant who will execute it using a_i consecutive days. Every engaged assistant can execute only one task, and the professor needs one day to find such an assistant.

First we show how to check whether the project can be completed in k days. Then it is only a matter of using binary search to obtain the answer.

It is easy to see that there exists an optimal schedule in which the professor first searches for assistants and only then executes the unassigned tasks. In particular, we do not have to worry whether the professor executes his tasks during consecutive days. Formally speaking, if there is an optimal schedule in which the professor assigns tasks $T \subseteq \{1, 2, \ldots, n\}$ to assistants and executes the remaining tasks (not necessarily in consecutive days), then there exists an optimal schedule in which the professor delegates the tasks from T during the first #T days, and later executes the remaining tasks (each task in consecutive days). Therefore, if the professor delegates the tasks from T, then he will finish the remaining tasks before the deadline if and only if $\#T + \sum_{i \notin T} p_i \le k$.

Now we show an algorithm which checks whether such a T exists. We consider days d = k, k - 1, ..., 1, and in some of these days we delegate tasks to assistants. Assume that we are in day d and we have already delegated the tasks $T_{d+1}$ during the days after d. Let $A_d = \{\, i \notin T_{d+1} : d + a_i \le k \,\}$ be the set of tasks which can be assigned in day d such that an assistant will complete them before the deadline. The main observation is as follows: if $A_d$ is not empty, then in day d we can greedily assign the task $i \in A_d$ which maximizes the value of p_i.

Now we prove it. Suppose that there exists a schedule S, which in the days after d is the same as our plan. If the schedule S assigns task i in day d, then we are done. Otherwise in the schedule S in day d the professor assigns task j ≠ i or executes task j himself (in this case possibly j = i).

In the first case, the professor assigns task j ≠ i. Then $d + a_j \le k$ (since the assistant will finish before the deadline) and $j \notin T_{d+1}$, that is, j is not delegated after day d (we assume that S is the same as our plan in the following days). From our greedy choice we have that p_i ≥ p_j. If in S the professor assigns task i in day d', then d' ≤ d (since $i \notin T_{d+1}$, and S is the same as our plan in the following days) and we can swap in S the days in which we assign tasks i and j. Otherwise, if in S the professor executes task i himself, then we can alter S as follows: we delegate task i in day d, and execute task j during the days in which the professor has executed task i (we can do that since p_i ≥ p_j).

In the second case, the professor executes task j in day d. If j = i, then we can instead delegate this task in day d (since $d + a_i \le k$). Otherwise, if j ≠ i, we can delegate task i in day d, and reclaim the missing day to execute task j at the moment in which the professor was working on task i (delegating or executing it) in the schedule S.

In all the cases we get a correct optimal schedule in which the professor delegates task i in day d, therefore the key observation is proved.

How fast can we implement the above algorithm? Choosing the maximum from the set $A_d$ of tasks assignable in day d can be implemented using a binary heap ordered by p_i. When moving from day d + 1 to day d we add the elements of $A_d \setminus A_{d+1}$ to the heap (i.e. tasks with a_i = k - d; we need to sort the tasks by a_i to do this), and we remove the maximal element (and add it to T).

Observe that we do not have to iterate over all days: we can start from the day d = min(k, n), since there is no need to assign tasks after the n-th day. Thus the whole solution (with the fixed k) works in time $O(n \log n)$.
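The day-by-day greedy can be sketched as follows. This is only an illustration under stated assumptions: tasks are 0-indexed, all p_i and a_i are positive integers, and the function names `delegated_tasks` and `feasible` are hypothetical, not taken from the reference solution.

```python
import heapq


def delegated_tasks(p, a, k):
    """Greedy from the analysis: walk days d = min(k, n), ..., 1 and delegate
    in each day the assignable task with the largest p_i (if any)."""
    n = len(p)
    order = sorted(range(n), key=lambda i: a[i])  # tasks by increasing a_i
    heap = []      # max-heap on p_i, emulated with negated keys
    chosen = []
    nxt = 0
    for d in range(min(k, n), 0, -1):
        # A task delegated in day d is finished by day d + a_i,
        # so it is assignable in day d exactly when a_i <= k - d.
        while nxt < n and a[order[nxt]] <= k - d:
            i = order[nxt]
            heapq.heappush(heap, (-p[i], i))
            nxt += 1
        if heap:
            _, i = heapq.heappop(heap)
            chosen.append(i)
    return chosen


def feasible(p, a, k):
    """Check #T + (sum of p_i over tasks not in T) <= k for the greedy set T."""
    T = set(delegated_tasks(p, a, k))
    return len(T) + sum(p[i] for i in range(len(p)) if i not in T) <= k
```

For instance, with p = [3, 2] and a = [1, 5], the first task is delegated and the professor executes the second one, so three days are enough while two are not.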

We can improve it by iterating not over the days, but over the tasks sorted non-increasingly by p_i. Every time we try to assign task i in the latest day from {1, 2, ..., d_i}, where d_i = min(k - a_i, n), in which no task is yet assigned; if d_i < 1 or no such free day exists, the task is left to the professor. It is not hard to see that the set T we get will be the same as in the previous algorithm. This assignment can be performed in almost-linear time, using the structure for disjoint sets. Each set contains a free day d and all the days directly after it in which some task is already assigned. Now assigning the task boils down to finding the set which contains d_i, obtaining the minimal element d' in this set, assigning task i during day d', and joining the set with the set containing element d' - 1.

We can now estimate the final time complexity. Since assigning all the tasks to assistants is a valid solution, the answer never exceeds n + A, where $A = \max_i a_i$. Adding binary search, we get a solution which works in time $O(n \log n + n\,\alpha(n) \log(n + A))$, where the first term accounts for sorting the tasks once and $\alpha$ is the inverse Ackermann function.
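A possible sketch of the disjoint-set check combined with the binary search is given below. It assumes 0-indexed tasks with positive integer p_i and a_i; the names `feasible_dsu` and `min_days` are hypothetical, and day 0 is used as a sentinel meaning "no free day left". For brevity only path halving is used (no union by rank), which is slightly weaker than the almost-linear bound but simpler to read.

```python
def feasible_dsu(p, a, k, order):
    """Deadline check that iterates over tasks in non-increasing p_i order.
    parent[d] leads to the latest free day <= d (0 means none is left)."""
    n = len(p)
    parent = list(range(n + 1))

    def find(d):
        while parent[d] != d:
            parent[d] = parent[parent[d]]  # path halving
            d = parent[d]
        return d

    work = 0  # days the professor spends delegating or executing
    for i in order:
        limit = min(k - a[i], n)
        day = find(limit) if limit >= 1 else 0
        if day >= 1:
            parent[day] = day - 1  # the day is taken, merge with the set to the left
            work += 1
        else:
            work += p[i]           # the professor executes the task himself
    return work <= k


def min_days(p, a):
    """Binary search for the smallest feasible deadline k."""
    n = len(p)
    order = sorted(range(n), key=lambda i: -p[i])
    lo, hi = 1, n + max(a)  # delegating everything finishes by day n + max(a_i)
    while lo < hi:
        mid = (lo + hi) // 2
        if feasible_dsu(p, a, mid, order):
            hi = mid
        else:
            lo = mid + 1
    return lo
```

Sorting by p_i happens once, outside the binary search, so each deadline check only pays for the disjoint-set operations.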

Problem B (Kingdom Division)

In this problem we are to divide an n × m grid into the maximum number of parts of distinct sizes formed by connected sets of unit squares. We need to present an example of such a division using characters from $\text{[math]}$ to denote the resulting parts. A solution to this problem can be obtained in three natural steps.

In the first step we forget for a moment about the requirement of connectivity and ask what sizes of parts would imply the maximum result. The answer is simple: the sizes should be equal to 1, 2, 3, ... If the last part is smaller than required, we join it with the next-to-last part.

In the second step we note that the division proposed in the previous step is actually possible to obtain. We can cover the whole grid with a snake-like pattern and then cut the snake at appropriate positions to form the respective parts, which will certainly be connected sets (see the figure to the left).

The third step is to label parts with the letters from $\text{[math]}$ so that no two adjacent parts receive the same label. A tempting idea would be to label them $\text{[math]}$ in the order they appear in the snake. However, such a labeling does not work in some cases.

One possible correct solution is to label each part with the smallest letter that does not occur among its neighbours (and perform the labeling in the snake-order). Another idea is to label the parts in the first row using the letters A and B, in the second row use the letters C and D, in the third row use the letters E and F, in the fourth row again A and B and so on (see the figure to the right). Starting from the point when each part covers at least one row, we can stick to just two letters, A and B. In the latter labeling we use only 6 distinct letters.

An interesting question is what minimum number of letters is sufficient to solve every test case.

Of course the above algorithm can be easily implemented in O(nm) time.
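The three steps can be sketched as below, using the first labeling (the smallest letter not used by an already labeled neighbour). This is a minimal illustration: the function name `divide` is hypothetical, and the 26 uppercase letters are an assumption about the allowed alphabet rather than something stated here.

```python
import string


def divide(n, m):
    """Cut a snake over the n x m grid into parts of sizes 1, 2, 3, ...
    and label each part with the smallest letter unused by its neighbours."""
    # Snake order: even rows left to right, odd rows right to left.
    cells = []
    for r in range(n):
        cols = range(m) if r % 2 == 0 else range(m - 1, -1, -1)
        cells.extend((r, c) for c in cols)

    # Part sizes 1, 2, 3, ...; the leftover is merged into the last part.
    remaining, sizes, s = n * m, [], 1
    while remaining >= s:
        sizes.append(s)
        remaining -= s
        s += 1
    sizes[-1] += remaining

    part = [[-1] * m for _ in range(n)]
    idx = 0
    for pid, size in enumerate(sizes):
        for _ in range(size):
            r, c = cells[idx]
            part[r][c] = pid
            idx += 1

    # Collect adjacency between parts.
    neighbours = [set() for _ in sizes]
    for r in range(n):
        for c in range(m):
            for rr, cc in ((r + 1, c), (r, c + 1)):
                if rr < n and cc < m and part[rr][cc] != part[r][c]:
                    neighbours[part[r][c]].add(part[rr][cc])
                    neighbours[part[rr][cc]].add(part[r][c])

    # Greedy labeling in snake order (assumed alphabet: A..Z).
    label = [None] * len(sizes)
    for pid in range(len(sizes)):
        used = {label[q] for q in neighbours[pid] if label[q] is not None}
        label[pid] = next(ch for ch in string.ascii_uppercase if ch not in used)

    return ["".join(label[part[r][c]] for c in range(m)) for r in range(n)]
```

On a 2 × 2 grid the snake is cut into parts of sizes 1 and 3, giving the rows `AB` and `BB`.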

Problem C (Non-Squares)

In this problem we need to check if a given positive integer n can be represented as a product of k positive integers all of which are not integer squares. Let us call such a representation a non-square representation.

A generic approach could be dynamic programming over the set of divisors of n. All divisors of n can be found in $O(\sqrt{n})$ time. Let D(n) be the number of divisors of n. It turns out that for n ≤ 10⁹ we have D(n) ≤ 1344.

We compute a 2-dimensional Boolean array $R[d, j]$, where $R[d, j]$ is true if and only if the divisor d can be represented as a product of j non-squares. The only true entry with j = 0 is $R[1, 0]$. The computation uses the following Boolean formula that works for j ≥ 1 and $d \mid n$:

$R[d, j] = \bigvee_{e \mid d,\ e \text{ is not a square}} R[d / e,\, j - 1]$

The time complexity of this solution is $O(D(n)^2 \cdot k \cdot \log D(n))$ due to a map-like data structure required to keep track of the divisors of n. The memory complexity is O(D(n)k).

One could also improve the running time of this solution by noting that the parameter k is actually bounded by $\lfloor \log_2 n \rfloor$, which in our case is 29. Indeed, for $k > \log_2 n$ the answer is always “NO” (an argument for this can be inferred from the next solution).
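A compact way to experiment with this dynamic programming is sketched below. It is a forward variant of the recurrence (it keeps, for each j, the set of divisors reachable as a product of j non-squares) and uses a Python set instead of a 2-dimensional array; the function names are hypothetical.

```python
from math import isqrt


def divisors(n):
    """All divisors of n in increasing order, found in O(sqrt(n)) time."""
    small, large = [], []
    for d in range(1, isqrt(n) + 1):
        if n % d == 0:
            small.append(d)
            if d != n // d:
                large.append(n // d)
    return small + large[::-1]


def is_square(x):
    r = isqrt(x)
    return r * r == x


def can_represent_dp(n, k):
    """True iff n is a product of exactly k positive non-square integers."""
    non_squares = [e for e in divisors(n) if not is_square(e)]
    reach = {1}  # divisors that are a product of j non-squares, starting with j = 0
    for _ in range(k):
        nxt = set()
        for d in reach:
            for e in non_squares:
                if (n // d) % e == 0:  # d * e still divides n
                    nxt.add(d * e)
        reach = nxt
        if not reach:  # happens as soon as j exceeds log2(n)
            return False
    return n in reach
```

The early exit on an empty set reflects the bound on k: every non-square factor is at least 2, so no divisor of n is reachable after more than log₂ n steps.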

There is also a far simpler solution that examines the decomposition of n into prime factors. Let n = p₁^α₁p₂^α₂... p_m^α_m where p_i are distinct prime numbers. Then the solution is the following:

(1) If α₁ + α₂ + ... + α_m < k then output “NO”.
(2) Otherwise, if m > 1 then output “YES”.
(3) Otherwise we have m = 1. If k - α₁ is even, then output “YES”, otherwise output “NO”.

Let us provide some justification for this solution. Point (1) follows from the fact that any non-square divisor must have a prime factor. For m = 1, we simply need to check if α₁ can be represented as a sum of k odd integers, which is possible if and only if k and α₁ have the same parity. This yields point (3) of the algorithm.

Finally, consider point (2). We need to show that if α₁ + α₂ + ... + α_m ≥ k and m > 1 then there exists a non-square representation of n. Let us take as the first k - 1 non-square divisors the prime divisors of n: first α₁ times the divisor p₁, then α₂ times the divisor p₂ and so on. As the last divisor, d, we take the product of all the remaining prime factors. If d is not an integer square, we have the requested representation. Otherwise we change the last divisor to d / p_m and the first divisor to p₁ p_m, thus obtaining the representation. Note that this argument needs a first divisor to modify, so it assumes k ≥ 2; for k = 1 the only candidate representation is n itself, and the answer is “YES” exactly when n is not an integer square.

This solution only requires finding the prime decomposition of n and works in $O(\sqrt{n})$ time.
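The three-point rule can be sketched with plain trial division as follows. The function names are hypothetical. The explicit `k == 1` branch is an addition made for this sketch: with a single factor the representation must be n itself, so the answer depends only on whether n is a square, and the swap argument used for point (2) has no first divisor to modify in that case.

```python
def factor_exponents(n):
    """Exponents of the prime decomposition of n, by trial division in O(sqrt(n))."""
    exps = []
    p = 2
    while p * p <= n:
        if n % p == 0:
            e = 0
            while n % p == 0:
                n //= p
                e += 1
            exps.append(e)
        p += 1
    if n > 1:
        exps.append(1)  # one remaining prime factor above sqrt(n)
    return exps


def can_represent(n, k):
    """True iff n is a product of exactly k positive non-square integers."""
    exps = factor_exponents(n)
    if sum(exps) < k:                 # point (1)
        return False
    if k == 1:                        # added edge case: n itself must be a non-square
        return any(e % 2 == 1 for e in exps)
    if len(exps) > 1:                 # point (2)
        return True
    return (exps[0] - k) % 2 == 0     # point (3), m = 1
```

For example, 36 = 2 · 18 has a representation for k = 2, while 8 = 2³ has none for k = 2 because the parities of 3 and 2 differ.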

Problem D (Stacks of Coins)

In this problem m gold and silver coins are arranged into n stacks of different heights. We are to rearrange the coins so that there is a maximum number of unicolor stacks (a unicolor stack has only gold or only silver coins). We are allowed to perform any number of operations of swapping two coins that are located at the same height of different stacks.

Let k denote the result, that is, the maximum number of unicolor stacks that can be formed. Let us do a binary search for k. From now on we are left with a decision problem: given k, check if it is valid. Clearly it is optimal to try to make the k smallest stacks unicolor.

Since the number of operations is not bounded, our problem can be reformulated as follows. Let h be the height of the k-th smallest stack. For each height level 1, ..., h we know the total number of gold and silver coins at this level. We need to check if these numbers are sufficient to create k unicolor stacks. This task will reduce to computing an intersection of O(h) intervals.

First we consider each height level separately. Assume there are g_i gold and s_i silver coins available at this level and the k smallest stacks contain in total t_i coins at this level. Using these numbers, we can compute an interval [a_i, b_i] such that $x \in [a_i, b_i]$ if and only if one can take x gold coins and t_i - x silver coins to form the i-th level of the selected stacks. Note that such an interval implies the interval [t_i - b_i, t_i - a_i] for the number of silver coins.

Finally we need to make sure that the intervals for different levels admit a common valid solution. We iterate from the bottommost to the topmost level. For a given level i, we need to intersect the interval [a_i, b_i] with [a_{i - 1}, b_{i - 1}], and similarly the intervals for the silver coins. Thus we obtain a single updated interval [a_i, b_i], which we then use to update the intervals for higher levels.

The whole solution works in $O((n + m) \log n)$ time.

This problem also has a linear-time solution. Let g_i be the total number of gold coins which are located on the i-th level above the ground. Let g'_i be the maximum number of stacks which can be formed such that they have only gold coins on the i-th level and below. It can be easily computed using the following recurrence:

$g'_1 = g_1, \qquad g'_i = \min(g'_{i-1},\, g_i) \ \text{ for } i > 1$

Similarly, we define s_i and s'_i for silver coins.

Let h_i be the height of the i-th stack, and suppose that stacks are sorted non-increasingly by heights, i.e. h₁ ≥ h₂ ≥ ... ≥ h_n. We iterate over stacks and “color” them (making them unicolor) when possible. Suppose we consider stack i and we already colored stacks numbered 1, 2, ..., i - 1 in such a way that we got G gold stacks and S silver stacks.

The i-th stack can be colored gold if and only if g'_{h_i} > G. In such a case we show that there is an optimal solution in which this stack is colored gold and stacks 1, 2, ..., i - 1 are colored the same as in our solution. Consider any optimal solution in which stacks 1, 2, ..., i - 1 are colored the same as in our solution. Let j ≥ i be the highest gold stack in this optimal solution (it must exist, otherwise we can add stack i and get a better solution). Now we can swap all the coins from the stack j to the stack i and, if h_j < h_i, since g'_{h_j + 1} ≥ g'_{h_i} > G, we also have sufficiently many gold coins to swap into the stack i above level h_j.

Similarly, if we cannot color the i-th stack gold, but we can color it silver (i.e. s'_{h_i} > S), there is an optimal solution in which we can do this.

Otherwise we have g'_{h_i} + s'_{h_i} ≤ G + S, which means that we cannot color more than G + S stacks of heights not less than h_i.

Since we can sort the heights of the stacks using counting sort, the above algorithm runs in O(n + m) time.
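The linear-time greedy could look roughly like this. The input format is an assumption made for the sketch: each stack is a non-empty string over the letters `G` and `S`, listed from the bottom coin to the top coin, and the function name `max_unicolor` is hypothetical.

```python
def max_unicolor(stacks):
    """Maximum number of unicolor stacks obtainable by same-level swaps."""
    n = len(stacks)
    hmax = max(len(s) for s in stacks)

    # gold[i], silver[i]: number of coins of each color on level i (1-based).
    gold = [0] * (hmax + 1)
    silver = [0] * (hmax + 1)
    count = [0] * (hmax + 1)  # counting sort of stack heights
    for s in stacks:
        count[len(s)] += 1
        for level, coin in enumerate(s, start=1):
            if coin == "G":
                gold[level] += 1
            else:
                silver[level] += 1

    # Prefix minima g'_i and s'_i; index 0 holds the neutral value n.
    gp = [n] * (hmax + 1)
    sp = [n] * (hmax + 1)
    for i in range(1, hmax + 1):
        gp[i] = min(gp[i - 1], gold[i])
        sp[i] = min(sp[i - 1], silver[i])

    G = S = 0
    for h in range(hmax, 0, -1):      # stacks in non-increasing order of height
        for _ in range(count[h]):
            if gp[h] > G:
                G += 1                # color this stack gold
            elif sp[h] > S:
                S += 1                # otherwise try silver
    return G + S
```

For the stacks `GS` and `SG`, swapping the two top coins yields `GG` and `SS`, so the result is 2.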

Problem E (State Roads)

In this problem we are given a graph in a dynamic setting. The set of nodes V is fixed, but the edges of the graph are added and deleted over time. We are to answer queries of the following type: given a set of nodes $\{u_1, u_2, \ldots, u_k\} \subseteq V$ and a moment of time, check whether these nodes form one or more whole connected components of the graph at this moment of time.

We present a randomized solution to this task, which is very simple to code, but also quite tricky. For each edge that is added to the graph we pick an integer weight at random. For each node $v \in V$ we store in $h(v)$ the $\mathrm{xor}$ of the weights of all edges that are incident to this node. Note that such values can be updated in O(1) time upon insertion and deletion of edges.

Now the crucial observation is that {u₁, u₂, ..., u_k} forms a number of whole connected components if and only if each edge in the graph is incident to exactly zero or two of the nodes u_i. Hence, to answer a query, we compute:

$h(u_1) \oplus h(u_2) \oplus \cdots \oplus h(u_k)$

where $\oplus$ denotes the $\mathrm{xor}$ operation. An edge with both endpoints among the u_i contributes its weight twice, which cancels out, so only edges leaving the set contribute to this value. If the above value is non-zero, we know that the answer is “NO”. Otherwise with high probability the answer is “YES”. The probability of an incorrect positive answer is $1/M$, where the weights are drawn uniformly from $\{0, 1, \ldots, M - 1\}$. This probability turns out to be sufficiently small for M = 2³² or M = 2⁶⁴.

The whole solution works in linear time with respect to the size of the input.
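A minimal sketch of this randomized structure is shown below. The class and method names are hypothetical, nodes are assumed to be numbered from 0, and the graph is assumed to contain no parallel edges (so that an edge can be identified by its pair of endpoints). The optional `seed` parameter exists only to make experiments reproducible.

```python
import random


class RoadNetwork:
    """Dynamic graph answering 'is this node set a union of whole components?'
    with one-sided error, using xor of random edge weights."""

    def __init__(self, n, bits=64, seed=None):
        self.h = [0] * n           # h[v] = xor of weights of edges incident to v
        self.weight = {}           # (min(u, v), max(u, v)) -> weight of the edge
        self.bits = bits
        self.rng = random.Random(seed)

    def add_edge(self, u, v):
        w = self.rng.getrandbits(self.bits)
        self.weight[(min(u, v), max(u, v))] = w
        self.h[u] ^= w
        self.h[v] ^= w

    def remove_edge(self, u, v):
        w = self.weight.pop((min(u, v), max(u, v)))
        self.h[u] ^= w
        self.h[v] ^= w

    def is_union_of_components(self, nodes):
        """False is always correct; True may be wrong with tiny probability."""
        acc = 0
        for u in nodes:
            acc ^= self.h[u]
        return acc == 0
```

A short usage example: after adding edges (0, 1) and (2, 3) to a four-node graph, the query {0, 1} is answered positively and {0, 2} negatively; adding the edge (1, 2) turns the answer for {0, 1} negative until that edge is removed again.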

Problem F (Stickers)

In this problem we are given n kinds of stickers s₁, ..., s_n, where each s_i is a word of length d. We are asked what is the minimum number of stickers needed to paste a given word t of length m.

We present a simple dynamic programming solution to this problem. Denote by dp[i] the minimum number of stickers needed to paste exactly the suffix t[i..m], and by $dp'[i]$ the same number when we allow stickers to overflow and cover positions from t[i - d..i - 1] (not necessarily with correct letters). The answer to the problem is dp[1].

Let us see how to compute dp[i]. The i-th position of t must at some point be covered by a sticker pasted at this position. Since all the stickers have the same length, we can restrict ourselves to pasting this sticker either as the last one or as the first one. This gives us two cases:

(1) For some j we have s_j = t[i..i + d - 1]. Thus we can paste the suffix t[i + d..m], possibly covering letters from t[i..i + d - 1], and after that paste s_j at position i. In this case we use $1 + dp'[i + d]$ stickers.
(2) We have s_j[1..d'] = t[i..i + d' - 1] for some j and some d' ≤ d. Thus we can paste s_j at position i and then paste t[i + d'..m] with the restriction that we cannot modify the letters before position i + d'. In this case we use 1 + dp[i + d'] stickers.

If for every position i we know the maximum l[i] such that t[i..i + l[i] - 1] is a prefix of some sticker, then case (1) is possible if d = l[i], and the outcome of case (2) is equal to min_{1 ≤ j ≤ l[i]}(1 + dp[i + j]).

We compute $dp'[i]$ similarly, but now instead of the set of all stickers we consider the set of all the suffixes of stickers. Thus we compute l[i] as the maximum number such that t[i..i + l[i] - 1] is an infix of some sticker.
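The two variants of the array l can be computed naively, as sketched below. This covers only the match lengths, not the full dynamic programming; indices are 0-based here (unlike the 1-based notation of the analysis), and the function names are hypothetical.

```python
def longest_prefix_matches(t, stickers):
    """l[i] = largest length such that t[i : i + l[i]] is a prefix of some sticker."""
    m = len(t)
    l = [0] * m
    for i in range(m):
        for s in stickers:
            j = 0
            while j < len(s) and i + j < m and s[j] == t[i + j]:
                j += 1
            l[i] = max(l[i], j)
    return l


def longest_infix_matches(t, stickers):
    """Same as above, but matching against any infix of a sticker:
    an infix is a prefix of a suffix, so we expand stickers into suffixes."""
    suffixes = [s[b:] for s in stickers for b in range(len(s))]
    return longest_prefix_matches(t, suffixes)
```

The prefix variant takes O(mnd) time and the infix variant O(mnd²), since each of the n stickers contributes d suffixes.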

The above solution can be implemented in O(mnd²) time. Surprisingly, this problem can be solved in linear time with respect to the size of the input, even with stickers of different lengths, but such a solution is a lot more complicated.