Proofy's blog

By Proofy, 9 months ago, In English

Introduction

This blog contains well-known facts and techniques about Harmonic Numbers that are scattered all over editorials of some problems, comments, and some parts of blog posts. The place where I found most of the facts explained here is the brilliant blog by Nisiyama_Suzune about Möbius inversion. However, for the untrained eye, this blog may seem as if it's investigating something advanced and the facts explained inside will be missed. Moreover, there were some omitted proofs, so I thought it would be nice to collect all these facts/techniques in one place and add their proofs in (hopefully) an understandable but precise style, which would be useful for many.


For starters, the $$$n$$$th harmonic number, $$$H_n$$$, is the sequence $$$\displaystyle H_n = \sum_{i = 1}^{n}\frac{1}{i}$$$. We will investigate two techniques related to this sequence.


1. Upper bound for Harmonic Numbers

We begin by discussing the famous upper bound for Harmonic Numbers:

Fact 1

This can be noted by observing the graph of $$$f(x) = \frac{1}{x}$$$ and drawing rectangles of base length 1 and height $$$\frac{1}{i}$$$ to get areas that sum up to $$$H_n$$$, similar to this (the image is taken from Stewart Calculus):

and so we have

$$$\displaystyle H_n = \sum_{i = 1}^{n} \frac{1}{i} \le 1 + \int_{1}^{n} \frac{1}{x} \, dx = 1 + \ln x \Biggr|^{n}_{1} = 1 + \ln n \in O (\log n)$$$

(there is a known approximation using Euler-Mascheroni constant but I like this more simple proof).

This is why a loop like this:

for (int i = 1; i <= n; i++) 
    for (int j = i; j <= n; j += i) {
        // some constant code
    }

works in $$$O(n \log n)$$$, because $$$j$$$ iterates over all multiples of $$$i$$$ that are at most $$$n$$$, and there are $$$\displaystyle\left\lfloor \frac{n}{i} \right\rfloor$$$ such multiples, and so the complexity is just $$$\displaystyle\sum_{i = 1}^{n} \left\lfloor \frac{n}{i} \right\rfloor \le \sum_{i = 1}^{n} \frac{n}{i} = n H_n \in O(n \log n)$$$.

This is very useful whenever you want to brute force all divisors of numbers in a range $$$[1, N]$$$. This can be used to precompute the number, sum, product of divisors of all numbers in $$$[1, 10^6]$$$. You can even precompute the divisors themselves:

for (int i = 1; i <= n; i++) 
    for (int j = i; j <= n; j += i) divs[j].push_back(i);

This can also be used to calculate the Euler-Totient function $$$\varphi(n)$$$ and can be used in some inclusion-exclusion DP to precalculate the array $$$\text{dp}[x] = \sum_{i < j} [\gcd(a_i, a_j) = x]$$$ where $$$a_1, a_2, \dots, a_n$$$ is an array with $$$1 \le a_i \le 10^6$$$.

Code for precomputing dp[x]

Example problems: (feel free to add more in the comments)


2. Distinct values of $$$\left\lfloor\frac{n}{i}\right\rfloor$$$

Now, let's get to even more interesting facts:

Fact 2

We will discuss why this is true and how to retrieve the exact values over the exact ranges in $$$O(\sqrt{n})$$$.

Proof of Fact 2

Now, note that these values are non-increasing as $$$i$$$ increases. For example, for $$$n = 10$$$, we get the values (starting from $$$i = 1$$$) $$$10, 5, 3, 2, 2, 1, 1, 1, 1, 1, 0, 0, \dots$$$. So, to find these values, we will prove the following fact:

Fact 3

(note that I will write a mathematical proof and it may not seem intuitive to see, and I will explain its intricacies afterwards, so don't get shocked :D)

Proof of Fact 3

Now, what this proof is trying to say (skip these two paragraphs if you understand and can imagine the proof) is that if you get that $$$\displaystyle \left\lfloor \frac{n}{i} \right\rfloor = q$$$ for some $$$i$$$ and $$$q$$$, and again $$$\displaystyle \left\lfloor \frac{n}{i + 1} \right\rfloor = q$$$ as well, then we know for sure that $$$n \text{ mod } (i + 1) = (n \text{ mod } i) - q$$$ by the division algorithm. (If $$$n = qi + r = q(i + 1) + r'$$$ then $$$r' = r - q$$$).

This means that with every step forward, the remainder decreases by exactly the value of the floor division. So, the remainder is maximal at the first time we reach some value $$$q$$$ (at $$$l$$$), and the remainder keeps decreasing by $$$q$$$ until we can't decrease anymore (decreasing will makes the remainder negative), and so the floor division value becomes $$$q - 1$$$ instead. So, to find the maximum point at which the floor value stays the same $$$q$$$, we find the maximum $$$k$$$ such that $$$qk \le n \text{ mod } l$$$, which is $$$\left\lfloor \frac{n \text{ mod } l}{q} \right\rfloor$$$, and we add that value to $$$l$$$ to get the same conclusion.

Omitted Algebra

This fact is very useful in optimizing solutions to many problems, and I will demonstrate this by solving one example: CSES — Sum of Divisors. (The reader is encouraged to solve this on his/her own before reading on).

The problem wants to find the summation of divisors of all integers up to $$$n$$$, but with $$$n \le 10^{12}$$$. This means that at most $$$O(\sqrt{n})$$$ complexity is required. Now, the idea here is to fix a divisor and count how many times it will be a divisor. Well, for a divisor $$$d$$$, the number of multiples of $$$d$$$ up to $$$n$$$ is $$$\displaystyle \left\lfloor \frac{n}{d} \right\rfloor$$$, so the answer is just

$$$\displaystyle \sum_{d = 1}^{n} d \left\lfloor \frac{n}{d} \right\rfloor$$$

Here, to optimize, we can fix the value of $$$\displaystyle \left\lfloor \frac{n}{d} \right\rfloor$$$ and sum all $$$d$$$ that has this value. But from what we've arrived at from the above claim, we can easily do this:

Code 1.1

The code iterates over ranges that have the same value of floor division when $$$n$$$ is divided by them and multiplies the floor division value by the sum of elements in the range, modulo $$$10^9 + 7$$$. Each time we calculate $$$r$$$ (as defined in Fact 3) and at the end of the iteration, we make the next $$$l$$$ as $$$r + 1$$$, since $$$r$$$ is the maximum then $$$r + 1$$$ will get a new floor division value. So, the idea is always to get the ranges of the same floor value and process them however you like

for (ll l = 1, r = 1; (n/l); l = r + 1) {
    r = (n/(n/l));
    // q = (n/l), process the range [l, r]
}

This fact also turns out to be useful in optimizing some Möbius inversion formulas.

Example problems: (feel free to add more in the comments)

Full text and comments »

  • Vote: I like it
  • +140
  • Vote: I do not like it

By Proofy, history, 10 months ago, In English

Hello all!

The Story

Recently, I was competing in CodeTON Round 5 (Div. 1 + Div. 2, Rated, Prizes!) and enjoying its problems. I solved A, B, and C, and took quite a bit of time with D and then when I finally finished the code for E (which I thought was buggy), there were less than 5 minutes for the contest time, so I submitted really quickly to see if my code passed.

Apparently, I clumsily did not read codeforces rules carefully, but I knew later that having multiple accounts was prohibited in codeforces. I was participating in this contest from another account (EvilProofy) and was planning to compete for the rest of the contest from this account.

When I submitted the code in the last 4-5 minutes in the contest hastily, I realized that I submitted from my main account (this one) by mistake. I clumsily submitted the same codes for the rest of the problems from both accounts; I was in a rush because there was not much time left in the contest.

I felt a bit of sorrow right after the contest because of the clumsy mistake I made, so I released a blog to explain my situation to see if I can undo what has just been done, and it was only then (thanks — so much — to colposh for telling me) that I learned it was mentioned in https://mirror.codeforces.com/help#q6: "Don't create more than one account, if you have forgotten the password, use the password reminding system" and then this blog got a flood of downvotes that it even became inaccessible to other codeforces users.


My Reflections

I love codeforces and respect its rules, and I love participating in codeforces contests (Actually, codeforces isn't even related to my career and I just participate for fun). I enjoy the problems, and I don't care that much about rating. I sincerely apologize for not reading the rules carefully; I understand that this deserved the downvotes.

I felt like this contest would appear on my chart as a major drop and scar my account forever, and if someone checked it out, it would appear that I cheated. I wanted to apologize again for not reading carefully and missing this rule. I can even have my other account EvilProofy banned and closed, and I would be OK. I would, of course, love it if the contest was unrated for me and the other account becomes simply closed/banned. However, I really don't care about the rating drop.

I am writing this blog just to make it clear that I did not cheat (I hate cheating). If I had known that creating multiple accounts is prohibited, I would not have created the other account.

It looks to me like I have lost being a respected codeforces user, which troubled me and made me sad for a bit. I am sorry if that blog wasted your time, respected reader, but it was something I wanted to document. Thank you so much for making it here, you're a nice person :)

Full text and comments »

  • Vote: I like it
  • +39
  • Vote: I do not like it

By Proofy, history, 2 years ago, In English

Introduction

Some of you smart people out there may find the contents of this blog so obvious, that it does not deserve to be called a "technique." It is just the totally normal thought process that comes across our minds when we try to solve a problem!

However, I often find it useful (and others may relate) to state my thoughts explicitly when trying to conquer a problem, whether I write it down on a sheet of paper, comment it in my code, or just talk out loud like a crazy guy :D. I observe that one of the things that makes a problem-solver better than another (other than practice and knowledge about certain topics/algorithms, of course) is the way of thinking and approaching a problem. So, to make myself a better problem-solver, I sometimes go about thinking about how I think and try to improve this way of thinking generically and generally about any problem. It is, to many great problem-solvers, one of the byproducts of practicing problems a lot that develops implicitly, and stating such byproducts to myself explicitly is very helpful to me in order to speed up my learning process.

In this blog, I will try to explain a technique of thinking that I found recurring and often helpful in facing many problems that may seem daunting to many people at first sight, with an unorganized train of thought. Afterward, I will try to apply this technique to some Codeforces problems I solved that I found considerably not easy to go about when I haven't tried this technique explicitly. It is highly encouraged to try these problems out yourself before reading the solution (I know there are already posted great editorials for the problems, but I wrote the solution in my style in order to cope with the theme of this blog).

Disclaimer: I don't know if someone else posted something similar to this technique on Codeforces or outside Codeforces, and I tried to search but couldn't; that is why I thought of posting it myself.

The technique (Characteristics of the optimal solution)

A lot of the problems we face involve finding an optimal solution of some kind, e.g. find a subsequence that has minimum * something * or find a graph of an array that satisfies some requirements. The main technique is to think as follows:

Suppose I did find such a solution, what would it look like? what characteristics it would have? Can we toy around with such a solution so that it remains optimal?

From asking ourselves this question and trying to answer it, we are able to come up with very useful observations that help us in finding the solution. Moreover, the important thing is to have the courage to toy around with the solution and often you would try to reduce it while still satisfying the requirements (e.g. if a subsequence has a minimum * something *, can I reduce the number of elements in it so that it still has the minimum * something *?) If you still don't fully comprehend how useful this may be, don't worry; it will be more clear with the problems.

Problem 1. 1592C - Bakry and Partitioning

(Again, it is highly encouraged to try the problem out yourself if you haven't — before proceeding in this blog.)

So, the problem asks us to find a way to partition our tree into a forest, where each tree has the same bitwise XOR value as all the other trees. Let's try to apply our technique here:

Suppose I did find a way to partition my trees into a forest, and now I have a forest containing $$$q$$$ trees with the same bitwise XOR value $$$x$$$, are there any characteristics of these $$$q$$$ trees? Can I toy around with them a bit and reduce the number of trees?

In this problem, it would be useful to toy around with them.

We note that if we have a tree that we partitioned into 2 trees with XOR values $$$a$$$ and $$$b$$$, then it is clear that before partitioning, the whole tree had an XOR value of $$$a \oplus b$$$. That means that the kind of "toying around" we can do here is to merge two trees into one with and XOR their values. Now, let's get back to our optimal solution. If we try to merge two trees that both have an XOR value of $$$x$$$, then the resultant tree has an XOR value $$$x \oplus x = 0$$$ (don't worry, we didn't ruin the optimality of the solution). If we merge one more tree to the resultant tree, the final tree would have an XOR value of $$$x \oplus 0 = x$$$. So, if we have $$$q$$$ trees in our optimal solution, we can reduce them to $$$q - 2$$$ trees, and the solution would still remain optimal. So, if $$$q$$$ is even, we can reduce it to $$$2$$$ and if $$$q$$$ is odd we can reduce it to $$$3$$$. This means that if a solution exists with $$$q$$$ trees, then so does a solution with $$$q \mod 2 + 2$$$ trees, and we only need to check if we can cut one edge or two edges. The remaining part of the problem is some proper handling of the two cases which is irrelevant to this blog (you can check it out in the editorial if it is still a little difficult for you).

Problem 2. 1629D - Peculiar Movie Preferences

We note that if one string is a palindrome, then we are done (if there is a string of length 1, it would automatically be a palindrome, so we would assume that no strings are of length 1). Otherwise, let's apply our technique:

If a palindrome consists of multiple strings, what would they look like? Can I toy around with them a bit?

Now, it is important to note that if a string is a palindrome, then every prefix is also a suffix of the string. So, if we have a palindromic string consisting of strings of lengths 2 and 3, we can toy around with it by fixing the first string and the last string and drop those in between, and the string would still remain palindromic. This reduces the problem to finding two strings that can be concatenated to form a palindrome.

Problem 3. 1366D - Two Divisors

At first sight, the problem would make me scratch my head a while, asking recurrently "how do I find such divisors?". However, I try to apply this technique and ask:

Let's suppose I did find two divisors $$$d_1$$$ and $$$d_2$$$ where $$$\text{gcd}(d_1 + d_2, a_i) = 1$$$, what characteristics can those two divisors have?

Well it's important to note that $$$d_1$$$ and $$$d_2$$$ are divisors of $$$a_i$$$, but if $$$\text{gcd}(d_1 + d_2, a_i) = 1$$$, then for every prime $$$p|a_i$$$, $$$p \not | (d_1 + d_2) \implies p \not | d_1$$$ or $$$p \not | d_2$$$. This means that if there is a prime $$$p$$$ that divides $$$d_1$$$, it can not divide $$$d_2$$$, otherwise $$$\text{gcd}(d_1 + d_2, a_i) \ge p$$$, so we immediately conclude $$$\text{gcd}(d_1, d_2) = 1$$$, which can't happen if $$$a_i$$$ has one prime divisor, so we assume it would have more than one prime divisor.

If we look at such divisors, we note that if a prime $$$p$$$ does not divide both divisors, then it is possible that it may divide their sum (e.g. $$$5 | (2 + 3)$$$). However, if for every prime divisor of $$$a_i$$$, $$$p$$$ divides one of the two divisors and not the other, then we are certain that $$$p$$$ doesn't divide their sum (because if we assume WLOG $$$p | d_1$$$, then $$$d_1 + d_2 \equiv d_2 \pmod{p}$$$). This means we can partition the primes of $$$a_i$$$ into $$$d_1$$$ and $$$d_2$$$, and so each prime would divide one of the divisors and not the other. So, we can solve the problem by taking one prime divisor of $$$a_i$$$, p, that divides $$$a_i$$$ and divide $$$a_i$$$ by it until it's no longer divisible, and check if $$$a_i$$$ still remains more than 1 and if so, we would have our solution. That one prime divisor of $$$a_i$$$ can be found using normal sieve of eratosthenes.

Problem 4. 1343E - Weights Distributing

It can be apparent that we should distribute the weights on our path greedily, with the minimum having the highest priority. The number of edges we need to distribute the weights on has to be minimal, otherwise, we would need to use a new price from the given array. That way, our prices has to distribute on the edges that are on the shortest paths between $$$a$$$ and $$$b$$$ and $$$b$$$ and $$$c$$$, but an important thing to note is that in a graph, there can be multiple shortest paths.

Let's now ask ourselves: what would the shortest paths look like? What characteristics of the shortest path do we need in order to have them include the minimum price possible?

Ok, so the shortest paths can be one straight line from $$$a$$$ to $$$b$$$ and $$$b$$$ to $$$c$$$, that is the two paths $$$a \to b$$$ and $$$b \to c$$$ are not intersecting with only $$$b$$$ as a common node, otherwise, they would intersect in a considerable number of nodes, so our "optimal solution" would include at least one intersection point. We can fix a common node $$$x$$$, and our path would look like $$$a \to x$$$, $$$x \to b$$$, $$$b \to x$$$, then $$$x \to c$$$, with the common edges on the path $$$x \to b$$$ only. So, we would have to distribute the minimal prices on the edges that are on the path from $$$x \to b$$$ and then $$$a \to x$$$ and then $$$x \to c$$$ because our path would have $$$\text{dist}(a,x) + 2\text{dist}(b,x) + \text{dist}(c,x)$$$. So, we would store the shortest paths from $$$a, b,$$$ and $$$c$$$ using some shortest path algorithm like Dijkstra in 3 arrays, and iterate over $$$x$$$ and minimize $$$\text{pref}[\text{dist}(b,x)] + \text{pref}[\text{dist}(a,x) + \text{dist}(b,x) + \text{dist}(c,x)]$$$, where $$$\text{pref}$$$ is a prefix sum array, on the sorted array of prices.

Conclusion

I hope you found these ideas helpful and not a waste of time.

From my naïve perception, the whole of Competitive Programming can be partitioned into pure problem-solving and thinking skills, and techniques/topics/algorithms that one may learn to help him tackle some problems like graphs, Number Theory, DP, ... . The former part, I see, is implicit to most problem solvers and it is just part of their unorganized train of thought that becomes more and more organized with practice. But, I think it can also be structured, and taught. You can consider this blog's content as a technique to have some kind of organization of the train of thought; it's a help in the "pure problem-solving and thinking skills" part, not the topics/algorithms part.

(More practice problems will be added once I observe them. If you have some recommendations of good problems with this technique involved, please post them in the comments and I would add them to the blog).

Here are some practice problems:

Full text and comments »

  • Vote: I like it
  • +87
  • Vote: I do not like it