Problem:
Given a $$$2*N$$$ matrix contains of positive integers with value not greater than $$$10^6$$$. Given an integer $$$K$$$.
Chooses $$$K$$$ column(s) from the matrix, call $$$P$$$ is the sum of all of the top integers from chosen columns, call $$$Q$$$ is the sum of all of the bottom integers from the chosen columns.
Your task is to maximizes $$$P/Q$$$ and print $$$P$$$ and $$$Q$$$ out in its reduced fraction form.
Constraint: $$$1 \leq K \leq N \leq 5*10^4$$$
Time limit: 1s
Input:
First line contains: $$$N$$$, $$$K$$$
Next $$$N$$$ lines contain two integers, first integer belongs in the top row of the matrix, second integer belongs in the bottom row. $$$i$$$-th line represents $$$i$$$-th column of the matrix.
Output:
- Two space-separated integers $$$P$$$ and $$$Q$$$.
Example
Attempts:
It was from our school's training contest that I first saw it. I blindly submitted a greedy sort sol (ratio of $$$top/bot$$$) but got WA. Next I tried to make it TLE, $$$K$$$ iterations with everytime choose a col. that maximizes $$$(currentP + top)/(currentQ + bot)$$$ and it did get a TLE.
But after that I was stumped.
I aksed my coach for the solution code and let the explaination be an excercise but I couldn't figure it out.
Please help me understand the math/intuition behind the idea of the solution.
I apologize that I can't provide submit link.
I don't know the answer, but the fact that the greedy solution doesn't work seems like a manifestation of Simpson's Paradox, which basically says best batting averages in individual years might not yield the best batting average over a period of several years. One can use the wiki's numbers to draw up a counterexample with $$$K=2$$$ and the matrix (I'll write as fractions) $$$[\frac{12}{48},\frac{104}{411},\frac{45}{140}]$$$ which is sorted by ratio, but the selection of 1st and 3rd yields $$$\frac{57}{188}$$$ which is better than $$$\frac{149}{551}$$$
I see. I did have trouble coming up with a counter-example for the greedy sort solution, that was why i crossed my fingers and sent it in.
Your input was helpful. Thanks!
Essentially, some columns have more effect on the the sums $$$P$$$ and $$$Q$$$ than others, meaning that sometimes it is better to choose a worse, "lighter" column than a slightly better, "heavier" one.
For example, the greedy algorithm fails on the following case:
Here, the optimal solution chooses the 1st and 3rd columns, since the 3rd column has little effect on the ratio ($$$501/103 = 4.864... \approx 5$$$) while the 2nd column dominates the ratio ($$$1000500/1000100 = 1.000399... \approx 1$$$)
We can apply binary search by considering a related problem:
Given a $$$2 \times N$$$ matrix and a real number $$$r$$$, select $$$K$$$ columns such that the ratio $$$P/Q$$$ is at least $$$r$$$.
This problem is not that bad: we can calculate for each column how much it would exceed that ratio by thinking of each column $$$(p,q)$$$ instead as $$$(q*r + excess,q)$$$, where $$$excess = p-q*r$$$.
Then, when we sum $$$K$$$ columns in this way, the sum will be $$$(Q*r + \sum excess) / Q$$$, which makes it easy to test if any selection of $$$K$$$ columns works.
If the sum of the excess values is at least $$$0$$$, it works, and if it is less than $$$0$$$, it doesn't.
This also gives us an easy solution to the related problem: sort all the columns by excess, then choose the $$$K$$$ columns with the greatest excess (and if the sum for those $$$K$$$ columns is less than $$$0$$$, there is no solution).
Returning to our original problem, can simply binary search on $$$r$$$ to find the greatest ratio $$$P/Q$$$, and furthermore we can just use our selection of $$$K$$$ columns from before to directly calculate $$$P$$$ and $$$Q$$$.
The solution code is basically this solution and does $$$100$$$ iterations of binary search where it stores the excess in
val[i]
and is careful with floating point issues.First of all, thanks for replying wholeheartedly!
Wow, the solution is so out-of-nowhere for me I am just at lost for words. There isn't a clue or hint to enable me to think in such "deliberated" way. I even needed ~1 hours to process your explaination. Should have done more math when I was younger...
Anyway, since I'm here I would like to add a few things on why the initial Greedy solution failed. Frame the problem like this:
Imagine each column of the matrix is a 2D vector on a plane, top for $$$x-coordinate$$$ bottom for $$$y-coordinate$$$. Angle of a vector (or the line that contains the vector) is related to its slope, which is $$$y/x$$$ (that's inverse the ratio my greedy sort used). By that, our task is now to choose $$$K$$$ vectors so that the vector sum has the greatest angle. And this convinced me why "Sorted by ratio" didn't work by this example: