N̶e̶e̶d̶ ̶h̶e̶l̶p̶: Select K different columns from a 2xN matrix to maximize the ratio of (sum of selected top) to (sum of selected bottom)

Problem:

Given a $$$2 \times N$$$ matrix of positive integers, each not greater than $$$10^6$$$, and an integer $$$K$$$.

Choose $$$K$$$ column(s) from the matrix. Let $$$P$$$ be the sum of the top integers of the chosen columns and $$$Q$$$ the sum of the bottom integers of the chosen columns.

Your task is to maximize $$$P/Q$$$ and print $$$P$$$ and $$$Q$$$ in reduced fraction form.

Constraint: $$$1 \leq K \leq N \leq 5 \cdot 10^4$$$

Time limit: 1s

Input:

The first line contains $$$N$$$ and $$$K$$$.
Each of the next $$$N$$$ lines contains two integers: the first belongs to the top row of the matrix, the second to the bottom row. The $$$i$$$-th line represents the $$$i$$$-th column of the matrix.

Output:

Two space-separated integers $$$P$$$ and $$$Q$$$.

Example

Input

Output

2 1

Attempts:

I first saw it in our school's training contest. I blindly submitted a greedy sorting solution (sort by the ratio $$$top/bot$$$) but got WA. Next I tried a slower greedy that I expected to TLE: $$$K$$$ iterations, each time choosing the column that maximizes $$$(currentP + top)/(currentQ + bot)$$$, and it did get TLE.

After that I was stumped.

I asked my coach for the solution code and left the explanation as an exercise, but I couldn't figure it out.

Please help me understand the math/intuition behind the idea of the solution.

I apologize that I can't provide a submission link.

Solution:

Spoiler

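#include <bits/stdc++.h>
using namespace std;

// Assumed tolerance: EPS is used below but its definition is not shown in the post.
const double EPS = 1e-9;

int main() {
    int n, k;
    scanf("%d%d", &n, &k);
    vector<int> a(n), b(n);  // a = top row, b = bottom row
    for (int i = 0; i < n; ++i) scanf("%d%d", &a[i], &b[i]);
    long long p = -1, q = 1;
    double lo = 0, hi = 1000000;
    // Binary search on the answer: mid is a candidate value of P/Q.
    for (int iter = 0; iter < 100; ++iter) {
        double mid = (lo + hi) / 2;
        vector<pair<double, int> > val(n);
        for (int i = 0; i < n; ++i) val[i] = make_pair(a[i] - mid * b[i], i);
        sort(val.begin(), val.end());
        // Sum of the K largest values of a[i] - mid * b[i].
        double sum = 0;
        for (int i = n - k; i < n; ++i) sum += val[i].first;
        if (sum > -EPS) {
            // Ratio mid is reachable: remember this selection and search higher.
            p = 0, q = 0;
            for (int i = n - k; i < n; ++i) {
                p += a[val[i].second]; q += b[val[i].second];
            }
            lo = mid;
        } else hi = mid;
    }
    long long g = __gcd(p, q);
    printf("%lld %lld\n", p / g, q / g);
}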

Comments (4)

supermatthew

5 years ago, # |

I don't know the answer, but the fact that the greedy solution doesn't work seems like a manifestation of Simpson's Paradox, which basically says that the best batting averages in individual years might not yield the best batting average over a period of several years. One can use the wiki's numbers to draw up a counterexample with $$$K=2$$$ and the matrix (I'll write the columns as fractions) $$$[\frac{12}{48},\frac{104}{411},\frac{45}{140}]$$$, which is sorted by ratio. Greedy takes the 2nd and 3rd columns for $$$\frac{149}{551}$$$, but selecting the 1st and 3rd yields $$$\frac{57}{188}$$$, which is better.

3509

5 years ago, # ^ |

I see. I did have trouble coming up with a counter-example for the greedy sort solution, which is why I crossed my fingers and sent it in.

Your input was helpful. Thanks!

wwdd

Why the greedy solution fails

Essentially, some columns have more effect on the sums $$$P$$$ and $$$Q$$$ than others, meaning that sometimes it is better to choose a worse, "lighter" column than a slightly better, "heavier" one.

For example, the greedy algorithm fails on the following case:

3 2
500 100
1000000 1000000
1 3

Here, the optimal solution chooses the 1st and 3rd columns, since the 3rd column has little effect on the ratio ($$$501/103 = 4.864... \approx 5$$$) while the 2nd column dominates the ratio ($$$1000500/1000100 = 1.000399... \approx 1$$$)

Solution explanation
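
A sketch of the usual argument behind binary search on the answer, reconstructed from the posted code rather than quoted from anyone. The symbols $$$a_i$$$, $$$b_i$$$ (top and bottom of column $$$i$$$), $$$S$$$ (the chosen set of $$$K$$$ columns) and $$$\lambda$$$ (the code's `mid`) are names introduced here.

Fix a candidate value $$$\lambda$$$ and ask whether some $$$S$$$ reaches $$$P/Q \geq \lambda$$$. Since $$$Q > 0$$$:

$$$\frac{\sum_{i \in S} a_i}{\sum_{i \in S} b_i} \geq \lambda \iff \sum_{i \in S} a_i - \lambda \sum_{i \in S} b_i \geq 0 \iff \sum_{i \in S} (a_i - \lambda b_i) \geq 0$$$

For a fixed $$$\lambda$$$ each column contributes $$$a_i - \lambda b_i$$$ independently, so the left side is maximized by taking the $$$K$$$ largest values of $$$a_i - \lambda b_i$$$. That is the sort and the sum over the last $$$K$$$ entries in the code.

If the check passes for $$$\lambda$$$, it also passes for every $$$\lambda' < \lambda$$$, because $$$a_i - \lambda' b_i > a_i - \lambda b_i$$$ when $$$b_i > 0$$$. So the feasible values form an interval $$$[0, \lambda^*]$$$ with $$$\lambda^* = \max P/Q$$$, and binary search on $$$\lambda$$$ converges to it. Each check costs $$$O(N \log N)$$$, and the code runs 100 of them.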

First of all, thanks for replying wholeheartedly!

Wow, the solution came so out of nowhere for me that I am at a loss for words. There isn't a clue or hint that would lead me to think in such a "deliberate" way. I even needed ~1 hour to process your explanation. Should have done more math when I was younger...

Anyway, since I'm here I would like to add a few things on why the initial Greedy solution failed. Frame the problem like this:

Imagine each column of the matrix is a 2D vector on a plane, with the top as the $$$x$$$-coordinate and the bottom as the $$$y$$$-coordinate. The angle of a vector (or of the line that contains it) is related to its slope, which is $$$y/x$$$ (the inverse of the ratio my greedy sort used). Our task is then to choose $$$K$$$ vectors so that their vector sum has the smallest angle with the $$$x$$$-axis, i.e. the smallest slope $$$y/x$$$ and hence the largest $$$P/Q$$$. This example convinced me why "sorted by ratio" didn't work:

[image]
