Блог пользователя bicsi

Автор bicsi, история, 7 лет назад, По-английски

I wanted to showcase a simpler algorithm for the closest pair of points in 2D problem, and maybe to discuss its performance / countercases.

The problem is:

Given N distinct points in Euclidean 2D space, compute the minimum (squared) distance between any two distinct points.

The usual approach here is a divide-and-conquer algorithm, which can be found virtually anywhere, including on Wikipedia. The complexity of this algorithm is O(nlog(n)), but it is rather tricky to achieve this complexity.

The alternative approach (based on the same algorithm), is to do sweep-line. We sort the points based on the x-coordinate and we keep a set of the points in the region x - d, x, sorted by y coordinate. Here d is the smallest distance so far (we do that with the two-pointers technique). Now, for each new point x, y, we query the whole range y - d, y + d in this set and possibly update our answer.

Due to the proof of the D&C algorithm, at each time the quieried range should be of size O(1) on average, so total complexity would be O(nlog(n)). Code is below:

long long ClosestPair(vector<pair<int, int>> pts) {
    int n = pts.size();
    sort(pts.begin(), pts.end());
    set<pair<int, int>> s;

    long long best_dist = 1e18;
    int j = 0;
    for (int i = 0; i < n; ++i) {
        int d = ceil(sqrt(best_dist));
        while (pts[i].first - pts[j].first >= d) {
            s.erase({pts[j].second, pts[j].first});
            j += 1;
        }

        auto it1 = s.lower_bound({pts[i].second - d, pts[i].first});
        auto it2 = s.upper_bound({pts[i].second + d, pts[i].first});
        
        for (auto it = it1; it != it2; ++it) {
            int dx = pts[i].first - it->second;
            int dy = pts[i].second - it->first;
            best_dist = min(best_dist, 1LL * dx * dx + 1LL * dy * dy);      
        } 
        s.insert({pts[i].second, pts[i].first}); 
    }
    return best_dist;
}

What do you think? Are there any special cases (e.g. many points at the same x coordinate) that would break the solution?

  • Проголосовать: нравится
  • +43
  • Проголосовать: не нравится

»
7 лет назад, # |
  Проголосовать: нравится +13 Проголосовать: не нравится

same idea here

»
7 лет назад, # |
  Проголосовать: нравится +3 Проголосовать: не нравится

There is an explanation of the same method :) https://olympiad.cs.uct.ac.za/presentations/camp1_2009/linesweep.pdf

»
6 лет назад, # |
Rev. 3   Проголосовать: нравится -17 Проголосовать: не нравится

i think the worst case of this implementation is O(n2) for example the points : first point ==> (0, pow(2,n) — pow(2,1)) second point ==> (0, pow(2,n) — pow(2,2)) . . i-th point ==> (0, pow(2,n) — pow(2,i) ) . .

Am i right ?

  • »
    »
    6 лет назад, # ^ |
      Проголосовать: нравится +2 Проголосовать: не нравится

    No. It should only test adjacent pairs in this case. Note that when he inserts the points in the set he swaps x/y.

»
5 лет назад, # |
Rev. 4   Проголосовать: нравится 0 Проголосовать: не нравится

Hey what modifications would you make if you want floating point answer? I tried to modify it, but it didn't work and I am getting wrong answer.

Its unrelated BTW but I liked your streams, when are you going to do them again?

  • »
    »
    5 лет назад, # ^ |
      Проголосовать: нравится +1 Проголосовать: не нравится

    Could you give me a link to the problem? Maybe the best initialization is too low?

    Related to streams/videos, I will probably restart doing them after I finish with university. This summer I don’t have a lot in mind so I will probably take on competitive programming more.

    • »
      »
      »
      5 лет назад, # ^ |
        Проголосовать: нравится 0 Проголосовать: не нравится

      The problem is from my assignment, we just have to find the minimum distance in O(nlogn) or less.

      Task: Given N points on a plane, find the smallest distance between a pair of two (different) points.

      Input Format: The first line contains the number N of points. Each of the following N lines defines a point (xi, yi).

      Constraints: 2 <= N <= 10e5; −10e9 <= xi, yi <= 10e9 are integers.

      Output Format: Output the minimum distance. The absolute value of the difference between the answer of your program and the optimal value should be at most 10−3. To ensure this, output your answer with at least four digits after the decimal point (otherwise your answer, while being computed correctly, can turn out to be wrong because of rounding issues).

      Sample input:

      11
      4 4
      -2 -2
      -3 -4
      -1 3
      2 3
      -4 0
      1 1
      -1 -1
      3 -1
      -4 2
      -2 4
      

      Output:

      1.414213
      
      • »
        »
        »
        »
        5 лет назад, # ^ |
          Проголосовать: нравится 0 Проголосовать: не нравится

        The maximum distance can be up to 4e18, whereas in my code infinity is set to 1e18.

    • »
      »
      »
      5 лет назад, # ^ |
        Проголосовать: нравится 0 Проголосовать: не нравится

      there is a problem like that here , it also needs a floating point answer

»
3 года назад, # |
  Проголосовать: нравится +9 Проголосовать: не нравится

I have a question about the code on 10th line.

while (pts[i].first - pts[j].first >= best_dist) 
// Should it be >= d?
»
3 года назад, # |
  Проголосовать: нравится 0 Проголосовать: не нравится
int d = ceil(sqrt(best_dist));
while (pts[i].first - pts[j].first >= d) {
    s.erase({pts[j].second, pts[j].first});
    j += 1;
}

Due to precision error, sometimes the j goes out of bounds. (For eg. this problem, out of bounds submission)

Adding an extra clause j < n in the while loop prevents that. (ac submission)

»
3 года назад, # |
  Проголосовать: нравится 0 Проголосовать: не нравится

I believe this problem can also be elegantly solved using 2D trees

  • »
    »
    3 года назад, # ^ |
      Проголосовать: нравится +4 Проголосовать: не нравится

    But 2D trees solves it in $$$O(nlog^2n)$$$, and this solution works in $$$O(nlogn)$$$

    • »
      »
      »
      3 года назад, # ^ |
        Проголосовать: нравится 0 Проголосовать: не нравится

      cant we just run nearestNeighbour(Point p) for every p. and of course ignoring p itself if it has only occurred once in the given set of points.