ologn_13's blog

By ologn_13, 12 years ago, In English

Hi all!! The problem is about finding the k nearest neighbors of a point in a 2-D coordinate system. The input is a set of (x, y) coordinates, and we have to find the k nearest neighbors of a query point. I searched thoroughly on the internet, but I only found purely theoretical algorithms phrased in terms of searching for points inside a hypothetical sphere. I also read that it can be solved using k-d trees, which I have learned about, but I don't understand how to implement the algorithm. Please help if anyone has implemented it before or has looked at an implementation. Thanks!


»
12 years ago, # |

What about the limits? There could be different approaches depending on the number of points, etc.

  • »
    »
    12 years ago, # ^ |

Constraints: number of input points N <= 10^6; k <= 10^5; time limit: 5 sec.

    • »
      »
      »
      12 years ago, # ^ |
Rev. 3

Never mind. I suddenly realized that "k-nearest" means the points that are first when sorted by distance. I thought instead that we were building something like a chain of the k closest points. Quite stupid of me :)

    • »
      »
      »
      12 years ago, # ^ |

We can find the distance to each point and sort them. Or are there many queries?
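
For a single query, a brute-force version of this might look like the following sketch (names such as kNearestBrute and pt are made up for illustration); it selects the k smallest distances in O(n) with nth_element and only sorts the k winners:

#include <algorithm>
#include <vector>
using namespace std;

struct pt { double x, y; };

// Brute force: the k nearest neighbors of (qx, qy).
// nth_element partitions in O(n); sorting only the k winners costs O(k log k).
vector<pt> kNearestBrute(vector<pt> pts, double qx, double qy, int k) {
    auto d2 = [&](const pt& p) {                      // squared Euclidean distance
        return (p.x - qx) * (p.x - qx) + (p.y - qy) * (p.y - qy);
    };
    auto cmp = [&](const pt& a, const pt& b) { return d2(a) < d2(b); };
    if (k > (int)pts.size()) k = pts.size();
    nth_element(pts.begin(), pts.begin() + k, pts.end(), cmp);  // k smallest in front
    sort(pts.begin(), pts.begin() + k, cmp);                    // nearest first
    return vector<pt>(pts.begin(), pts.begin() + k);
}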

»
12 years ago, # |

If k is small (say, constant), then we can build a higher-order (order-k) Voronoi diagram and then reduce our problem to the planar point location problem. So we get a data structure with space O(k(n - k)) and query time O(log n + k), at the cost of the preprocessing time needed to build the diagram.

  • »
    »
    12 years ago, # ^ |

For k = 100, n = 10^6 and 100 queries, I think the Voronoi diagram will prove costly.

    • »
      »
      »
      12 years ago, # ^ |
Rev. 2

Then how about splitting the space into cells (at least a rectangular grid)? You can then look for the k nearest neighbors only among the points of the same cell and a few neighboring ones.

Surely the hard part would be analysing the set of points to decide which dissection splits them as uniformly as possible...
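
A rough, hypothetical sketch of that grid idea (the Grid struct, cell size, and stopping rule below are my own choices, not code from this thread): bucket the points into square cells, then scan cells in growing rings around the query cell until the k-th best distance can no longer improve.

#include <algorithm>
#include <cstdlib>
#include <vector>
using namespace std;

struct pt { double x, y; };

struct Grid {
    double minx, miny, cell;                 // grid origin and cell size
    int gx, gy;                              // number of cells per axis
    vector< vector<pt> > cells;              // gx * gy buckets

    Grid(const vector<pt>& pts, double cellSize): cell(cellSize) {
        minx = miny = 1e18;
        double maxx = -1e18, maxy = -1e18;
        for (size_t i = 0; i < pts.size(); i++) {
            minx = min(minx, pts[i].x); maxx = max(maxx, pts[i].x);
            miny = min(miny, pts[i].y); maxy = max(maxy, pts[i].y);
        }
        gx = (int)((maxx - minx) / cell) + 1;
        gy = (int)((maxy - miny) / cell) + 1;
        cells.assign((size_t)gx * gy, vector<pt>());
        for (size_t i = 0; i < pts.size(); i++)
            cells[idx((int)((pts[i].x - minx) / cell),
                      (int)((pts[i].y - miny) / cell))].push_back(pts[i]);
    }
    size_t idx(int i, int j) const { return (size_t)i * gy + j; }

    // k nearest neighbors of (qx, qy); assumes the query lies inside the bounding box
    vector<pt> query(double qx, double qy, int k) const {
        int ci = max(0, min(gx - 1, (int)((qx - minx) / cell)));
        int cj = max(0, min(gy - 1, (int)((qy - miny) / cell)));
        vector< pair<double, pt> > best;     // (squared distance, point)
        for (int r = 0; r < max(gx, gy); r++) {
            // collect the ring of cells at Chebyshev distance exactly r
            for (int i = max(0, ci - r); i <= min(gx - 1, ci + r); i++)
                for (int j = max(0, cj - r); j <= min(gy - 1, cj + r); j++) {
                    if (max(abs(i - ci), abs(j - cj)) != r) continue;
                    const vector<pt>& c = cells[idx(i, j)];
                    for (size_t t = 0; t < c.size(); t++) {
                        double dx = c[t].x - qx, dy = c[t].y - qy;
                        best.push_back(make_pair(dx * dx + dy * dy, c[t]));
                    }
                }
            sort(best.begin(), best.end(), cmpFirst);
            if ((int)best.size() > k) best.resize(k);
            // every unscanned cell is at least r * cell away, so once the current
            // k-th best distance is within that bound the answer cannot change
            if ((int)best.size() == k && best.back().first <= (r * cell) * (r * cell)) break;
        }
        vector<pt> res;
        for (size_t i = 0; i < best.size(); i++) res.push_back(best[i].second);
        return res;
    }
    static bool cmpFirst(const pair<double, pt>& a, const pair<double, pt>& b) {
        return a.first < b.first;
    }
};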

  • »
    »
    12 years ago, # ^ |

Interesting :-) Do you mean something like this: cd.duke.edu ?

    • »
      »
      »
      12 years ago, # ^ |

      Exactly! Though I'm not an expert in 2D nearest neighbors, and don't know all the details.

»
12 years ago, # |
Rev. 2

k nearest neighbors of a point in a 2-D coordinate system.

By the way, which distance between points do you mean? :-)

sqrt(x^2 + y^2) or |x| + |y| ?

»
12 years ago, # |
Rev. 3

The answer to one query is a "chain of the K nearest neighbors of point number i", right?

You say n, k, q ≤ 10^5. The size of the answer is k × q. That is 10^10. Maybe just q × k ≤ 10^5?

UPD

If the constraints really are k × q ≤ 10^5, a KD-tree works well for random sets of points.

About KD trees: wiki, some implementation, my implementation for acm.timus.ru task.1369

  • »
    »
    12 years ago, # ^ |
Rev. 2

Burunduk1, I meant the Euclidean distance between the query point and the given points. Does your implementation work for quite large constraints, like k*q = 10^10?

Also, the Timus problem deals with only one nearest point, and there we just have to print all points at the same distance as the nearest one. But here the problem is to find the k nearest points, including those tied at the closest distance as well as farther ones, so as to complete k points.

    • »
      »
      »
      12 years ago, # ^ |
Rev. 2

I also read that it can be solved using k-d trees, which I have learned about, but I don't understand how to implement the algorithm.

Start from an easier task. Learn how to find the 1-nearest point using a KD-tree. If you already know how to find the 1-nearest point (have implemented it yourself), you should understand how to find the k nearest points. Just use if (sqr(dx) + sqr(dy) > res[k] + eps) return; instead of if (sqr(dx) + sqr(dy) > res + eps) return; (it's a line from my code), where res[k] is the k-th element in sorted order. You may maintain it using a treap.
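
For instance, res[k] can also be maintained with a std::multiset instead of a treap; the following small sketch (KBest is a made-up name, not code from Burunduk1's solution) shows the idea behind that pruning check:

#include <iterator>
#include <set>
using namespace std;

// Keep the k smallest squared distances seen so far; kthBest() plays the role
// of res[k] in the pruning check quoted above.
struct KBest {
    int k;
    multiset<double> dists;                 // at most k smallest squared distances
    KBest(int k): k(k) {}
    double kthBest() const {                // "infinity" until we have k candidates
        return (int)dists.size() < k ? 1e18 : *dists.rbegin();
    }
    void offer(double d2) {                 // try to add a new candidate
        if (d2 >= kthBest()) return;
        dists.insert(d2);
        if ((int)dists.size() > k) dists.erase(prev(dists.end()));
    }
};

// Inside the KD-tree descent the pruning line then becomes something like:
//   if (sqr(dx) + sqr(dy) > best.kthBest() + eps) return;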

      Does your implementation work for quite large constraints, like k*q=10^10?

Hm... of course, it takes some time to process such a query :-D If you mean memory, it's O(N). If you mean asymptotics, for a random set of points it's about O(K*logN*logK) per query.

  • »
    »
    12 years ago, # ^ |

Hi, do you know any problems on online judges that can be solved using a kd-tree? It doesn't look like Timus 1369 can be solved that way :(

»
9 years ago, # |
Rev. 4

You can do this using a KD tree. Basically, if you know the KD-tree nearest-neighbor query, to transform it into k-nearest just keep the candidate points in a priority queue while traversing.

It's O(k log(k) log(n)) on average and O(k log(k) sqrt(n)) in the worst case. For proofs, look at the references cited on Wikipedia.

A short implementation of 2D KNN in C++ using a KD tree is below; tested.

EDIT: build is O(n log n), not O(2n).

EDIT 2: I tested this on Kattis with k = 2. Unfortunately, I can't find online judges with KNN problems to test against. If you know some problems, I would appreciate a reply.


#include <algorithm>
#include <queue>
#include <vector>
using namespace std;

// 2D point object
struct point {
    double x, y;
    point(double x = 0, double y = 0): x(x), y(y) {}
};

// the "hyperplane split": one comparator per dimension
bool cmpx(const point& a, const point& b) {return a.x < b.x;}
bool cmpy(const point& a, const point& b) {return a.y < b.y;}

struct kdtree {
    point *tree;
    int n;
    // constructor: copies the points and builds the tree in place
    kdtree(point p[], int n): tree(new point[n]), n(n) {
        copy(p, p + n, tree);
        build(0, n, false);
    }
    // destructor
    ~kdtree() {delete[] tree;}
    // k-nearest-neighbor query, O(k log(k) log(n)) on average
    vector<point> query(double x, double y, int k = 1) {
        perform_query(x, y, k, 0, n, false); // recurse
        vector<point> points;
        while (!pq.empty()) { // collect points, farthest first
            points.push_back(*pq.top().second);
            pq.pop();
        }
        reverse(points.begin(), points.end()); // nearest first
        return points;
    }
private:
    // build is O(n log n) using divide and conquer
    void build(int L, int R, bool dvx) {
        if (L >= R) return;
        int M = (L + R) / 2;
        // put the median in the middle in O(n); split by x-coordinate if dvx is true
        nth_element(tree + L, tree + M, tree + R, dvx ? cmpx : cmpy);
        build(L, M, !dvx); build(M + 1, R, !dvx);
    }

    // max-heap of (squared distance, point): the k nearest found so far
    priority_queue<pair<double, point*> > pq;
    void perform_query(double x, double y, int k, int L, int R, bool dvx) {
        if (L >= R) return;
        int M = (L + R) / 2;
        double dx = x - tree[M].x;
        double dy = y - tree[M].y;
        double delta = dvx ? dx : dy; // signed distance to the splitting line
        double dist = dx * dx + dy * dy;
        // if this point is nearer than the k-th farthest, put it in the queue
        if ((int)pq.size() < k || dist < pq.top().first) {
            pq.push(make_pair(dist, &tree[M]));
            if ((int)pq.size() > k) pq.pop(); // keep k elements only
        }
        int nearL = L, nearR = M, farL = M + 1, farR = R;
        if (delta > 0) { // the right half is nearer
            swap(nearL, farL);
            swap(nearR, farR);
        }
        // query the nearer child first
        perform_query(x, y, k, nearL, nearR, !dvx);
        // query the farther child only if it may still contain candidates
        if ((int)pq.size() < k || delta * delta < pq.top().first)
            perform_query(x, y, k, farL, farR, !dvx);
    }
};
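
For example (with made-up sample points), the structure above could be used like this:

#include <cstdio>
// (the kdtree code above is assumed to be in the same file)

int main() {
    point pts[5] = {point(0, 0), point(1, 1), point(2, 2), point(5, 5), point(-1, 3)};
    kdtree t(pts, 5);
    vector<point> nn = t.query(1.2, 0.9, 3); // 3 nearest neighbors of (1.2, 0.9)
    for (size_t i = 0; i < nn.size(); i++)
        printf("(%.1f, %.1f)\n", nn[i].x, nn[i].y);
    return 0;
}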