Блог пользователя ologn_13

Автор ologn_13, 12 лет назад, По-английски

Hi all!! The problem is related to searching the k-nearest neighbor of a point in 2-D co-ordinate system. Inputs are given in form of (x,y) coordinates and we have to find out k-nearest neighbor of a point. I search thoroughly on internet, but I got only pure theoretic algorithms in term of searching points in a hypothetical sphere. I also read that it can be solved using k-d trees, which I learn, but I am not getting how to implement the algorithm. Please help, if anyone have implemented it before or had a look at its implementation earlier. Thanks!

  • Проголосовать: нравится
  • +6
  • Проголосовать: не нравится

»
12 лет назад, # |
  Проголосовать: нравится 0 Проголосовать: не нравится

What about limits? There could be different approaches depending on number of points etc...

  • »
    »
    12 лет назад, # ^ |
      Проголосовать: нравится 0 Проголосовать: не нравится

    constraints:- Number of input points(N)<= 10^6 ; k <= 10^5 ; Time limit:- 5 sec.

    • »
      »
      »
      12 лет назад, # ^ |
      Rev. 3   Проголосовать: нравится 0 Проголосовать: не нравится

      Never mind. I understood suddenly that k-nearest means one which is k-th when sorted by distance. I thought instead that we are building something like a chain of k closest points. Quite stupid of me :)

    • »
      »
      »
      12 лет назад, # ^ |
        Проголосовать: нравится 0 Проголосовать: не нравится

      We can find distance to each point and sort them. Or we have many queries?

»
12 лет назад, # |
  Проголосовать: нравится 0 Проголосовать: не нравится

If k is small (say, constant), then we can build a higher order Voronoi diagram in time , and the reduce our problem to the planar point location problem. So, we get a data structure with preprocessing time , space O(k(n - k)), and query time .

  • »
    »
    12 лет назад, # ^ |
      Проголосовать: нравится 0 Проголосовать: не нравится

    for k=100, n=10^6, queries = 100, i think Voronoi diagram will be proved costly.

    • »
      »
      »
      12 лет назад, # ^ |
      Rev. 2   Проголосовать: нравится 0 Проголосовать: не нравится

      Then how about splitting space into sectors (at least rectangular grid)... You then can seek for k-th neighbor only among points of the same sector and few neighboring.

      Surely the hard problem would be to analyse set of point to decide which dissection would split points as uniformly as possible...

  • »
    »
    12 лет назад, # ^ |
      Проголосовать: нравится +3 Проголосовать: не нравится

    Interesting :-) Do you mean smth like this cd.duke.edu ?

    • »
      »
      »
      12 лет назад, # ^ |
        Проголосовать: нравится 0 Проголосовать: не нравится

      Exactly! Though I'm not an expert in 2D nearest neighbors, and don't know all the details.

»
12 лет назад, # |
Rev. 2   Проголосовать: нравится +3 Проголосовать: не нравится

k-nearest neighbor of a point in 2-D co-ordinate system.

By the way, which distance between points do you mean? :-)

or |x| + |y| ?

»
12 лет назад, # |
Rev. 3   Проголосовать: нравится +3 Проголосовать: не нравится

Answer to one query is "chain of the K nearest neighbors of point number i", rigth?

You say n, k, q ≤ 105. Size of answer is k × q. It is 1010. Maybe just q × k ≤ 105?

UPD

If constraints really are k × q ≤ 105, KD-tree works good for random sets of points.

About KD trees: wiki, some implementation, my implementation for acm.timus.ru task.1369

  • »
    »
    12 лет назад, # ^ |
    Rev. 2   Проголосовать: нравится 0 Проголосовать: не нравится

    Burunduk1, i meant the Euclidean Distance between query point and given points. Does your implementation work for quite large constraints, like k*q=10^10?

    Also the timus problem take care of only one nearest point and in this problem we just have to print all points with same distance as a most nearest point has. But here, the problem is to find out k-nearest points including those with same closest distance as well as more so as to complete k points.

    • »
      »
      »
      12 лет назад, # ^ |
      Rev. 2   Проголосовать: нравится +3 Проголосовать: не нравится

      I also read that it can be solved using k-d trees, which I learn, but I am not getting how to implement the algorithm.

      Start from easier task. Learn, how to find 1-nearest point, using KD-tree. If you already know (implemented it by yourself), how to find 1-nearest point, you should understand, how to find k-nearest points. Just use if (sqr(dx) + sqr(dy) > res[k] + eps) return; instead of if (sqr(dx) + sqr(dy) > res + eps) return; (it's line from my code). Where res[k] -- k-th element in sorted order. You may maintain it using treap.

      Does your implementation work for quite large constraints, like k*q=10^10?

      Hm... of course, it takes some time to process such query :-D If you are about memory, it's O(N). If you are about asymptotics, for random set of points it's about O(K*logN*logK) per query.

  • »
    »
    12 лет назад, # ^ |
      Проголосовать: нравится 0 Проголосовать: не нравится

    Hi, do you know any problem in online judges that can be solved using kd tree? It doesn't look like timus 1369 can be solved that way :(

»
9 лет назад, # |
Rev. 4   Проголосовать: нравится +5 Проголосовать: не нравится

You can do this using KD tree. Basically, if you know KD tree nearest neighbor query, to transform it into k-nearest, just store all the points in a priority queue while traversing.

It's O(k log(k) log(n)) on average, O(k log(k) sqrt(n)) on worst. For proofs, look at the cited references in wikipedia.

Short implementation of 2D KNN in C++ using KD tree below, tested.

EDIT: build is O(n log n), not O(2n) EDIT 2: I tested this on Kattis, with k=2. Unfortunately, I can't find online judges with KNN to test with. If you know some problems, I would appreciate a reply.


// 2D point object
struct point {
	double x, y;
	point(double x = 0, double y = 0): x(x), y(y) {}	
};

// the "hyperplane split", use comparators for all dimensions
bool cmpx(const point& a, const point& b) {return a.x < b.x;}
bool cmpy(const point& a, const point& b) {return a.y < b.y;}

struct kdtree {
	point *tree;
	int n;
	// constructor
	kdtree(point p[], int n): tree(new point[n]), n(n) {
		copy(p, p + n, tree);
		build(0, n, false);
	}
	// destructor
	~kdtree() {delete[] tree;}
	// k-nearest neighbor query, O(k log(k) log(n)) on average
	vector<point> query(double x, double y, int k = 1) {
		perform_query(x, y, k, 0, n, false); // recurse
		vector<point> points;
		while (!pq.empty()) { // collect points
			points.push_back(*pq.top().second);
			pq.pop();
		}
		reverse(points.begin(), points.end());
		return points;
	}
private:
	// build is O(n log n) using divide and conquer
	void build(int L, int R, bool dvx) {
		if (L >= R) return;
		int M = (L + R) / 2;
		// get median in O(n), split x-coordinate if dvx is true
		nth_element(tree+L, tree+M, tree+R, dvx?cmpx:cmpy);
		build(L, M, !dvx); build(M+1, R, !dvx);
	}

	// priority queue for KNN, keep the K nearest
	priority_queue<pair<double, point*> > pq;
	void perform_query(double x, double y, int k, int L, int R, bool dvx) {
		if (L >= R) return;
		int M = (L + R) / 2;
		double dx = x - tree[M].x;
		double dy = y - tree[M].y;
		double delta = dvx ? dx : dy;
		double dist = dx * dx + dy * dy;
		// if point is nearer to the kth farthest, put point in queue
		if (pq.size() < k || dist < pq.top().first) {
			pq.push(make_pair(dist, &tree[M]));
			if (pq.size() > k) pq.pop(); // keep k elements only
		}
		int nearL = L, nearR = M, farL = M + 1, farR = R;
		if (delta > 0) { // right is nearer
			swap(nearL, farL);
			swap(nearR, farR);
		}
                // query the nearer child
		perform_query(x, y, k, nearL, nearR, !dvx);

		if (pq.size() < k || delta * delta < pq.top().first)
                        // query the farther child if there might be candidates
			perform_query(x, y, k, farL, farR, !dvx);
	}
};