
Find minimum distance between points

I have a set of points (x, y).

I need to return the two points with the minimal distance between them.

I am using this: http://www.cs.ucsb.edu/~suri/cs235/ClosestPair.pdf

but I don't really understand how the algorithm works.

Can someone explain in simpler terms how the algorithm works?

Or suggest another idea?

Thanks!

The solution to the closest pair problem with the best time complexity, O(n log n), is the divide-and-conquer approach mentioned in the document you have read.

Divide-and-conquer Approach for Closest-Pair Problem

The easiest way to understand this algorithm is to read an implementation of it in a high-level language such as Python (real code can sometimes be easier to follow than an abstract description or pseudocode):

# closest pairs by divide and conquer
# David Eppstein, UC Irvine, 7 Mar 2002

from __future__ import generators

def closestpair(L):
    def square(x): return x*x
    def sqdist(p,q): return square(p[0]-q[0])+square(p[1]-q[1])

    # Work around ridiculous Python inability to change variables in outer scopes
    # by storing a list "best", where best[0] = smallest sqdist found so far and
    # best[1] = pair of points giving that value of sqdist.  Then best itself is never
    # changed, but its elements best[0] and best[1] can be.
    #
    # We use the pair L[0],L[1] as our initial guess at a small distance.
    best = [sqdist(L[0],L[1]), (L[0],L[1])]

    # check whether pair (p,q) forms a closer pair than one seen already
    def testpair(p,q):
        d = sqdist(p,q)
        if d < best[0]:
            best[0] = d
            best[1] = p,q

    # merge two sorted lists by y-coordinate
    def merge(A,B):
        i = 0
        j = 0
        while i < len(A) or j < len(B):
            if j >= len(B) or (i < len(A) and A[i][1] <= B[j][1]):
                yield A[i]
                i += 1
            else:
                yield B[j]
                j += 1

    # Find closest pair recursively; returns all points sorted by y coordinate
    def recur(L):
        if len(L) < 2:
            return L
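        split = len(L)//2       # integer division, so the index also works in Python 3
        splitx = L[split][0]    # x-coordinate of the dividing line
        L = list(merge(recur(L[:split]), recur(L[split:])))

        # Find possible closest pair across split line
        # Note: this is not quite the same as the algorithm described in class, because
        # we use the global minimum distance found so far (best[0]), instead of
        # the best distance found within the recursive calls made by this call to recur().
        # best[0] is a squared distance, so the distance to the line is squared too.
        E = [p for p in L if square(p[0]-splitx) < best[0]]
        for i in range(len(E)):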
            for j in range(1,8):
                if i+j < len(E):
                    testpair(E[i],E[i+j])
        return L

    L.sort()
    recur(L)
    return best[1]

closestpair([(0,0),(7,6),(2,20),(12,5),(16,16),(5,8),\
              (19,7),(14,22),(8,19),(7,29),(10,11),(1,13)])
# returns: (7,6),(5,8)

Taken from: https://www.ics.uci.edu/~eppstein/161/python/closestpair.py

Detailed explanation:

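First we define the squared Euclidean distance as a helper function to avoid repeating code. Comparing squared distances gives the same ordering as comparing true distances, and it avoids computing square roots.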

def square(x): return x*x # Define square function
def sqdist(p,q): return square(p[0]-q[0])+square(p[1]-q[1]) # Define squared Euclidean distance function

Then we take the first two points as our initial best guess:

best = [sqdist(L[0],L[1]), (L[0],L[1])]

This function compares the squared distance of the next pair with that of the current best pair and keeps whichever is smaller:

def testpair(p,q):
    d = sqdist(p,q)
    if d < best[0]:
        best[0] = d
        best[1] = p,q

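def merge(A,B): is a helper that combines two lists, each already sorted by y-coordinate, into a single list sorted by y-coordinate. It is the same merge step used in merge sort, applied to the two halves the list was divided into.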
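
To see what the merge does in isolation, here is a small sketch. It uses heapq.merge from the standard library (with key=, available since Python 3.5) as a stand-in for the hand-written generator; the sample lists are invented for illustration:

```python
import heapq

# Two lists that are each already sorted by y-coordinate (index 1).
left = [(0, 0), (5, 8), (1, 13)]
right = [(7, 6), (10, 11), (2, 20)]

# Merge them into one y-sorted list in linear time.
merged = list(heapq.merge(left, right, key=lambda p: p[1]))
print(merged)
# [(0, 0), (7, 6), (5, 8), (10, 11), (1, 13), (2, 20)]
```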

def recur(L): is the actual body of the algorithm, so I will explain this function in more detail:

    if len(L) < 2:
        return L

With this part, the algorithm terminates the recursion when fewer than two points are left in the list, since a single point cannot form a pair.

Split the list in half: split = len(L)//2 (written len(L)/2 in Python 2; Python 3 needs // to get an integer index)

Recurse on each half (the function calls itself) and merge the two results, which come back sorted by y-coordinate: L = list(merge(recur(L[:split]), recur(L[split:])))

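Lastly, these nested loops look for a closer pair that straddles the split line. E is the list of points lying close to the split line, sorted by y-coordinate, and each point in E is tested only against the next 7 points after it, not against every other point: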

    for i in range(len(E)):
        for j in range(1,8):
            if i+j < len(E):
                testpair(E[i],E[i+j])

As a result, whenever a closer pair is found, the best pair is updated.
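
The strip step can also be tried on its own. This is a sketch under the assumption that the points are already sorted by y and that splitx and best_sq (a squared distance) come from the earlier steps. The limit of seven neighbours relies on the usual packing argument, which assumes points on the same side of the line are already at least the current best distance apart. The function name scan_strip and the sample data are hypothetical:

```python
def scan_strip(points, splitx, best_sq):
    """Check pairs near the vertical line x = splitx.

    points  : list of (x, y) tuples sorted by y
    best_sq : smallest squared distance known so far
    Returns the (possibly improved) squared distance and the pair that
    achieves it, or None for the pair if nothing better was found.
    """
    best_pair = None
    # Only points closer to the line than the current best can form a better pair.
    strip = [p for p in points if (p[0] - splitx) ** 2 < best_sq]
    for i, p in enumerate(strip):
        # Compare with at most the next 7 points in y order.
        for q in strip[i + 1:i + 8]:
            d = (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
            if d < best_sq:
                best_sq, best_pair = d, (p, q)
    return best_sq, best_pair

pts = [(12, 5), (7, 6), (5, 8), (10, 11), (1, 13)]   # sorted by y
print(scan_strip(pts, splitx=6, best_sq=26))
# (8, ((7, 6), (5, 8)))
```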

If the number of points is small, you can use the brute-force approach, i.e. for each point, find the closest point among the other points, and keep track of the minimum distance found so far together with the two indices that produced it.
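
A minimal sketch of the brute-force idea, assuming Python 3.8 or later for math.dist. It checks every pair once, so it takes O(n^2) time; the function name is made up for illustration:

```python
from itertools import combinations
from math import dist  # Python 3.8+

def closest_pair_brute_force(points):
    """Return the two points with the smallest Euclidean distance, O(n^2)."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    # Every unordered pair is examined exactly once.
    return min(combinations(points, 2), key=lambda pair: dist(*pair))

points = [(0, 0), (7, 6), (2, 20), (12, 5), (16, 16), (5, 8)]
print(closest_pair_brute_force(points))   # ((7, 6), (5, 8))
```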

If the number of points is large, I think you may find the answer in this thread: Shortest distance between points algorithm

So they solve the problem in many dimensions using a divide-and-conquer approach. Divide-and-conquer, like binary search, is mega fast. Basically, if you can split a dataset into two halves, and keep doing that until you find the info you want, you are usually doing it about as fast as possible.

For this question, it means that we divide the data set of points into two sets, S1 and S2.

All the points are numerical, right? So we have to pick some number where to divide the dataset.

So we pick some number m and say it is the median.

So let's take a look at an example:

(14, 2)
(11, 2)
(5, 2)
(15, 2)
(0, 2)

What's the closest pair?

Well, they all have the same Y coordinate, so we can look at Xs only... X shortest distance is 14 to 15, a distance of 1.
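
When all points share a coordinate, as here, the problem is one-dimensional and sorting is enough: the closest pair must be neighbours in sorted order. A short sketch (variable names are illustrative):

```python
points = [(14, 2), (11, 2), (5, 2), (15, 2), (0, 2)]

xs = sorted(x for x, _ in points)            # [0, 5, 11, 14, 15]
# Compare each value with its right-hand neighbour and keep the smallest gap.
gap, a, b = min((b - a, a, b) for a, b in zip(xs, xs[1:]))
print(gap, a, b)                             # 1 14 15
```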

How can we figure that out using divide-and-conquer?

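We look at the greatest value of X and the smallest value of X and take the value halfway between them as the dividing line for our two sets. (The usual algorithm splits at the true median of the X values so that both halves have the same size, but the halfway point is easier to work out by hand, so I will keep calling it the median here.)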

Our median is 7.5 in this example.

We then make 2 sets

S1: (0, 2) and (5, 2)
S2: (11, 2) and (14, 2) and (15, 2)
Median: 7.5
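
A sketch of the splitting step, assuming the points are sorted by X first. It shows the halfway point used in this hand example next to a split at the middle index, which is what the Python code in the first answer does (names are illustrative):

```python
points = sorted([(14, 2), (11, 2), (5, 2), (15, 2), (0, 2)])

# Halfway point between the smallest and largest X, as in the hand example.
mid_x = (points[0][0] + points[-1][0]) / 2           # 7.5
s1 = [p for p in points if p[0] < mid_x]
s2 = [p for p in points if p[0] >= mid_x]
print(mid_x, s1, s2)
# 7.5 [(0, 2), (5, 2)] [(11, 2), (14, 2), (15, 2)]

# Splitting at the middle index instead keeps the halves balanced.
half = len(points) // 2
print(points[:half], points[half:])
# [(0, 2), (5, 2)] [(11, 2), (14, 2), (15, 2)]
```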

We must keep track of the median for every split, because that is actually a vital piece of knowledge in this algorithm. They don't show it very clearly on the slides, but knowing the median value (where you split a set to make two sets) is essential to solving this question quickly.

We also keep track of a value they call delta in the algorithm. That name doesn't tell you much, and you need descriptive names when you code so you can still understand it 10 years later, so instead of delta let's call this value shortest-distance-so-far: the smallest distance between any two points that we have found so far.

Since we have the median value of 7.5, let's go and see what shortest-distance-so-far is for Set 1 and Set 2, respectively:

Set 1: it has only 2 elements, (0, 2) and (5, 2), so its shortest-distance-so-far is simply 5 and we are done with the left set.

Set 2: it has 3 elements, so we continue dividing:

S2 = { (11,2) (14,2) (15,2) }

What do you do? You make a new median, call it S2-median.

S2-median is halfway between 11 and 15, which is 13.

That leaves (11, 2) alone on the left of 13, and (14, 2) and (15, 2) on the right:

left of 13: a single point, so there is no pair to measure
right of 13: 14 to 15 is 1 (!!!)
across 13: a pair straddling the line can only beat 1 if both points are less than 1 away from 13, and 11 is 2 away, so there is nothing to check

So shortest-distance-so-far for Set 2 is 1.

Now we combine the two halves of the original set. The best of both sides is min(5, 1) = 1. The only pairs we have not looked at yet are the ones with one point on each side of 7.5, and such a pair can only beat 1 if both of its points are less than 1 away from the dividing line, that is, with X between 6.5 and 8.5. There are no such points, so there is nothing left to check.

Clearly, (15,2) and (14,2) are the winning pair in this case.

Does that make sense? You must keep track of the median when you cut the set, and keep a new median every time you cut the set, until each side is small enough to check directly (2 or 3 elements).

The magic is in combining the median with shortest-distance-so-far: the median tells you where the strip is, and shortest-distance-so-far tells you how narrow the strip can be.

Thanks for asking this question, I went in not knowing how this algorithm worked but found the right highlighted bullet point on the slide and rolled with it. Do you get it now? I don't know how to explain the median magic other than binary search is f000ing awesome.

I tried it this way: I implemented divide and conquer recursively, but I got stuck at the stage of combining the closest pairs from the two halves. Could someone help? Calculating the distance between the points is not the hard part for me.
