
NumPy: np.lexsort with fuzzy/tolerant comparisons

I have a collection of N points in three dimensions, stored as an np.array with shape (N, 3). All of the points are distinct, with a minimum distance between any two points of ~1e-5. I am looking for a way to obtain an order in which to iterate over these points that is both independent of their current order in the np.array and robust to small perturbations of individual components.
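The two requirements can be phrased as a check: shuffle the rows, nudge them slightly, and see whether the ordering still picks out the same original rows in the same positions. The sketch below is a hypothetical helper (`ordering_is_stable` is not from the question or answers); it assumes the ordering function takes an (N, 3) array and returns row indices, and that the caller supplies the permutation and the perturbation.

```python
import numpy as np


def ordering_is_stable(order_fn, points, perm, noise):
    """Hypothetical check of both requirements.  order_fn maps an (N, 3)
    array to row indices; perm is a row permutation and noise a small
    perturbation with the same shape as points."""
    reference = order_fn(points)
    # Shuffle and perturb the rows; row k of moved is original row perm[k]
    moved = points[perm] + noise[perm]
    # Translate the new ordering back to original row numbers and compare
    return np.array_equal(perm[order_fn(moved)], reference)


points = np.array([[-0.5, 0.0, 2**0.5], [0.5, 0.0, 2**0.5]])
perm = np.array([1, 0])
noise = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, -1e-15]])
print(ordering_is_stable(lambda a: np.lexsort(a.T), points, perm, noise))  # False
```

Here a 1e-15 nudge to one z coordinate is enough to flip the order that np.lexsort produces.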

The simplest way to satisfy the first requirement is to use np.lexsort:

np.lexsort(my_array.T)

However, this fails the robustness requirement:

In [6]: my_array = np.array([[-0.5, 0, 2**0.5], [0.5, 0, 2**0.5 - 1e-15]])

In [7]: my_array[np.lexsort(my_array.T)]
Out[7]: 
array([[ 0.5       ,  0.        ,  1.41421356],
       [-0.5       ,  0.        ,  1.41421356]])

where a 1e-15 difference in the z coordinate (np.lexsort's primary key, since it treats the last key as the primary one) is enough to put the x = 0.5 point first, so the ordering is extremely sensitive to perturbations. I am therefore looking for a fuzzy variant of np.lexsort that moves on to the next axis if two values on one axis are within a tolerance epsilon of each other. (Or any alternative mechanism that will let me obtain such an ordering.)
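To pin down the requested semantics, here is a minimal sketch of a tolerant comparison: keys are compared in the same order as np.lexsort(my_array.T) (last coordinate first), and values closer than tol count as equal, so the comparison falls through to the next key. The names `fuzzy_cmp` and `fuzzy_order` are hypothetical, and the sort goes through functools.cmp_to_key, so every comparison is a Python call; this illustrates the behaviour rather than offering a fast implementation.

```python
import functools

import numpy as np


def fuzzy_cmp(p, q, tol=1e-6):
    """Hypothetical comparator: walk the coordinates from last to first, as
    np.lexsort(arr.T) does, treating values closer than tol as equal."""
    for a, b in zip(reversed(p), reversed(q)):
        if abs(a - b) >= tol:
            return -1 if a < b else 1
    return 0


def fuzzy_order(arr, tol=1e-6):
    # Pure-Python sort of the row indices using the tolerant comparator
    key = functools.cmp_to_key(lambda i, j: fuzzy_cmp(arr[i], arr[j], tol))
    return sorted(range(len(arr)), key=key)


my_array = np.array([[-0.5, 0, 2**0.5], [0.5, 0, 2**0.5 - 1e-15]])
print(my_array[fuzzy_order(my_array)])  # the x = -0.5 row comes first
flipped = my_array[::-1]
print(flipped[fuzzy_order(flipped)])    # still the x = -0.5 row first
```

One caveat: "within tol" is not transitive (with tol = 1, 0 ≈ 0.6 and 0.6 ≈ 1.2, yet 0 < 1.2), so when values chain like that, sorting with this comparator can still depend on the input order. That is why the approaches below group runs of nearby values instead of comparing pairs.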

Since my application has several million of these collections, all of which need ordering, performance is a concern (which is why I have not tried to roll my own tolerant np.lexsort without first checking whether there is a better way to do it).

My eventual solution was:

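def fuzzysort(arr, idx, dim=0, tol=1e-6):
    # Nothing to tie-break: fewer than two indices, or no dimensions left
    # (also avoids the IndexError/NameError the unguarded version hits)
    if len(idx) < 2 or dim >= len(arr):
        return list(idx)

    # Extract our dimension and sort the indices by it
    arrd = arr[dim]
    srtdidx = sorted(idx, key=arrd.__getitem__)

    # i/ix mark the start of the current group: values within tol of arrd[ix]
    i, ix = 0, srtdidx[0]
    for j, jx in enumerate(srtdidx[1:], start=1):
        if arrd[jx] - arrd[ix] >= tol:
            # Group closed: break ties inside it on the next dimension
            if j - i > 1:
                srtdidx[i:j] = fuzzysort(arr, srtdidx[i:j], dim + 1, tol)
            i, ix = j, jx

    # The final group is never closed inside the loop
    if len(srtdidx) - i > 1:
        srtdidx[i:] = fuzzysort(arr, srtdidx[i:], dim + 1, tol)

    return srtdidx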

I note that this is slightly over-engineered for the problem described above. As with np.lexsort, the array must be passed in transposed form (shape (3, N)), although here the first row, rather than the last, is the primary key. The idx parameter controls which indices are considered (allowing elements to be crudely masked); otherwise, list(range(N)) will do. The function returns the sorted indices as a list.

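Performance isn't great. However, this is mostly a consequence of NumPy scalar types: every element lookup such as arrd[jx] creates a NumPy scalar, which is slower to create and compare in a Python loop than a plain float. Calling tolist() on the array beforehand, so that the function works on nested Python lists, improves the situation somewhat.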
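A different, vectorized route (not from the answers on this page, and only a sketch): cluster each column independently by sorting it and starting a new cluster wherever the gap to the previous value is at least tol, then run np.lexsort on the cluster ids, with the raw coordinates as a final tie-break. The name `tolerant_lexsort` is hypothetical; it takes the untransposed (N, D) array and uses column 0 as the primary key, like fuzzysort. It differs from fuzzysort in that clusters are formed over all points in a column rather than only among points already tied on earlier columns, and chained values (each within tol of a neighbour) merge into one cluster. As with any threshold scheme, a perturbation that opens or closes a gap of size tol can still change the order.

```python
import numpy as np


def tolerant_lexsort(points, tol=1e-6):
    """Hypothetical vectorized alternative.  points has shape (N, D) and
    column 0 is the primary key.  Returns row indices ordered by per-column
    cluster ids, then by the raw coordinates."""
    points = np.asarray(points, dtype=float)
    labels = np.empty(points.shape, dtype=np.int64)
    for d in range(points.shape[1]):
        col = points[:, d]
        order = np.argsort(col, kind="stable")
        # A new cluster starts wherever the gap to the previous sorted value
        # is at least tol; cumsum turns those break flags into cluster ids
        breaks = np.diff(col[order]) >= tol
        labels[order, d] = np.concatenate(([0], np.cumsum(breaks)))
    # np.lexsort treats the LAST row of keys as primary: the cluster ids of
    # column 0 go last, and the raw coordinates only break remaining ties
    keys = np.vstack((points.T[::-1], labels.T[::-1]))
    return np.lexsort(keys)


pts = np.array([[1.0, 2.0], [1.0 + 1e-12, 1.0]])
print(pts[tolerant_lexsort(pts)])    # the y = 1.0 point comes first
print(pts[np.lexsort(pts.T[::-1])])  # plain lexsort: the y = 2.0 point first
```

The only Python-level loop runs over the D columns, so the per-point work stays inside NumPy.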

I stumbled on the same problem, only in 2D, with a list of x, y coordinates that I needed to sort with a tolerance. I ended up writing this solution based on numpy.lexsort:

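import numpy as np

def tolerance_sort(array, tolerance):
    # Sort rows by the second column, breaking exact ties on the first
    # (np.lexsort uses the LAST key as the primary one); fancy indexing copies
    array_sorted = array[np.lexsort((array[:, 0], array[:, 1]))]
    sort_range = [0]  # row indices of the current group of near-equal y values

    def sort_group(rows):
        # Re-sort one group in place by the first column, then the second
        if len(rows) < 2:
            return
        sub_arr = np.take(array_sorted, rows, axis=0)
        array_sorted[rows[0]:rows[-1] + 1] = sub_arr[
            np.lexsort((sub_arr[:, 1], sub_arr[:, 0]))]

    for i in range(array.shape[0] - 1):
        if array_sorted[i + 1, 1] - array_sorted[i, 1] <= tolerance:
            # Within tolerance of the previous row: extend the current group
            sort_range.append(i + 1)
        else:
            # Gap found: sort the finished group and start a new one
            sort_group(sort_range)
            sort_range = [i + 1]
    # The last group is never closed inside the loop, so sort it here
    sort_group(sort_range)
    return array_sorted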

which sorts this:

array([[ 11.  ,   4.  ],
       [  1.  ,   0.  ],
       [  7.  ,  10.  ],
       [  2.  ,   9.  ],
       [  9.  ,   9.  ],
       [  5.  ,   4.  ],
       [  1.  ,   2.  ],
       [  1.  ,   0.  ],
       [  0.  ,   0.1 ],
       [  2.  ,   0.06]])

into this (with tolerance = 0.1):

array([[  0.  ,   0.1 ],
       [  1.  ,   0.  ],
       [  1.  ,   0.  ],
       [  2.  ,   0.06],
       [  1.  ,   2.  ],
       [  5.  ,   4.  ],
       [ 11.  ,   4.  ],
       [  2.  ,   9.  ],
       [  9.  ,   9.  ],
       [  7.  ,  10.  ]])

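I didn't have time to generalize it, so it only works in 2D, and there is currently no control over the sort order (it sorts first by the second column, then by the first). Note also that groups are formed by chaining: a row joins the current group if its second-column value is within tolerance of the previous row's, so a single group can span more than tolerance in total.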
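One possible generalization, offered as a sketch rather than the answerer's code: `tolerance_sort_nd` and its `keys` argument (column indices, primary key first) are hypothetical. It keeps the answer's chained rule (consecutive sorted values at most tolerance apart are tied) and recurses on the next key inside each tied group. One difference from the 2D version: inside a group, the answer breaks any remaining ties exactly on the other column, whereas this sketch applies the tolerant rule at every level.

```python
import numpy as np


def tolerance_sort_nd(array, tolerance, keys=None):
    """Hypothetical N-D generalization of tolerance_sort.  Rows are sorted by
    the columns in keys (primary first; default: column order).  Along each
    key, consecutive sorted values at most `tolerance` apart count as tied,
    and ties are broken on the next key."""
    array = np.asarray(array)
    if keys is None:
        keys = range(array.shape[1])
    rows = np.arange(array.shape[0])
    return array[_tolerant_order(array, rows, list(keys), tolerance)]


def _tolerant_order(array, rows, keys, tolerance):
    # Return the row indices in `rows` in tolerant lexicographic order
    if len(rows) < 2 or not keys:
        return rows
    col = keys[0]
    rows = rows[np.argsort(array[rows, col], kind="stable")]
    # Start a new group wherever the gap to the previous value exceeds tolerance
    cuts = np.flatnonzero(np.diff(array[rows, col]) > tolerance) + 1
    return np.concatenate([_tolerant_order(array, g, keys[1:], tolerance)
                           for g in np.split(rows, cuts)])


pts = np.array([[11., 4.], [1., 0.], [7., 10.], [2., 9.], [9., 9.],
                [5., 4.], [1., 2.], [1., 0.], [0., 0.1], [2., 0.06]])
print(tolerance_sort_nd(pts, 0.1, keys=[1, 0]))  # same order as the output above
```

Passing keys=[0, 1] instead sorts primarily by the first column, which the 2D version cannot do.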

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 