Finding the index of the item closest to a value in a list that's not entirely sorted

As an example, my list is:

[25.75443, 26.7803, 25.79099, 24.17642, 24.3526, 22.79056, 20.84866, 19.49222, 18.38086, 18.0358, 16.57819, 15.71255, 14.79059, 13.64154, 13.09409, 12.18347, 11.33447, 10.32184, 9.544922, 8.813385, 8.181152, 6.983734, 6.048035, 5.505096, 4.65799]

and I'm looking for the index of the value closest to 11.5. I've tried other methods such as binary search and bisect_left, but they don't work.

I cannot sort this list, because the index of the value will be used on a similar list to fetch the value at that index.

Try the following:

min(range(len(a)), key=lambda i: abs(a[i]-11.5))

For example:

>>> a = [25.75443, 26.7803, 25.79099, 24.17642, 24.3526, 22.79056, 20.84866, 19.49222, 18.38086, 18.0358, 16.57819, 15.71255, 14.79059, 13.64154, 13.09409, 12.18347, 11.33447, 10.32184, 9.544922, 8.813385, 8.181152, 6.983734, 6.048035, 5.505096, 4.65799]
>>> min(range(len(a)), key=lambda i: abs(a[i]-11.5))
16

Or to get the index and the value:

>>> min(enumerate(a), key=lambda x: abs(x[1]-11.5))
(16, 11.33447)

import numpy as np

a = [25.75443, 26.7803, 25.79099, 24.17642, 24.3526, 22.79056, 20.84866, 19.49222, 18.38086, 18.0358, 16.57819, 15.71255, 14.79059, 13.64154, 13.09409, 12.18347, 11.33447, 10.32184, 9.544922, 8.813385, 8.181152, 6.983734, 6.048035, 5.505096, 4.65799]

index = np.argmin(np.abs(np.array(a)-11.5))
a[index] # here is your result

In case a is already an array, the conversion with np.array(a) can be omitted.

How about zipping the two lists and then sorting the result?

If you can't sort the array, then there is no quick way to find the closest item: you have to iterate over all entries.

There is a workaround, but it's quite a bit of work: write a sort algorithm which sorts the array and (at the same time) updates a second array which tells you where each entry was before the array was sorted.

That way, you can use binary search to look up the index of the closest entry and then use this index to look up the original index in the "index array".

[EDIT] Using zip(), this is pretty simple to achieve:

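 # Pair each value with its original index, then sort the pairs by value.
 # list() is needed on Python 3, where zip() returns an iterator.
 array_to_sort = list(zip(original_array, range(len(original_array))))
 array_to_sort.sort(key=lambda item: item[0])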

Now you can binary search for the value (using item[0]). item[1] will give you the original index.
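
A minimal sketch of that lookup, assuming Python 3 and a non-empty list of numbers. The helper names build_lookup and closest_original_index are hypothetical and not part of the answer above; the sorted pairs are split into two parallel lists so that the standard bisect module can search the values directly. On a tie, the smaller value wins.

```python
import bisect


def build_lookup(original):
    """Sort (value, original_index) pairs by value; `original` is left untouched."""
    pairs = sorted(zip(original, range(len(original))))
    values = [value for value, _ in pairs]
    indices = [index for _, index in pairs]
    return values, indices


def closest_original_index(values, indices, target):
    """Binary search the sorted values, then map back to the original index."""
    if not values:
        raise ValueError("cannot search an empty list")
    pos = bisect.bisect_left(values, target)
    if pos == 0:
        return indices[0]           # target is below every value
    if pos == len(values):
        return indices[-1]          # target is above every value
    # Compare the neighbours on either side of the insertion point.
    if target - values[pos - 1] <= values[pos] - target:
        return indices[pos - 1]
    return indices[pos]


a = [25.75443, 26.7803, 25.79099, 24.17642, 24.3526, 22.79056, 20.84866,
     19.49222, 18.38086, 18.0358, 16.57819, 15.71255, 14.79059, 13.64154,
     13.09409, 12.18347, 11.33447, 10.32184, 9.544922, 8.813385, 8.181152,
     6.983734, 6.048035, 5.505096, 4.65799]
values, indices = build_lookup(a)
print(closest_original_index(values, indices, 11.5))  # 16
```

Building the lookup costs one sort; it only pays off when the same list is searched many times.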

Going through all the items is only linear. Sorting the array would be worse.

I don't see a problem with keeping an additional deltax (the minimum difference so far) and idx (the index of that element) and just looping once through the list.
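
That single pass could be sketched as follows. The function name closest_index is a hypothetical choice; deltax and idx follow the names used above. It assumes a non-empty list of numbers and keeps the first index on a tie.

```python
def closest_index(values, target):
    """Return the index of the element of `values` closest to `target`."""
    if not values:
        raise ValueError("values must not be empty")
    idx = 0                              # index of the best candidate so far
    deltax = abs(values[0] - target)     # smallest difference seen so far
    for i, value in enumerate(values):
        delta = abs(value - target)
        if delta < deltax:
            deltax = delta
            idx = i
    return idx


a = [25.75443, 26.7803, 25.79099, 24.17642, 24.3526, 22.79056, 20.84866,
     19.49222, 18.38086, 18.0358, 16.57819, 15.71255, 14.79059, 13.64154,
     13.09409, 12.18347, 11.33447, 10.32184, 9.544922, 8.813385, 8.181152,
     6.983734, 6.048035, 5.505096, 4.65799]
print(closest_index(a, 11.5))  # 16
```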

Keep in mind that if space isn't important, you can sort any list without moving its contents by creating a secondary list of the sorted indices.

Also bear in mind that if you are doing this lookup just once, then you simply have to traverse every element in the list, which is O(n). (If you do it multiple times, then you will probably want to sort first to make later lookups more efficient.)

If you are searching a long list many times, then min scales very badly (O(n^2) if you append some of your search values to the list being searched, I think).

Bisect is your friend. Here's my solution. It scales as O(n*log(n)) to build the sorted list, after which each lookup is a binary search:

import bisect


class Closest:
    """Assumes *no* redundant entries - all inputs must be unique"""
    def __init__(self, numlist=None, firstdistance=0):
        if numlist is None:
            numlist = []
        self.numindexes = dict((val, n) for n, val in enumerate(numlist))
        self.nums = sorted(self.numindexes)
        self.firstdistance = firstdistance

    def append(self, num):
        if num in self.numindexes:
            raise ValueError("Cannot append '%s' it is already used" % str(num))
        self.numindexes[num] = len(self.nums)
        bisect.insort(self.nums, num)

    def rank(self, target):
        rank = bisect.bisect(self.nums, target)
        if rank == 0:
            pass
        elif len(self.nums) == rank:
            rank -= 1
        else:
            dist1 = target - self.nums[rank - 1]
            dist2 = self.nums[rank] - target
            if dist1 < dist2:
                rank -= 1
        return rank

    def closest(self, target):
        try:
            return self.numindexes[self.nums[self.rank(target)]]
        except IndexError:
            return 0

    def distance(self, target):
        rank = self.rank(target)
        try:
            dist = abs(self.nums[rank] - target)
        except IndexError:
            dist = self.firstdistance
        return dist

Use it like this:

a = [25.75443, 26.7803, 25.79099, 24.17642, 24.3526, 22.79056, 20.84866,
     19.49222, 18.38086, 18.0358, 16.57819, 15.71255, 14.79059, 13.64154,
     13.09409, 12.18347, 1.33447, 10.32184, 9.544922, 8.813385, 8.181152,
     6.983734, 6.048035, 5.505096, 4.65799]
targets = [1.0, 100.0, 15.0, 15.6, 8.0]
cl = Closest(a)
for x in targets:
    rank = cl.rank(x)
    print("Closest to %5.1f : rank=%2i num=%8.5f index=%2i " % (x, rank,
        cl.nums[rank], cl.closest(x)))

This will output:

Closest to   1.0 : rank= 0 num= 1.33447 index=16
Closest to 100.0 : rank=24 num=26.78030 index= 1
Closest to  15.0 : rank=12 num=14.79059 index=12
Closest to  15.6 : rank=13 num=15.71255 index=11
Closest to   8.0 : rank= 5 num= 8.18115 index=20

And:

cl.append(99.9)
x = 100.0
rank = cl.rank(x)
print("Closest to %5.1f : rank=%2i num=%8.5f index=%2i " % (x, rank,
    cl.nums[rank], cl.closest(x)))

Output:

Closest to 100.0 : rank=25 num=99.90000 index=25
