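What is the fastest way to find an item in a Python list?

Question

For my project, I need to repeatedly look up the index of a timestamp in a list. If the exact timestamp is not in the list, I need the index of the timestamp just before the one I'm looking for. I tried iterating over the list, but that is slow: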

def find_item_index(arr, x):
    '''
    returns index of x in ordered list.
    If x is between two items in the list, the index of the lower one is returned.
    '''

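    # Stop one short of the end so arr[index+1] never runs past the list.
    for index in range(len(arr) - 1):
        if arr[index] <= x < arr[index+1]:
            return index

    # x at or after the last item maps to the last index.
    if arr and arr[-1] <= x:
        return len(arr) - 1

    raise ValueError(f'{x} not in array.')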

I also tried doing this recursively, but that was even slower:

def find_item_index_recursive(arr, x, index = 0):
    '''
    returns index of x in ordered list.
    If x is between two items in the list, the index of the lower one is returned.
    '''

    length = len(arr)

    if length == 1:
        return index

    # Use <= so an exact match at the midpoint stays in the upper half.
    # Note: each slice copies half the list, which adds O(n) work per call.
    if arr[length // 2] <= x:
        return find_item_index_recursive(arr[length // 2:], x, index + length // 2)
    else:
        return find_item_index_recursive(arr[:length // 2], x, index)

Is there a faster way to do this?

Answer 1

Sort the list, and keep track of whether it is sorted, before doing any work on it

if not arr_is_sorted:     # create me somewhere!
    arr.sort()            # inplace sort
    arr_is_sorted = True  # unset if you're unsure if the array is sorted

With a sorted list, you can find the insertion point efficiently, in O(log n), with a binary search - there's a handy built-in library for this, bisect!

import bisect
insertion_point = bisect.bisect_left(arr, x)

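Inserting at that point also keeps the list sorted, so you don't need to re-sort unless you make unrelated changes to it (ideally, you never make out-of-order insertions, so it always stays sorted)

Here's a full example of how to use bisect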

>>> l = [100,50,200,99]
>>> l.sort()
>>> l
[50, 99, 100, 200]
>>> import bisect
>>> bisect.bisect_left(l, 55)
1
>>> bisect.bisect_left(l, 201)
4
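
Note that bisect_left returns an insertion point, not the index of the item at or before x. For the question's case, one possible sketch, assuming the list is already sorted in ascending order, is to take bisect_right and subtract one. The helper name index_at_or_before is made up for illustration and is not part of bisect.

```python
import bisect

def index_at_or_before(sorted_list, x):
    """Return the index of x, or of the closest item before x.

    Hypothetical helper (not from the answer); assumes sorted_list is
    sorted in ascending order.
    """
    # bisect_right gives the position just past every item <= x,
    # so the item at or before x sits one step to the left.
    pos = bisect.bisect_right(sorted_list, x) - 1
    if pos < 0:
        raise ValueError(f'{x} is earlier than every item in the list.')
    return pos

timestamps = [50, 99, 100, 200]
print(index_at_or_before(timestamps, 55))   # 0
print(index_at_or_before(timestamps, 100))  # 2
print(index_at_or_before(timestamps, 201))  # 3
```

If the list contains duplicates, this returns the index of the last matching item.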

You can use arr.insert(position, value) to put a value into the list

>>> l
[50, 99, 100, 200]
>>> value = 55
>>> l.insert(bisect.bisect_left(l, value), value)
>>> l
[50, 55, 99, 100, 200]

You can prevent duplicate insertions by checking whether the item at that position is already equal to the value

>>> pos = bisect.bisect_left(l, value)
>>> if pos == len(l) or l[pos] != value:  # length check avoids IndexError
...     l.insert(pos, value)
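
As a rough sketch of the two insertion patterns above: the standard library's bisect.insort does the find-and-insert in one call, and the duplicate check can be wrapped in a small function. The name insert_unique is hypothetical, chosen only for this example.

```python
import bisect

def insert_unique(sorted_list, value):
    """Insert value into sorted_list unless it is already present.

    Hypothetical helper; returns True if the value was inserted.
    """
    pos = bisect.bisect_left(sorted_list, value)
    if pos < len(sorted_list) and sorted_list[pos] == value:
        return False  # already there, leave the list untouched
    sorted_list.insert(pos, value)
    return True

l = [50, 99, 100, 200]
bisect.insort(l, 55)          # finds the insertion point and inserts in one call
print(l)                      # [50, 55, 99, 100, 200]
print(insert_unique(l, 55))   # False
print(insert_unique(l, 150))  # True
print(l)                      # [50, 55, 99, 100, 150, 200]
```

The binary search is O(log n), but list.insert still shifts the items after the insertion point, so each insertion is O(n) overall.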

Answer 2

Lists have a built-in method that gives you the index of an element. If the element is not found, it raises a ValueError.

try:
    index = list1.index(element_to_search)
except ValueError as e:
    print('element not found')
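
Keep in mind that list.index scans from the start of the list (linear time) and only finds exact matches, so on its own it does not cover the "closest earlier timestamp" case from the question. A small illustration with made-up sample values and a hypothetical wrapper name:

```python
timestamps = [50, 99, 100, 200]

def exact_index_or_none(items, x):
    """Hypothetical wrapper: index of x, or None when x is absent."""
    try:
        return items.index(x)
    except ValueError:
        return None

print(exact_index_or_none(timestamps, 100))  # 2
print(exact_index_or_none(timestamps, 55))   # None: 55 is not an exact match
```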

Answer 3

I think this should work quickly: (I assume your timestamps are sorted?)

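def find_item_index(arr, x):
    '''
    returns index of x in ordered list.
    If x is between two items in the list, the index of the lower one is returned.
    '''

    n = len(arr)
    i = -1        # index of the last item known to be <= x (-1: none yet)
    step = 1
    while step * 2 <= n:
        step *= 2 # largest power of two not exceeding n

    while step > 0:
        # Jump forward only if the target position exists and is still <= x.
        if i + step < n and arr[i + step] <= x:
            i += step
        step //= 2

    if i < 0:
        raise ValueError(f'{x} is smaller than every item.')
    return i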

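Edit: I just checked. Compared with your first version, this is faster on longer lists. I'd expect at least a 4x speedup, and even 10x as the list gets longer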
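
To check a speed claim like this on your own data, a rough timing sketch with timeit might look like the following. The list contents, the probe value, and the function names are made up for the example, and the numbers will vary by machine, so no expected timings are shown.

```python
import bisect
import timeit

def linear_index(arr, x):
    """Linear scan, similar in spirit to the question's first version."""
    for index in range(len(arr) - 1):
        if arr[index] <= x < arr[index + 1]:
            return index
    raise ValueError(f'{x} not in array.')

def bisect_index(arr, x):
    """Binary search via the standard library."""
    return bisect.bisect_right(arr, x) - 1

arr = list(range(0, 100_000, 2))   # sorted even numbers, made-up test data
x = 75_001                         # falls between 75_000 and 75_002

# Both approaches should agree before their timings are compared.
assert linear_index(arr, x) == bisect_index(arr, x)

linear_time = timeit.timeit(lambda: linear_index(arr, x), number=20)
bisect_time = timeit.timeit(lambda: bisect_index(arr, x), number=20)
print(f'linear: {linear_time:.5f}s  bisect: {bisect_time:.5f}s')
```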

Answer 4

NumPy's searchsorted is usually the tool for these cases:

import numpy as np

np.searchsorted([1,2,8,9], 5) # Your case
> 2

np.searchsorted([1,2,8,9], (-1, 2, 100))  # Other cases
> array([0, 1, 4])

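When the value is missing, the returned index refers to the nearest item on the right. If that's not what you need, it can be adjusted to get the position of the nearest item on the left.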
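
One way to get the left neighbour, sketched here under the assumption that the array is sorted in ascending order, is to call searchsorted with side='right' and subtract one. The helper name index_at_or_before is hypothetical; a result of -1 means no item is at or before the value.

```python
import numpy as np

timestamps = np.array([1, 2, 8, 9])

def index_at_or_before(sorted_array, x):
    """Hypothetical helper: index of the item at or before x (-1 if none)."""
    # side='right' returns the position just past all items <= x.
    return np.searchsorted(sorted_array, x, side='right') - 1

print(index_at_or_before(timestamps, 5))                       # 1 (the item 2)
print(index_at_or_before(timestamps, (-1, 2, 100)).tolist())   # [-1, 1, 3]
```

Be careful with the -1 result: used directly as an index, it would silently select the last item.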

4 answers

Answer 1
3 2021-04-23 15:54:02

Answer 2
1 2021-04-23 15:12:51

Answer 3
1 2021-04-23 15:39:50

Answer 4
1 accepted 2021-04-23 16:22:00
