
How to find all occurrences of an element in a list

index() gives the first occurrence of an item in a list. Is there a neat trick that returns all indices of an element in a list?

You can use a list comprehension with enumerate:

indices = [i for i, x in enumerate(my_list) if x == "whatever"]

The iterator enumerate(my_list) yields pairs (index, item) for each item in the list. Using i, x as the loop target unpacks these pairs into the index i and the list item x. We filter down to all x that match our criterion, and select the indices i of these elements.
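The same comprehension can be wrapped in a small reusable function. `find_all` is a hypothetical name used only for this sketch; because it relies on enumerate, it works for any iterable, not just lists:

```python
def find_all(iterable, target):
    """Return every index at which target occurs in iterable (hypothetical helper)."""
    # enumerate yields (index, item) pairs; keep the index when the item matches
    return [i for i, x in enumerate(iterable) if x == target]


print(find_all(["a", "b", "a", "c", "a"], "a"))  # [0, 2, 4]
print(find_all((n % 3 for n in range(10)), 0))   # [0, 3, 6, 9]
print(find_all([1, 2, 3], 4))                    # []
```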

While not a solution for lists directly, numpy really shines for this sort of thing:

import numpy as np
values = np.array([1,2,3,1,2,4,5,6,3,2,1])
searchval = 3
ii = np.where(values == searchval)[0]

returns:

ii ==>array([2, 8])

For lists (arrays) with a large number of elements, this can be significantly faster than some of the other solutions.

A solution using list.index:

def indices(lst, element):
    result = []
    offset = -1
    while True:
        try:
            offset = lst.index(element, offset+1)
        except ValueError:
            return result
        result.append(offset)

It's much faster than the list comprehension with enumerate for large lists. It is also much slower than the numpy solution if you already have the array; otherwise the cost of converting outweighs the speed gain (tested on integer lists with 100, 1000 and 10000 elements).

NOTE: A note of caution based on Chris_Rands' comment: this solution is faster than the list comprehension if the results are sufficiently sparse, but if the list has many instances of the element being searched for (more than ~15% of the list, in a test with a list of 1000 integers), the list comprehension is faster.
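A minimal sketch for checking the sparse-versus-dense trade-off yourself with the standard timeit module. The list size and the two match densities are arbitrary choices for illustration, and absolute timings will vary by machine and Python version:

```python
import random
import timeit


def by_enumerate(lst, element):
    return [i for i, x in enumerate(lst) if x == element]


def by_index(lst, element):
    # Same approach as the list.index answer: resume searching after each hit
    result = []
    offset = -1
    while True:
        try:
            offset = lst.index(element, offset + 1)
        except ValueError:
            return result
        result.append(offset)


random.seed(0)
for density in (0.01, 0.5):
    # 1 marks a match; density is the approximate fraction of matching items
    data = [1 if random.random() < density else 0 for _ in range(10_000)]
    assert by_enumerate(data, 1) == by_index(data, 1)
    t_enum = timeit.timeit(lambda: by_enumerate(data, 1), number=200)
    t_index = timeit.timeit(lambda: by_index(data, 1), number=200)
    print(f"density={density}: enumerate {t_enum:.3f}s, list.index {t_index:.3f}s")
```

Both functions return identical lists; only the cost profile differs, which is why the crossover depends on how many matches there are.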

How about:

In [1]: l=[1,2,3,4,3,2,5,6,7]

In [2]: [i for i,val in enumerate(l) if val==3]
Out[2]: [2, 4]

more_itertools.locate finds indices for all items that satisfy a condition (by default, every truthy item):

from more_itertools import locate


list(locate([0, 1, 1, 0, 1, 0, 0]))
# [1, 2, 4]

list(locate(['a', 'b', 'c', 'b'], lambda x: x == 'b'))
# [1, 3]

more_itertools is a third-party library (install it with pip install more_itertools).
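If you would rather not add a dependency, a generator with roughly the same behaviour as the two calls above is easy to write with the standard library. This is a sketch, not the more_itertools implementation; `locate_like` is a hypothetical name, and like locate it defaults to testing each item's truthiness:

```python
def locate_like(iterable, pred=bool):
    """Yield the index of every item for which pred(item) is true (hypothetical helper)."""
    for i, item in enumerate(iterable):
        if pred(item):
            yield i


print(list(locate_like([0, 1, 1, 0, 1, 0, 0])))                    # [1, 2, 4]
print(list(locate_like(['a', 'b', 'c', 'b'], lambda x: x == 'b')))  # [1, 3]
```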

occurrences = lambda s, lst: (i for i,e in enumerate(lst) if e == s)
list(occurrences(1, [1,2,3,1])) # = [0, 3]

Or use range (Python 3):

l=[i for i in range(len(lst)) if lst[i]=='something...']

For Python 2:

l=[i for i in xrange(len(lst)) if lst[i]=='something...']

And then (in both cases):

print(l)

prints the matching indices, as expected.
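Note that range(len(lst)) only works when lst supports len() and indexing. A short sketch showing that enumerate gives the same answer on a list and also handles a one-shot iterator such as a generator, where the range-based version raises a TypeError:

```python
lst = ['x', 'something...', 'y', 'something...']

by_range = [i for i in range(len(lst)) if lst[i] == 'something...']
by_enumerate = [i for i, item in enumerate(lst) if item == 'something...']
print(by_range, by_enumerate)  # [1, 3] [1, 3]

# A generator has no len() and cannot be indexed, but enumerate still works
words = (w for w in lst)
print([i for i, item in enumerate(words) if item == 'something...'])  # [1, 3]

try:
    len(w for w in lst)
except TypeError as exc:
    print("range(len(...)) fails on a generator:", exc)
```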

  • There's an answer using np.where to find the indices of a single value, which is not faster than a list comprehension if the time to convert a list to an array is included
  • The overhead of importing numpy and converting a list to a numpy.array probably makes numpy a less efficient option in most circumstances. A careful timing analysis would be necessary.
    • In cases where multiple functions/operations need to be performed on the list, converting the list to an array and then using numpy functions will likely be a faster option.
  • This solution uses np.where and np.unique to find the indices of all unique elements in a list.
    • Using np.where on an array (including the time to convert the list to an array) is slightly faster than a list comprehension on a list for finding all indices of all unique elements.
    • This has been tested on a 2M-element list with 4 unique values; the size of the list/array and the number of unique elements will have an impact.
  • Other solutions using numpy on an array can be found in Get a list of all indices of repeated elements in a numpy array
import numpy as np
import random  # to create test list

# create sample list
random.seed(365)
l = [random.choice(['s1', 's2', 's3', 's4']) for _ in range(20)]

# convert the list to an array for use with these numpy methods
a = np.array(l)

# create a dict of each unique entry and the associated indices
idx = {v: np.where(a == v)[0].tolist() for v in np.unique(a)}

# print(idx)
{'s1': [7, 9, 10, 11, 17],
 's2': [1, 3, 6, 8, 14, 18, 19],
 's3': [0, 2, 13, 16],
 's4': [4, 5, 12, 15]}

%timeit

# create 2M element list
random.seed(365)
l = [random.choice(['s1', 's2', 's3', 's4']) for _ in range(2000000)]

Find the indices of one value

  • Find indices of a single element in a 2M-element list with 4 unique elements
# np.where: convert list to array
%%timeit
a = np.array(l)
np.where(a == 's1')
[out]:
409 ms ± 41.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# list-comprehension: on list l
%timeit [i for i, x in enumerate(l) if x == "s1"]
[out]:
201 ms ± 24 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# filter: on list l
%timeit list(filter(lambda i: l[i]=="s1", range(len(l))))
[out]:
344 ms ± 36.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Find the indices of all the values

  • Find indices of all unique elements in a 2M-element list with 4 unique elements
# use np.where and np.unique: convert list to array
%%timeit
a = np.array(l)
{v: np.where(a == v)[0].tolist() for v in np.unique(a)}
[out]:
682 ms ± 28 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# list comprehension inside dict comprehension: on list l
%timeit {req_word: [idx for idx, word in enumerate(l) if word == req_word] for req_word in set(l)}
[out]:
713 ms ± 16.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Getting all the occurrences and the positions of one or more (identical) items in a list

With enumerate(alist) you can store the first element of each pair (n), which is the index in the list, whenever the element x equals what you are looking for.

>>> alist = ['foo', 'spam', 'egg', 'foo']
>>> foo_indexes = [n for n,x in enumerate(alist) if x=='foo']
>>> foo_indexes
[0, 3]
>>>

Let's make our function, indexlist

This function takes the item and the list as arguments and returns the positions of the item in the list, like we saw before. It works on strings too:

def indexlist(item2find, list_or_string):
  "Returns all indexes of an item in a list or a string"
  return [n for n,item in enumerate(list_or_string) if item==item2find]

print(indexlist("1", "010101010"))

Output

[1, 3, 5, 7]
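indexlist compares one element at a time, so on a string it can only find single characters. For multi-character substrings, a sketch using str.find with its start argument (the same resume-after-the-last-hit idea as the list.index answers) could look like this; `substring_indices` is a hypothetical name, and it reports overlapping matches:

```python
def substring_indices(text, sub):
    """Return the start index of every (possibly overlapping) occurrence of sub."""
    positions = []
    start = text.find(sub)                 # -1 means "not found"
    while start != -1:
        positions.append(start)
        start = text.find(sub, start + 1)  # resume one past the last hit
    return positions


print(substring_indices("010101010", "10"))  # [1, 3, 5, 7]
print(substring_indices("aaaa", "aa"))       # [0, 1, 2]
```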

Simple:

for n, i in enumerate([1, 2, 3, 4, 1]):
    if i == 1:
        print(n)

Output:

0
4

Using filter() in Python 2:

>>> q = ['Yeehaw', 'Yeehaw', 'Googol', 'B9', 'Googol', 'NSM', 'B9', 'NSM', 'Dont Ask', 'Googol']
>>> filter(lambda i: q[i]=="Googol", range(len(q)))
[2, 4, 9]
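In Python 3, filter() returns a lazy iterator instead of a list, so the same idea needs a list() call around it; a sketch:

```python
q = ['Yeehaw', 'Yeehaw', 'Googol', 'B9', 'Googol', 'NSM', 'B9', 'NSM', 'Dont Ask', 'Googol']

# filter keeps each index i for which the predicate is true
googol_indices = list(filter(lambda i: q[i] == "Googol", range(len(q))))
print(googol_indices)  # [2, 4, 9]
```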

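One more solution (sorry if it duplicates another answer) that pairs every value with all of its indices:

values = [1,2,3,1,2,4,5,6,3,2,1]
# Python 3: range replaces xrange, and list() forces the lazy map object;
# repeated values produce repeated (value, indices) pairs
list(map(lambda val: (val, [i for i in range(len(values)) if values[i] == val]), values))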

Using a for-loop:

  • Answers with enumerate and a list comprehension are more pythonic, but not necessarily faster. However, this answer is aimed at students who may not be allowed to use some of those built-in functions.
  • create an empty list, indices
  • create the loop with for i in range(len(x)):, which essentially iterates through a list of index locations [0, 1, 2, 3, ..., len(x)-1]
  • in the loop, add any i where x[i] matches value to indices
def get_indices(x: list, value: int) -> list:
    indices = list()
    for i in range(len(x)):
        if x[i] == value:
            indices.append(i)
    return indices

n = [1, 2, 3, -50, -60, 0, 6, 9, -60, -60]
print(get_indices(n, -60))

>>> [4, 8, 9]
  • The function get_indices is implemented with type hints. In this case, the list n is a bunch of ints, therefore we search for value, also defined as an int.
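Because value: int only documents integer lists, a variant with generic hints may be preferable if the same function is meant for strings or other types. This is a sketch using typing.TypeVar; the loop logic is unchanged:

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def get_indices(x: Sequence[T], value: T) -> List[int]:
    """Return every index i for which x[i] == value."""
    indices: List[int] = []
    for i in range(len(x)):
        if x[i] == value:
            indices.append(i)
    return indices


print(get_indices([1, 2, 3, -50, -60, 0, 6, 9, -60, -60], -60))  # [4, 8, 9]
print(get_indices(["a", "b", "a"], "a"))                          # [0, 2]
```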

Using a while-loop and .index:

  • With .index, use try-except for error handling, because a ValueError is raised if value is not in the list.
def get_indices(x: list, value: int) -> list:
    indices = list()
    i = 0
    while True:
        try:
            # find an occurrence of value and update i to that index
            i = x.index(value, i)
            # add i to the list
            indices.append(i)
            # advance i by 1
            i += 1
        except ValueError as e:
            break
    return indices

print(get_indices(n, -60))
>>> [4, 8, 9]

A dynamic, list-comprehension-based solution in case we do not know in advance which element we want:

lst = ['to', 'be', 'or', 'not', 'to', 'be']
{req_word: [idx for idx, word in enumerate(lst) if word == req_word] for req_word in set(lst)}

results in:

{'be': [1, 5], 'or': [2], 'to': [0, 4], 'not': [3]}
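The dictionary comprehension rescans the whole list once per distinct word. A single-pass alternative, sketched here with collections.defaultdict(list), appends each index to its word's list as it goes. The contents match; key order follows first appearance rather than set order:

```python
from collections import defaultdict

lst = ['to', 'be', 'or', 'not', 'to', 'be']

positions = defaultdict(list)  # missing keys start as an empty list
for idx, word in enumerate(lst):
    positions[word].append(idx)

print(dict(positions))  # {'to': [0, 4], 'be': [1, 5], 'or': [2], 'not': [3]}
```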

You can think of other approaches along the same lines as well, but index() finds only one index per call, although you can set the position where the search starts yourself.

If you need to search for all of an element's positions between certain indices, you can state them in the condition:

[i for i, x in enumerate([1, 2, 3, 2]) if x == 2 and 2 <= i <= 3]  # -> [3]
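Rather than enumerating the whole list and filtering on i, you can slice out just the range of interest and pass start= to enumerate so the reported indices still refer to the original list. A sketch; `indices_between`, `lo` and `hi` are hypothetical names, with inclusive bounds:

```python
def indices_between(lst, target, lo, hi):
    """Indices i with lo <= i <= hi and lst[i] == target."""
    # enumerate(..., start=lo) keeps the indices aligned with the original list
    return [i for i, x in enumerate(lst[lo:hi + 1], start=lo) if x == target]


print(indices_between([1, 2, 3, 2], 2, 2, 3))  # [3]
print(indices_between([1, 2, 3, 2], 2, 0, 3))  # [1, 3]
```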

You can create a defaultdict that counts how many times each element occurs (note that this gives counts, not indices):

from collections import defaultdict
lst1 = [1, 2, 2, 3, 4, 1, 2, 7]
d1 = defaultdict(int)      # defaults to 0 values for keys
unq = set(lst1)
for each in unq:
    d1[each] = lst1.count(each)
print(d1)
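If occurrence counts are what you need, the standard library's collections.Counter does the same job in one pass without calling count() once per unique value; a sketch:

```python
from collections import Counter

lst1 = [1, 2, 2, 3, 4, 1, 2, 7]
counts = Counter(lst1)
print(counts[2], counts[1], counts[5])  # 3 2 0  (missing keys count as 0)
print(dict(counts))                     # {1: 2, 2: 3, 3: 1, 4: 1, 7: 1}
```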

If you are using Python 2, you can achieve the same functionality with this:

f = lambda my_list, value:filter(lambda x: my_list[x] == value, range(len(my_list)))

Where my_list is the list you want to get the indexes of, and value is the value searched for. Usage:

f(some_list, some_element)

Create a generator

Generators are fast and have a tiny memory footprint. They give you flexibility in how you use the result.

def indices(iter, val):
    """Generator: Returns all indices of val in iter
    Raises a ValueError if val does not occur in iter
    Passes on the AttributeError if iter does not have an index method (e.g. is a set)
    """
    i = -1
    NotFound = False
    while not NotFound:
        try:
            i = iter.index(val, i+1)
        except ValueError:
            NotFound = True
        else:
            yield i
    if i == -1:
        raise ValueError("No occurrences of {v} in {i}".format(v = val, i = iter))

The above code can be used to create a list of the indices: list(indices(input,value)); use them as dictionary keys: dict.fromkeys(indices(input,value)); sum them: sum(indices(input,value)); loop over them: for index_ in indices(input,value): ...etc... without creating an interim list/tuple or similar.

In a for loop you get the next index back when you ask for it, without waiting for all the others to be calculated first. That means: if you break out of the loop for some reason, you save the time needed to find indices you never needed.
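To illustrate that laziness, here is a sketch using a simple enumerate-based generator (`iter_indices` is a hypothetical name, not the indices function from this answer) together with itertools.islice, which stops pulling from the generator after the first n hits:

```python
from itertools import islice


def iter_indices(seq, val):
    """Lazily yield each index at which val occurs in seq."""
    for i, item in enumerate(seq):
        if item == val:
            yield i


data = [7, 3, 7, 7, 1, 7]
print(list(islice(iter_indices(data, 7), 2)))  # [0, 2]  (stops scanning at index 2)
print(next(iter_indices(data, 1), None))       # 4
print(next(iter_indices(data, 9), None))       # None
```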

How it works

  • Call .index on the input iter to find the next occurrence of val
  • Use the second parameter of .index to start at the point after the last found occurrence
  • Yield the index
  • Repeat until .index raises a ValueError

Alternative versions

I tried four different versions for flow control: two EAFP (using try - except) and two LBYL (look before you leap, with a logical test in the while statement):

  1. "WhileTrueBreak": while True: ... except ValueError: break. Surprisingly, this was usually a touch slower than option 2 and (IMV) less readable
  2. "WhileErrFalse": Using a bool variable err to identify when a ValueError is raised. This is generally the fastest, and more readable than 1
  3. "RemainingSlice": Check whether val is in the remaining part of the input using slicing: while val in iter[i:]. Unsurprisingly, this does not scale well
  4. "LastOccurrence": Check first where the last occurrence is, and keep going while i < last

The overall performance differences between 1, 2 and 4 are negligible, so it comes down to personal style and preference. Given that .index uses ValueError to let you know it didn't find anything, rather than e.g. returning None, an EAFP approach seems fitting to me.

Here are the 4 code variants and the timeit results (in milliseconds) for different input lengths and match densities:

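# version and versions were the author's timing-harness helpers (not shown);
# hypothetical minimal stand-in that just registers each variant by name:
versions = {}

def version(name, registry):
    def register(func):
        registry[name] = func
        return func
    return register

@version("WhileTrueBreak", versions)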
def indices2(iter, val):
    i = -1
    while True:
        try:
            i = iter.index(val, i+1)
        except ValueError:
            break
        else:
            yield i

@version("WhileErrFalse", versions)
def indices5(iter, val):
    i = -1
    err = False
    while not err:
        try:
            i = iter.index(val, i+1)
        except ValueError:
            err = True
        else:
            yield i

@version("RemainingSlice", versions)
def indices1(iter, val):
    i = 0
    while val in iter[i:]:
        i = iter.index(val, i)
        yield i
        i += 1

@version("LastOccurrence", versions)
def indices4(iter,val):
    i = 0
    last = len(iter) - tuple(reversed(iter)).index(val)
    while i < last:
        i = iter.index(val, i)
        yield i
        i += 1

Length: 100, Occurrences: 4.0%
{'WhileTrueBreak': 0.0074799987487494946, 'WhileErrFalse': 0.006440002471208572, 'RemainingSlice': 0.01221001148223877, 'LastOccurrence': 0.00801000278443098}
Length: 1000, Occurrences: 1.2%
{'WhileTrueBreak': 0.03101000329479575, 'WhileErrFalse': 0.0278000021353364, 'RemainingSlice': 0.08278000168502331, 'LastOccurrence': 0.03986000083386898}
Length: 10000, Occurrences: 2.05%
{'WhileTrueBreak': 0.18062000162899494, 'WhileErrFalse': 0.1810499932616949, 'RemainingSlice': 2.9145700042136014, 'LastOccurrence': 0.2049500006251037}
Length: 100000, Occurrences: 1.977%
{'WhileTrueBreak': 1.9361200043931603, 'WhileErrFalse': 1.7280600033700466, 'RemainingSlice': 254.4725100044161, 'LastOccurrence': 1.9101499929092824}
Length: 100000, Occurrences: 9.873%
{'WhileTrueBreak': 2.832529996521771, 'WhileErrFalse': 2.9984100023284554, 'RemainingSlice': 1132.4922299943864, 'LastOccurrence': 2.6660699979402125}
Length: 100000, Occurrences: 25.058%
{'WhileTrueBreak': 5.119729996658862, 'WhileErrFalse': 5.2082200068980455, 'RemainingSlice': 2443.0577100021765, 'LastOccurrence': 4.75954000139609}
Length: 100000, Occurrences: 49.698%
{'WhileTrueBreak': 9.372120001353323, 'WhileErrFalse': 8.447749994229525, 'RemainingSlice': 5042.717969999649, 'LastOccurrence': 8.050809998530895}

Here is a time performance comparison between np.where and a list comprehension. It seems np.where is faster on average, although both versions build the same 5-element NumPy array inside the loop, so this mainly measures per-call overhead on a tiny input.

import time
import numpy as np

# np.where
start_times = []
end_times = []
for i in range(10000):
    start = time.time()
    start_times.append(start)
    temp_list = np.array([1,2,3,3,5])
    ixs = np.where(temp_list==3)[0].tolist()
    end = time.time()
    end_times.append(end)
print("Took on average {} seconds".format(
    np.mean(end_times)-np.mean(start_times)))
Took on average 3.81469726562e-06 seconds
# list_comprehension
start_times = []
end_times = []
for i in range(10000):
    start = time.time()
    start_times.append(start)
    temp_list = np.array([1,2,3,3,5])
    ixs = [i for i in range(len(temp_list)) if temp_list[i]==3]
    end = time.time()
    end_times.append(end)
print("Took on average {} seconds".format(
    np.mean(end_times)-np.mean(start_times)))
Took on average 4.05311584473e-06 seconds

Edit (Idiotness):

Adding to the good answers.

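def count(x, lst):
    ind = []
    # enumerate supplies each position; lst.index(x) would always
    # return the first match, repeating the same index
    for i, item in enumerate(lst):
        if item == x:
            ind.append(i)
    return ind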

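Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.

Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM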