
list match in python: get indices of a sub-list in a larger list

For two lists,

a = [1, 2, 9, 3, 8, ...]   (no duplicate values in a, but a is very big)
b = [1, 9, 1,...]          (set(b) is a subset of set(a), 1<<len(b)<<len(a)) 

indices = get_indices_of_a(a, b)

How can get_indices_of_a return indices = [0, 2, 0, ...] such that array(a)[indices] = b? Is there a faster method than using a.index, which is taking too long?

Making b a set is a fast method of matching lists and returning indices (see "compare two lists in python and return indices of matched values"), but in this case it would lose the index of the second 1 as well as the order of the indices.

A fast method (when a is a large list) would be to use a dict that maps the values in a to their indices:

>>> index_dict = dict((value, idx) for idx,value in enumerate(a))
>>> [index_dict[x] for x in b]
[0, 2, 0]

This takes linear time in the average case, O(len(a) + len(b)), whereas calling a.index for every element of b takes O(len(a) * len(b)) time.
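As a minimal sketch, the dict lookup can be wrapped in the get_indices_of_a function named in the question. It assumes the values in a are hashable and unique, as the question states; the short sample lists are only for illustration.

```python
def get_indices_of_a(a, b):
    """Return indices such that [a[i] for i in indices] == b."""
    # Build the value -> index map once: O(len(a)) on average.
    index_dict = {value: idx for idx, value in enumerate(a)}
    # Each dict lookup is O(1) on average, so this pass is O(len(b)).
    return [index_dict[x] for x in b]


a = [1, 2, 9, 3, 8]
b = [1, 9, 1]
indices = get_indices_of_a(a, b)
print(indices)  # [0, 2, 0]
```

Building the dict costs extra memory proportional to len(a), which is usually an acceptable trade when b is long enough that repeated a.index scans dominate the running time.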

Presuming we are working with smaller lists, this is as easy as:

>>> a = [1, 2, 9, 3, 8] 
>>> b = [1, 9, 1] 
>>> [a.index(item) for item in b]
[0, 2, 0]

On larger lists, this will become quite expensive, because every a.index call scans a from the start.

(If a contains duplicates, the first occurrence will always be the one referenced in the resulting list. If set(b) <= set(a) does not hold, you will get a ValueError.)
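The two edge cases can be checked directly, and they behave differently under a dict lookup. The sketch below is illustrative only: a_dup, first_map and safe_indices are hypothetical names that do not appear in the original posts. It shows that a plain dict built with enumerate keeps the last index of a duplicate, that setdefault restores the first-occurrence behaviour of a.index, and that dict.get can stand in for the exception when a value is missing.

```python
a_dup = [1, 2, 1]  # hypothetical list containing a duplicate value

# list.index always reports the first occurrence.
first_seen = [a_dup.index(x) for x in [1, 2]]

# A plain dict built from enumerate keeps the LAST index of a duplicate.
last_map = {value: idx for idx, value in enumerate(a_dup)}

# setdefault only stores a key once, so the FIRST index is kept.
first_map = {}
for idx, value in enumerate(a_dup):
    first_map.setdefault(value, idx)


def safe_indices(index_map, b, missing=None):
    """Look up each value of b, using `missing` instead of raising."""
    return [index_map.get(x, missing) for x in b]


# list.index raises ValueError for a value that is not in the list.
try:
    a_dup.index(7)
    raised = False
except ValueError:
    raised = True

print(first_seen, last_map[1], first_map[1], raised)
print(safe_indices(first_map, [1, 7, 2]))
```

None is used as the placeholder rather than -1, since -1 is itself a valid list index and would silently select the last element.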


 