How to filter a list based on elements in another list in Python
I have a list A of about 62,000 numbers, and another list B of about 370,000. I would like to filter B so that it contains only elements from A. I tried something like this:
A=[0,3,5,73,88,43,2,1]
B=[0,5,10,42,43,56,83,88,892,1089,3165]
C=[item for item in A if item in set(B)]
This works, but it is obviously very slow for such large lists because (I think?) the search continues through the entire B even after the element has been found. So the script goes through a list of 370,000 elements 62,000 times.
The elements in A and B are unique (B contains unique values between 0 and 700,000, and A contains a unique subset of those), so once A[i] is found in B, the search can stop. The values are also in ascending order, if that matters.
Is there some way to do this more quickly?
This creates a new set(B) for every item in A. Instead, use the built-in set.intersection:
C = set(A).intersection(B)
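Note that set.intersection returns a set, so the ascending order of the input is not preserved. If the result should stay an ordered list, one option is to build the lookup set once, outside the comprehension, and then filter B against it. This is only a minimal sketch; the name A_set is introduced here for illustration and does not appear in the question.

```python
A = [0, 3, 5, 73, 88, 43, 2, 1]
B = [0, 5, 10, 42, 43, 56, 83, 88, 892, 1089, 3165]

# Build the lookup set once, instead of once per item.
A_set = set(A)

# Keep the elements of B that also occur in A, in B's original order.
C = [item for item in B if item in A_set]

# Same elements as the set-based one-liner, but as an ordered list.
assert set(C) == set(A).intersection(B)
print(C)
```

Each membership test against a set takes constant time on average, so the whole filter is roughly linear in len(A) + len(B).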
To be really sure that what I've done is as fast as possible, I would do it like this:
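A = [0, 3, 5, 73, 88, 43, 2, 1]
B = [0, 5, 10, 42, 43, 56, 83, 88, 892, 1089, 3165]
B_filter = B.copy()
C = []
for item in sorted(A):  # this approach needs ascending order; the sample A is not sorted
    # B_filter is ascending, so values smaller than item can never match a later item: drop them
    while B_filter and B_filter[0] < item:
        B_filter.pop(0)
    # the first remaining value is now the only possible match for item
    if B_filter and B_filter[0] == item:
        C.append(item)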
If you don't care about losing your B list, you can just use B instead of B_filter and not declare B_filter, so you don't have to copy a list of 370k elements.
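Since list.pop(0) has to shift every remaining element, the same idea can also be written with two indexes, which leaves both lists untouched and visits each element at most once. The following is only a sketch, assuming both inputs are ascending and free of duplicates as the question states; the function name sorted_intersection is hypothetical.

```python
def sorted_intersection(a, b):
    """Return the values present in both ascending, duplicate-free lists."""
    result = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            # Match found: record it and advance both lists.
            result.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            # a[i] is smaller than everything left in b, so it cannot match.
            i += 1
        else:
            # b[j] is smaller than everything left in a, so it cannot match.
            j += 1
    return result


A = sorted([0, 3, 5, 73, 88, 43, 2, 1])  # the sample A is not sorted, so sort it first
B = [0, 5, 10, 42, 43, 56, 83, 88, 892, 1089, 3165]
C = sorted_intersection(A, B)
print(C)
```

Because neither list is modified, no copy of B is needed either way.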