简体   繁体   中英

Python : Most efficient way to count elements of long list 1 in long list 2 ? (list comprehension is really slow)

I have many tuples (eg (0, 1) ) in two pretty long lists list_0 and list_1 of size ~40k elements. I need to count the tuples of list_0 also in list_1 .

The following statement with list comprehension takes ~ 1 min and I need to do this multiple times, so I am looking for something more efficient:

len([element for element in list_0 if element in list_1])

What could be more efficient?

To reproduce:

elements_0 = 200*[0]+50*[1]+150*[2]
elements_1 = 100*[1]+150*[2]+150*[1]
df = pd.DataFrame(data=[list(elements_0), list(elements_1)]).T
list_0 = [item for sublist in df.groupby(0)[0].apply(lambda x: list(combinations(x.index, 2))) for item in sublist]
list_1 = [item for sublist in df.groupby(1)[1].apply(lambda x: list(combinations(x.index, 2))) for item in sublist]

print(len([pair for pair in list_0 if pair in list_1])) # Long

It looks like you can use

pd.Series(list_0).isin(list_1).sum()

Output:

22300
CPU times: user 14.8 ms, sys: 20 µs, total: 14.8 ms
Wall time: 14.1 ms

which should give the same answer with:

len([element for element in list_0 if element in list_1])

which gives:

22300
CPU times: user 13.8 s, sys: 0 ns, total: 13.8 s
Wall time: 13.8 s

Also merge and query:

s = df.reset_index()
print(len(s.merge(s, on=[0,1])
  .query('index_x > index_y')
))

with output:

22300
CPU times: user 13.4 ms, sys: 15 µs, total: 13.4 ms
Wall time: 12.3 ms

Make list_1 a set first then find the elements that are in it:

list_1_set = set(list_1)
print(len(element for element in list_0 if element in list_1_set))

Maybe you can try to sort these two lists first. Then you can use two pointers for these two lists. Compare whether the tuples on the two pointers match or not, then update the pointer to next tuple in the list. This should take O(n). Please correct me if there is anything wrong. Thanks.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM