
How to get intersection of each element of two nested lists in python?

I'm working in a PySpark environment. I have two nested lists, a and b:

a=[[1,2,3],[8,9,45,0,65],[3,7,23,88],[44,77,99,100,654]]
b=[[1,3,7],[0,9,67,22,45,8,11],[23,3],[100]]

I want their element-wise intersection in Python, i.e. each sublist of a intersected with the sublist of b at the same position:

intersection_list=[[1,3],[8,9,45,0],[3,23],[100]]

and the final count of each intersection to be

list_count=[2,4,2,1]

How can I get these results in PySpark?

I have tried

[[[n for n in a if n in b]for x in a]for y in b]

but this didn't give me the required intersection_list.
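Running the attempt on the data above suggests why it fails: x and y are never used, and n iterates over the sublists of a, so "n in b" tests whether a whole sublist of a equals some sublist of b. None does, so every inner list comes out empty:

```python
a = [[1, 2, 3], [8, 9, 45, 0, 65], [3, 7, 23, 88], [44, 77, 99, 100, 654]]
b = [[1, 3, 7], [0, 9, 67, 22, 45, 8, 11], [23, 3], [100]]

# The attempted comprehension, unchanged. The two outer loops only repeat the
# inner comprehension 4 x 4 times, and n is a whole sublist of a, not a number.
attempt = [[[n for n in a if n in b] for x in a] for y in b]
print(attempt)
# [[[], [], [], []], [[], [], [], []], [[], [], [], []], [[], [], [], []]]
```

The missing piece is pairing the i-th sublist of a with the i-th sublist of b, which is what zip(a, b) does.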

Is there also a way to do this with an RDD in PySpark?
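For the RDD part, here is a minimal sketch. It assumes a and b are ordinary Python lists on the driver and that a SparkContext sc already exists (for example spark.sparkContext). The helper names intersect_pair and intersections_with_rdd are hypothetical. The sublists are paired locally with zip, distributed with sc.parallelize, and each pair is intersected inside map:

```python
def intersect_pair(pair):
    """Return the elements of x that also occur in y, keeping x's order."""
    x, y = pair
    y_set = set(y)  # average O(1) membership tests
    return [n for n in x if n in y_set]


def intersections_with_rdd(sc, a, b):
    """Hypothetical helper; sc is assumed to be an existing SparkContext.

    Returns (intersection_list, list_count) as plain Python lists.
    """
    pairs = sc.parallelize(list(zip(a, b)))  # one RDD element per (x, y) pair
    inter = pairs.map(intersect_pair)
    inter.cache()  # reused by the two actions below
    return inter.collect(), inter.map(len).collect()


# Usage, with a running SparkContext sc:
# intersection_list, list_count = intersections_with_rdd(sc, a, b)
```

If the lists are this small and already on the driver, Spark likely adds more overhead than it saves, and the plain list comprehensions are usually enough.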

[[n for n in x if n in y] for x, y in zip(a, b)]

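However, if the sublists are large, this is faster, because a membership test on a set takes O(1) time on average while "n in y" on a list scans the whole list: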

[set(x).intersection(y) for x, y in zip(a, b)]

(although the order of the elements is lost, and each result is a set rather than a list)
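To keep both the speed of set lookups and the order of the original sublists, one option (not from the answer above) is to build a set from each y once and filter x against it. Unlike the set version, this keeps any duplicates that appear in x. ordered_intersection is a hypothetical helper name:

```python
a = [[1, 2, 3], [8, 9, 45, 0, 65], [3, 7, 23, 88], [44, 77, 99, 100, 654]]
b = [[1, 3, 7], [0, 9, 67, 22, 45, 8, 11], [23, 3], [100]]


def ordered_intersection(x, y):
    """Elements of x that also occur in y, in x's order."""
    # Built once per pair; writing "n in set(y)" inside the comprehension
    # would rebuild the set for every element of x.
    y_set = set(y)
    return [n for n in x if n in y_set]


intersection_list = [ordered_intersection(x, y) for x, y in zip(a, b)]
list_count = [len(x) for x in intersection_list]
print(intersection_list)  # [[1, 3], [8, 9, 45, 0], [3, 23], [100]]
print(list_count)         # [2, 4, 2, 1]
```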

a=[[1,2,3],[8,9,45,0,65],[3,7,23,88],[44,77,99,100,654]]
b=[[1,3,7],[0,9,67,22,45,8,11],[23,3],[100]]

intersection_list = [list(set(x) & set(y)) for x, y in zip(a, b)]

>> [[1, 3], [8, 9, 45, 0], [3, 23], [100]]

(the order within each sublist may differ, since sets are unordered)

list_count = [len(x) for x in intersection_list]

>> [2, 4, 2, 1]
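If the data lives in a Spark DataFrame rather than in driver-side lists, the built-in Spark SQL functions array_intersect (available since Spark 2.4) and size can compute both results column-wise. This is a minimal sketch assuming an existing SparkSession spark; the function name and the column names a, b, intersection and n_common are hypothetical:

```python
def intersections_with_dataframe(spark, a, b):
    """Hypothetical helper; spark is assumed to be an existing SparkSession.

    Returns (intersection_list, list_count) as plain Python lists.
    """
    # Imported here so the module can be loaded even without pyspark installed.
    from pyspark.sql import functions as F

    # One row per pair of sublists; both columns are inferred as array<bigint>.
    df = spark.createDataFrame(list(zip(a, b)), ["a", "b"])
    rows = (
        df.select(F.array_intersect("a", "b").alias("intersection"))
        .withColumn("n_common", F.size("intersection"))
        .collect()
    )
    return [r["intersection"] for r in rows], [r["n_common"] for r in rows]


# Usage, with a running SparkSession spark:
# intersection_list, list_count = intersections_with_dataframe(spark, a, b)
```

array_intersect drops duplicate elements, so its counts match the set-based answer rather than the order-preserving list comprehension when a sublist contains repeats.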

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 