
Intersect a set with a list of sets in python

I have a set s and a list of sets l, as below.

s = {1,2,3,4}
l = [{1}, {1,2,3}, {3}]

The output should be

out = [{1}, {1,2,3}, {3}]

I am using the following code to accomplish this, but I was hoping there would be a faster way. Perhaps some sort of broadcasting?

out = [i.intersection(s) for i in l]

EDIT

List l can contain as many as 1000 elements.

My end objective is to create a matrix holding the size of each pairwise intersection between the elements of l. So s is itself an element of l.

out_matrix = list()
for s in l:
    out_matrix.append([len(i.intersection(s)) for i in l])

My first thought when reading this question was "sure, use numpy". Then I decided to run some tests:

import numpy as np
from timeit import Timer

s = {1, 2, 3, 4}
l = [{1}, {1, 2, 3}, {3}] * 1000  # 3000 elements
arr = np.array(l)


def list_comp():
    [i.intersection(s) for i in l]


def numpy_arr():
    arr & s

print(min(Timer(list_comp).repeat(500, 500)))
print(min(Timer(numpy_arr).repeat(500, 500)))

This outputs

# 0.05513364499999995
# 0.035647999999999236

So numpy is indeed a bit faster. Is it really worth it? I'm not sure. A difference of ~0.02 seconds, accumulated over 500 calls on a 3000-element list, is negligible (especially considering that my test didn't even take into account the time it took to create arr).
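To make the caveat about creating arr concrete, here is a minimal sketch that times a third variant which rebuilds the array on every call, and also checks that both approaches return the same sets. The function names numpy_prebuilt and numpy_with_build and the smaller repeat counts are my own choices for illustration, not part of the test above, and the numbers will vary by machine.

```python
import numpy as np
from timeit import Timer

s = {1, 2, 3, 4}
l = [{1}, {1, 2, 3}, {3}] * 1000  # 3000 elements
arr = np.array(l)  # 1-D object array holding the sets


def list_comp():
    return [i.intersection(s) for i in l]


def numpy_prebuilt():
    # Uses the array built once, outside the timed code.
    return arr & s


def numpy_with_build():
    # Hypothetical variant: pays for the list-to-array conversion on every call.
    return np.array(l) & s


# Both approaches should give the same sets, element by element.
assert list(numpy_prebuilt()) == list_comp()

for func in (list_comp, numpy_prebuilt, numpy_with_build):
    # Smaller repeat counts than in the test above, only to keep the run short.
    best = min(Timer(func).repeat(5, 100))
    print(f"{func.__name__}: {best:.4f} s per 100 calls")
```

If the list is only intersected once, numpy_with_build is the fairer comparison against the plain list comprehension.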

Keep in mind that even with numpy the work is still O(n) in the length of the list. The difference comes from numpy pushing the for loop down to the C level, which is inherently faster than a Python-level for loop.
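For the pairwise matrix described in the question's edit, a constant-factor saving of a different kind is available in plain Python: intersection is symmetric, so only the upper triangle needs to be computed. This is a sketch of that idea, not something I benchmarked above, and the helper name pairwise_intersection_sizes is made up for illustration.

```python
def pairwise_intersection_sizes(sets):
    """Return an n x n matrix where entry [i][j] is len(sets[i] & sets[j])."""
    n = len(sets)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        # A set intersected with itself is itself.
        matrix[i][i] = len(sets[i])
        for j in range(i + 1, n):
            size = len(sets[i] & sets[j])
            matrix[i][j] = size
            matrix[j][i] = size  # intersection is symmetric
    return matrix


l = [{1}, {1, 2, 3}, {3}]
out_matrix = pairwise_intersection_sizes(l)
print(out_matrix)  # [[1, 1, 0], [1, 3, 1], [0, 1, 1]]
```

This roughly halves the number of intersections but the overall cost is still quadratic in the length of l.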
