I have the following lists:
list_1 = [(0, 1, 7, 6), (1, 2, 8, 7), (2, 3, 9, 8), ...]
list_2 = [(0,1), (1,7), (7,8), (3,9), ...]
Both lists have 200,000 or more elements.
I need a fast algorithm to check how often an element of list_2 occurs in an element of list_1. In the example above, the second element of list_2, which is (1,7), occurs two times in list_1, namely in the first and the second element.
In my case it is a valid hit if both numbers are contained in an element of list_1, independent of their order. So I thought I would go with sets and use .issubset:
for item2 in list_2:
    count = 0
    for item1 in list_1:
        if set(item2).issubset(set(item1)):
            count += 1  # one more list_1 element contains this pair
    if count == 1:
        pass  # do this
    elif count == 2:
        pass  # do that
The data in the lists is structured in a way that I know upfront that the variable count can only have the values 1 or 2. I also know that an O(N**2) loop is not smart at all (with 200,000 elements per list that is on the order of 4 × 10^10 subset checks), and that the if statements inside it do not help performance. Actually, in my current implementation the elements of list_2 are already of type set, but the snippet above is shorter and easier to read.
I believe smarter solutions exist for this task.
My application uses numpy and scipy, so a KD-tree search or something similar (if applicable) would also be fine.
EDIT
I need to be more specific:
list_2 always contains pairs. list_1 can have 3 or more items per element. For "do this" and "do that" I need to keep track of the corresponding elements and their associations, e.g. by using a dictionary.

You could first do some preprocessing and build a dictionary keyed by the individual numbers that occur in list_1, where the value for each key is the set of tuples from list_1 that contain that number.
Then finding the occurrences of a pair from list_2 is as simple as taking the intersection of the sets found at the two keys, and taking the resulting set's size.
list_1 = [(0, 1, 7, 6), (1, 2, 8, 7), (2, 3, 9, 8)]
list_2 = [(0, 1), (1, 7), (7, 8), (3, 9)]

# per number as dictionary key, collect the set of tuples from list_1 that contain it
d = dict()
for lst in list_1:
    for v in lst:
        if v not in d:
            d[v] = set()
        d[v].add(lst)

# for each pair, take the intersection of the corresponding sets in d
result = [(lst, len(d[lst[0]].intersection(d[lst[1]]))) for lst in list_2]
print(result)
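A variant that is not part of the answer above, offered as a sketch: if only the counts are needed, you can precompute every unordered pair of numbers that co-occurs inside a list_1 tuple (via itertools.combinations) and count them in a Counter keyed by frozenset, so that each lookup is a single hash access independent of the pair's order. This assumes the list_1 tuples are short, since a tuple with k distinct numbers produces k(k-1)/2 pairs (6 for the 4-tuples in the example).

```python
from collections import Counter
from itertools import combinations

list_1 = [(0, 1, 7, 6), (1, 2, 8, 7), (2, 3, 9, 8)]
list_2 = [(0, 1), (1, 7), (7, 8), (3, 9)]

# Count every unordered pair of numbers that occurs together in a list_1 tuple.
# set(t) drops duplicate numbers inside a tuple; frozenset makes (a, b) == (b, a).
pair_counts = Counter(
    frozenset(p) for t in list_1 for p in combinations(set(t), 2)
)

# Each list_2 pair is now one lookup; Counter returns 0 for pairs never seen.
result = [(pair, pair_counts[frozenset(pair)]) for pair in list_2]
print(result)  # [((0, 1), 1), ((1, 7), 2), ((7, 8), 1), ((3, 9), 1)]
```

This only yields counts. If you need the matching list_1 tuples themselves, the dictionary-of-sets index above is the better fit.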
If you actually need to do something with the found tuples from list_1, first gather those tuples instead of taking their count (i.e. keep d[lst[0]].intersection(d[lst[1]])), and then process them depending on what len() returns (1 or 2).
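A minimal sketch of that bookkeeping, assuming "do this" and "do that" amount to recording which list_1 tuples belong to each pair. The dictionaries single_hits and double_hits are hypothetical stand-ins for whatever your real processing needs; the index is the same idea as above, built with collections.defaultdict.

```python
from collections import defaultdict

list_1 = [(0, 1, 7, 6), (1, 2, 8, 7), (2, 3, 9, 8)]
list_2 = [(0, 1), (1, 7), (7, 8), (3, 9)]

# Index: number -> set of list_1 tuples that contain it.
index = defaultdict(set)
for tup in list_1:
    for v in tup:
        index[v].add(tup)

# Hypothetical containers standing in for "do this" / "do that".
single_hits = {}  # pair -> the one list_1 tuple containing it
double_hits = {}  # pair -> the two list_1 tuples containing it

for pair in list_2:
    a, b = pair
    # .get avoids inserting empty sets into the defaultdict for unknown numbers
    matches = index.get(a, set()) & index.get(b, set())
    if len(matches) == 1:
        single_hits[pair] = next(iter(matches))
    elif len(matches) == 2:
        double_hits[pair] = tuple(sorted(matches))

print(single_hits)
print(double_hits)
```

Building the index touches each number in list_1 once, and each pair then costs one set intersection over small sets, so the whole pass is roughly linear in the input size rather than quadratic.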