Fast algorithm needed for finding tuples from within one list inside tuples of another list. Sets?

Question

I have the following lists:

list_1 = [(0, 1, 7, 6), (1, 2, 8, 7), (2, 3, 9, 8), ...]
list_2 = [(0,1), (1,7), (7,8), (3,9), ...]

Both lists have a length of 200000 or more elements.

I need a fast algorithm to count how often an element of list_2 occurs within an element of list_1. In the example above, the second element of list_2, (1,7), occurs twice in list_1: in the first and in the second list element.

In my case it counts as a valid hit if both numbers are contained in an element of list_1, regardless of their order. So I thought I would go with sets and use .issubset.

for item1 in list_1:
    count = 0
    for item2 in list_2:
        # order-independent check: both numbers of item2 must occur in item1
        if set(item2).issubset(set(item1)):
            count += 1
    if count == 1:
        pass  # do this
    elif count == 2:
        pass  # do that

The data in the lists are structured such that I know up front that the variable count can only take the values 1 or 2. I also know that an O(N**2) loop is not smart at all and that the if statements inside it do not help performance. In my actual implementation the elements of list_2 are already of type set, but the snippet above is shorter and easier to read.

I believe that smarter solutions exist for this task.

My application uses numpy and scipy, so any KD-tree search or similar (if applicable) would also be fine.

EDIT

I need to be more specific:

list_2 always contains pairs. list_1 can have 3 or more items per list element.
In "do this" and "do that" I need to keep track of the corresponding elements and their associations, e.g. by using a dictionary.
There is nothing more in the structure of the data that could be exploited.

Answer 1

You could first do some preprocessing and build a dictionary keyed by the individual numbers that occur in list_1, where each key maps to the set of tuples in list_1 that contain that number.

Then finding the occurrences of a pair from list_2 is as simple as taking the intersection of the sets found at the two keys, and taking the resulting set's size.

list_1 = [(0, 1, 7, 6), (1, 2, 8, 7), (2, 3, 9, 8)]
list_2 = [(0,1), (1,7), (7,8), (3,9)]

# for each number (dictionary key), collect the set of tuples from list_1 that contain it
d = dict()
for lst in list_1:
    for v in lst:
        if v not in d:
            d[v] = set()
        d[v].add(lst)

# for each pair, intersect the two corresponding sets in d and count the common tuples
result = [(lst, len(d[lst[0]].intersection(d[lst[1]]))) for lst in list_2]

print(result)

If you need to actually do something with the matching tuples from list_1, first gather those tuples instead of only counting them (that is, keep d[lst[0]].intersection(d[lst[1]]) itself), and then do your processing on them based on what len() returns (1 or 2).
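As a rough sketch of that last step, the matching tuples can be kept in a dictionary keyed by the pair, which also gives the association tracking asked for in the question's edit. The names match_pairs, index, matches, single and double are made up for this illustration, and the sketch assumes list_1 contains no duplicate tuples (duplicates would collapse into one set entry):

```python
from collections import defaultdict


def match_pairs(list_1, list_2):
    """Map each pair in list_2 to the set of tuples in list_1 containing both numbers."""
    # index: number -> set of tuples from list_1 that contain that number
    index = defaultdict(set)
    for tup in list_1:
        for v in tup:
            index[v].add(tup)

    matches = {}
    for a, b in list_2:
        # .get() avoids a KeyError (and avoids creating empty entries)
        # when a number from list_2 never occurs in list_1
        matches[(a, b)] = index.get(a, set()) & index.get(b, set())
    return matches


list_1 = [(0, 1, 7, 6), (1, 2, 8, 7), (2, 3, 9, 8)]
list_2 = [(0, 1), (1, 7), (7, 8), (3, 9)]

matches = match_pairs(list_1, list_2)

# split the associations by hit count, corresponding to "do this" / "do that"
single = {pair: hits for pair, hits in matches.items() if len(hits) == 1}
double = {pair: hits for pair, hits in matches.items() if len(hits) == 2}
```

Building the index takes one pass over list_1, and each pair then costs one set intersection, so the quadratic loop from the question is avoided as long as no single number appears in a very large share of the tuples.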

solution1
2 ACCPTED 2017-02-07 22:51:18
