
Fast algorithm needed for finding tuples from within one list inside tuples of another list. Sets?

I have the following lists:

list_1 = [(0, 1, 7, 6), (1, 2, 8, 7), (2, 3, 9, 8), ...]
list_2 = [(0,1), (1,7), (7,8), (3,9), ...]

Both lists have 200000 or more elements.

I need a fast algorithm to check how often an element of list_2 occurs in an element of list_1 . In the example above, the second element of list_2 , which is (1,7) , occurs twice in list_1 : in the first and in the second list element.

In my case it counts as a valid hit if both numbers of the pair are contained in an element of list_1 , regardless of their order. So I thought I would go with sets and use .issubset .

for item1 in list_1:
    count = 0
    for item2 in list_2:
        # order-independent test: both numbers of the pair occur in item1
        if set(item2).issubset(set(item1)):
            count += 1
    if count == 1:
        pass  # do this
    if count == 2:
        pass  # do that

The data in the lists is structured so that I know up front that the variable count can only have the values 1 or 2 . I know that an O(N**2) loop is not smart at all and that the if statements in it do not help performance. In my current implementation the elements of list_2 are already of type set , but the snippet above is shorter and easier to read.

I believe that smarter solutions exist for this task.

My application uses numpy and scipy, so any KD-tree search or similar (if applicable) would also be fine.

EDIT

I need to be more specific:

  • list_2 always contains pairs. list_1 can have 3 or more items per list element.
  • in do this and do that I need to keep track of the corresponding elements and their associations, e.g. by using a dictionary.
  • there is no further structure in the data that could be exploited

You could first do some preprocessing and build a dictionary keyed by the individual numbers that occur in list_1 , where the value for each key is the set of tuples in list_1 that contain that number.

Then finding the occurrences of a pair from list_2 is as simple as taking the intersection of the sets found at the two keys and taking the size of the resulting set.

list_1 = [(0, 1, 7, 6), (1, 2, 8, 7), (2, 3, 9, 8)]
list_2 = [(0,1), (1,7), (7,8), (3,9)]

# per number as dictionary key, collect the set of tuples from list_1 that contain it
d = dict()
for lst in list_1:
    for v in lst:
        if v not in d:
            d[v] = set()
        d[v].add(lst)

# for each pair, take the intersection of the corresponding sets in d
# (assumes every number in list_2 also occurs in list_1, else d[...] raises KeyError)
result = [(lst, len(d[lst[0]].intersection(d[lst[1]]))) for lst in list_2]

print(result)

If you need to actually do something with the tuples found in list_1 , first gather those tuples without taking their count (so just d[lst[0]].intersection(d[lst[1]]) ), and then do your processing on them depending on what len() returns for that set (1 or 2).
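
As a rough sketch of that last step, the intersections can be stored in a dictionary so that the associations in both directions stay available. The names index , pair_to_tuples and tuple_to_pairs are made up for this illustration and are not part of the code above. The sketch assumes the pairs in list_2 are unique (a repeated pair would simply overwrite its own entry) and uses .get so that a number missing from list_1 yields an empty set instead of a KeyError.

```python
from collections import defaultdict

list_1 = [(0, 1, 7, 6), (1, 2, 8, 7), (2, 3, 9, 8)]
list_2 = [(0, 1), (1, 7), (7, 8), (3, 9)]

# number -> set of tuples from list_1 that contain it
index = defaultdict(set)
for tup in list_1:
    for v in tup:
        index[v].add(tup)

# pair -> set of tuples from list_1 that contain both numbers of the pair
pair_to_tuples = {}
# tuple -> list of pairs from list_2 found in it (the reverse association)
tuple_to_pairs = defaultdict(list)

for a, b in list_2:
    # .get avoids a KeyError and does not add new keys for unseen numbers
    matches = index.get(a, set()) & index.get(b, set())
    pair_to_tuples[(a, b)] = matches
    for tup in matches:
        tuple_to_pairs[tup].append((a, b))

# dispatch on the number of hits, as in the question's "do this" / "do that"
for pair, matches in pair_to_tuples.items():
    if len(matches) == 1:
        print(pair, "occurs once, in", sorted(matches))
    elif len(matches) == 2:
        print(pair, "occurs twice, in", sorted(matches))
```

Building the index touches every number in list_1 once, and each pair then costs one intersection of two sets, so the nested loop over both lists is avoided.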
