
Efficiently filtering a Python list with respect to values from a second list (of tuples)

I'm trying to find the most efficient solution to do the following:

I have two lengthy lists:

a = [3, 7, 89, 1, ....] #list of user_ids
b = [(2,t1),(3,t2),(2,t3),(89,t4), ....] # list of user_id, epoch_time pairs

The objective is to retrieve all members of list a that also appear in list b (i.e. as the first element of some tuple in b). Note that a user_id may appear in multiple tuples in b.

One can fulfill this requirement like so:

result = []
for user_id in a:
    for uid, epoch_time in b:
        if user_id == uid:
            # appended once per matching tuple, so duplicates are possible
            result.append(user_id)

The question is: is there any way to do this faster than O(n^2), e.g. by reorganizing b as a dictionary?

You can use a set, which lets you check whether an element is a member in O(1) time on average.

result = []
set_a = set(a)
for uid, epoch_time in b:
    if uid in set_a:
        result.append(uid)

If you want unique values in the result, you can use a set for result as well:

result = set()
set_a = set(a)
for uid, epoch_time in b:
    if uid in set_a:
        result.add(uid)

which could even be turned into a list at the end:

result = list(result)
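
The loops above walk b, so the output follows the order of b (or no particular order, for the set version). If you instead want the members of a, in their original order, a minimal sketch under the same idea could look like this. The function name members_in_b is made up for illustration and is not from the question:

```python
def members_in_b(a, b):
    """Return the elements of a, in order, that occur as the first item of a tuple in b."""
    # One pass over b to collect the user_ids: O(len(b)).
    b_ids = {uid for uid, _ in b}
    # One pass over a with average O(1) set lookups: O(len(a)).
    return [user_id for user_id in a if user_id in b_ids]


a = [3, 7, 89, 1]
b = [(2, 't1'), (3, 't2'), (2, 't3'), (89, 't4')]
print(members_in_b(a, b))  # [3, 89]
```

Each user_id from a shows up as often as it occurs in a, regardless of how many tuples in b mention it.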

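Without building a set, you can check whether the value is in the list a directly. Note that each "in" test on a list is a linear scan, so this is O(len(a) * len(b)) overall rather than O(1) per lookup: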

result = []
for uid,epoch_time in b:
    if uid in a:
        result.append(uid)

If you don't want duplicate values, add the condition that the uid must not only be in a but also not already be in result:

result = []
for uid,epoch_time in b:
    if uid in a and uid not in result:
        result.append(uid)
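
Both uid in a and uid not in result scan a list, so each check is linear in the list's length. A possible variant that keeps the same output order (first occurrence in b) but uses sets for both checks is sketched below. The names unique_matches and seen are illustrative, not from the original answer:

```python
def unique_matches(a, b):
    """Return each user_id of b that is also in a, once, in order of first occurrence in b."""
    set_a = set(a)   # average O(1) membership tests instead of scanning the list
    seen = set()     # ids already added to result
    result = []
    for uid, _epoch_time in b:
        if uid in set_a and uid not in seen:
            seen.add(uid)
            result.append(uid)
    return result


a = [3, 7, 89, 1]
b = [(2, 't1'), (3, 't2'), (2, 't3'), (89, 't4'), (3, 't5')]
print(unique_matches(a, b))  # [3, 89]
```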


You can reorganize b as a dictionary as you mentioned, then check if user_id from a is available in the dictionary.

a = [3, 7, 89, 1]
b = [(2,'t1'),(3,'t2'),(2,'t3'),(89,'t4')]
dic = {k: v for k, v in b}
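result = [x for x in a if x in dic]

x in dic is True only if x is a key. Testing dic.get(x) instead would also drop keys whose value is falsy (for example an epoch time of 0), because get returns None for a missing key and the value itself otherwise.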
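
One thing to keep in mind: the dict comprehension keeps only the last epoch_time for each user_id, since later tuples overwrite earlier ones. If the timestamps matter, a rough sketch that groups them per user could look like this. The helper name times_by_user is hypothetical:

```python
from collections import defaultdict


def times_by_user(a, b):
    """Map each user_id of a that occurs in b to the list of its epoch times."""
    grouped = defaultdict(list)
    for uid, epoch_time in b:
        grouped[uid].append(epoch_time)  # keep every timestamp, not just the last
    # "uid in grouped" does not create new keys, unlike grouped[uid].
    return {uid: grouped[uid] for uid in a if uid in grouped}


a = [3, 7, 89, 1, 2]
b = [(2, 't1'), (3, 't2'), (2, 't3'), (89, 't4')]
print(times_by_user(a, b))  # {3: ['t2'], 89: ['t4'], 2: ['t1', 't3']}
```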

I would use sets, since you are just discarding the epoch time.

a = [3, 7, 89, 1, ....]
b = [(2,t1),(3,t2),(2,t3),(89,t4), ....]

def fn(a, b):
    a = set(a)
    # keep only the user_ids from b; this also works when b is empty
    b_uid = {uid for uid, _ in b}
    return a.intersection(b_uid)

This has all the speed of a dictionary without dealing with the values. Adjust the return type to whatever you want (wrap it in a list if that is what you want back).
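
If you would rather measure than rely on the Big-O argument, something like the following could be used to compare the nested loops with a set lookup. This is only a sketch: the data is randomly generated, the sizes are arbitrary, and the timings will vary by machine.

```python
import random
import timeit


def nested_loops(a, b):
    # Quadratic baseline from the question: compares every pair.
    return [user_id for user_id in a for uid, _ in b if user_id == uid]


def with_set(a, b):
    # Collect the ids of b once, then do average O(1) lookups.
    b_ids = {uid for uid, _ in b}
    return [user_id for user_id in a if user_id in b_ids]


random.seed(0)  # made-up test data, only for the measurement
a = random.sample(range(10_000), 1_000)
b = [(random.randrange(10_000), t) for t in range(1_000)]

# Both find the same ids; the baseline repeats an id once per matching tuple.
assert set(nested_loops(a, b)) == set(with_set(a, b))

for fn in (nested_loops, with_set):
    seconds = timeit.timeit(lambda: fn(a, b), number=3)
    print(f"{fn.__name__}: {seconds:.4f} s for 3 runs")
```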

