Efficiently filtering a python list with respect to values from a second list (of tuples)

Question

I'm trying to find the most efficient solution to do the following:

I have two lengthy lists:

a = [3, 7, 89, 1, ....] #list of user_ids
b = [(2,t1),(3,t2),(2,t3),(89,t4), ....] # list of user_id, epoch_time pairs

The objective is to retrieve all members of list a that exist in list b (i.e. that appear as the first member of a tuple in list b). Note that a user_id may exist in multiple tuples in b.

One can fulfill this requirement like so:

result = []
for user_id in a:
    for uid, epoch_time in b:
        # appends user_id once for every tuple in b that carries it
        if user_id == uid:
            result.append(user_id)

The question is: is there any way to do this faster than O(n^2), i.e. O(len(a) * len(b))? For example, by reorganizing b as a dictionary?

Answer 1

You can use a set, which lets you check whether an element is part of it in O(1) time on average.

result = []
set_a = set(a)
for uid, epoch_time in b:
    if uid in set_a:
        result.append(uid)

If you want unique values in the result, you can use a set for result as well:

result = set()
set_a = set(a)
for uid, epoch_time in b:
    if uid in set_a:
        result.add(uid)

which could even be turned into a list at the end:

result = list(result)
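If the goal is exactly what the question states (the members of a that appear somewhere in b, in the order of a), the same set idea can be applied the other way round. The following is a minimal sketch rather than a definitive implementation: the helper name filter_ids is hypothetical, and the string timestamps are placeholder sample data.

```python
def filter_ids(a, b):
    """Return the members of a whose id is the first item of some tuple in b."""
    # Build the lookup set once: O(len(b)).
    uids_in_b = {uid for uid, _epoch_time in b}
    # Each membership test is O(1) on average, so this pass is O(len(a)).
    return [user_id for user_id in a if user_id in uids_in_b]


# Placeholder sample data (assumed, not from the original lists)
a = [3, 7, 89, 1]
b = [(2, "t1"), (3, "t2"), (2, "t3"), (89, "t4")]
print(filter_ids(a, b))  # [3, 89]
```

Unlike the nested loop in the question, this keeps each member of a once per occurrence in a, no matter how many tuples in b match it.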

Answer 2

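For a shorter version, just check directly whether the value is in the list a. Note that a membership test on a list is a linear scan, not O(1), so this is still O(len(a) * len(b)) overall; convert a to a set first if you need average O(1) lookups: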

result = []
for uid,epoch_time in b:
    if uid in a:
        result.append(uid)

If you don't want duplicate values, add a condition so that the uid must not only be in a but also must not already be in result:

result = []
for uid,epoch_time in b:
    if uid in a and uid not in result:
        result.append(uid)

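Both `uid in a` and `uid not in result` scan a list, so each check is linear. A hedged sketch of the same de-duplicating loop with sets doing the membership tests follows; the function name unique_matches and the sample data are assumptions for illustration only.

```python
def unique_matches(a, b):
    """Ids from b that also occur in a, without duplicates,
    in order of first appearance in b."""
    allowed = set(a)  # average O(1) membership instead of scanning the list a
    seen = set()      # ids already emitted, instead of scanning result
    result = []
    for uid, _epoch_time in b:
        if uid in allowed and uid not in seen:
            seen.add(uid)
            result.append(uid)
    return result


# Placeholder sample data (assumed)
a = [3, 7, 89, 1, 2]
b = [(2, "t1"), (3, "t2"), (2, "t3"), (89, "t4")]
print(unique_matches(a, b))  # [2, 3, 89]
```

The result list keeps the order in which ids first appear in b, which a plain set would not guarantee.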

Answer 3

You can reorganize b as a dictionary as you mentioned, then check if user_id from a is available in the dictionary.

a = [3, 7, 89, 1]
b = [(2,'t1'),(3,'t2'),(2,'t3'),(89,'t4')]
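dic = dict(b)  # for a repeated user_id, the last epoch_time wins
result = [x for x in a if x in dic]

Testing x in dic checks for the key itself. dic.get(x) returns None if x is not a key, but it would also filter out any key whose value is falsy, such as an epoch time of 0.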
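Because a user_id may appear in several tuples, a plain dictionary keeps only one epoch_time per user. If the times matter, one possible variation is to group them per user. This is a sketch under assumptions, not part of the original answer: the name times_by_user and the sample data are hypothetical.

```python
from collections import defaultdict


def times_by_user(a, b):
    """Map each user_id found in both a and b to the list of its epoch times."""
    wanted = set(a)
    grouped = defaultdict(list)
    for uid, epoch_time in b:
        if uid in wanted:
            grouped[uid].append(epoch_time)
    return dict(grouped)  # plain dict, keys in order of first appearance in b


# Placeholder sample data (assumed)
a = [3, 7, 89, 1, 2]
b = [(2, "t1"), (3, "t2"), (2, "t3"), (89, "t4")]
print(times_by_user(a, b))  # {2: ['t1', 't3'], 3: ['t2'], 89: ['t4']}
```

The keys of the returned dictionary are the matching user_ids, so list(times_by_user(a, b)) gives the filtered ids without duplicates.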

Answer 4

I would use sets, since you are just discarding the epoch time.

a = [3, 7, 89, 1, ....]
b = [(2,t1),(3,t2),(2,t3),(89,t4), ....]

def fn(a, b):
    a = set(a)
    b_uid, trash = zip(*b)  # split the pairs into ids and epoch times
    b_uid = set(b_uid)
    return a.intersection(b_uid)  # intersect with the ids, not the tuples in b

This has all the speed of a dictionary without dealing with the values. Also adjust the return type to whatever you want (wrap it in a list if that is what you want back).
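One edge case worth noting: unpacking zip(*b) into two names raises ValueError when b is empty. Below is a sketch of the same intersection idea that avoids the unpacking and returns a sorted list. The name common_ids and the sample data are assumptions, and sorting is just one possible choice of return type.

```python
def common_ids(a, b):
    """Sorted list of ids present both in a and as the first item of a tuple in b."""
    # A set comprehension also works when b is empty.
    b_uids = {uid for uid, _epoch_time in b}
    return sorted(set(a) & b_uids)


# Placeholder sample data (assumed)
a = [3, 7, 89, 1]
b = [(2, "t1"), (3, "t2"), (2, "t3"), (89, "t4")]
print(common_ids(a, b))   # [3, 89]
print(common_ids(a, []))  # []
```

Sets drop both duplicates and the original ordering, so this fits when only the distinct ids are needed.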

4 answers

solution1
1 2017-07-30 00:21:29

solution2
1 2017-07-30 00:24:49

solution3
1 ACCEPTED 2017-07-30 01:23:02

solution4
0 2017-07-30 00:46:10
