I'm trying to find the most efficient solution to do the following:
I have two lengthy lists:
a = [3, 7, 89, 1, ....] #list of user_ids
b = [(2,t1),(3,t2),(2,t3),(89,t4), ....] # list of user_id, epoch_time pairs
The objective is to retrieve all members of list a if they exist in list b (i.e. in the first member of each tuple in list b). Note that a user_id may exist in multiple tuples in b.
One can fulfill this requirement like so:
result = []
for user_id in a:
    for uid, epoch_time in b:
        if user_id == uid:
            result.append(user_id)
# result now holds the matching user_ids
The question is: is there any way to do this faster than O(n^2), e.g. by reorganizing b as a dictionary?
You can use a set, which allows checking whether an element is part of it (or not) in O(1) on average.
result = []
set_a = set(a)
for uid, epoch_time in b:
    if uid in set_a:
        result.append(uid)
If you want unique values in the result, you can use a set for result as well:
result = set()
set_a = set(a)
for uid, epoch_time in b:
    if uid in set_a:
        result.add(uid)
which could even be turned into a list at the end:
result = list(result)
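As a compact variant of the loops above, the same membership test can be sketched with comprehensions. The sample values are assumptions for illustration, and the strings 't1' to 't4' merely stand in for real epoch times.

```python
# Sample data (assumed); 't1'..'t4' stand in for real epoch times.
a = [3, 7, 89, 1]
b = [(2, 't1'), (3, 't2'), (2, 't3'), (89, 't4')]

# Building the set costs O(len(a)); each lookup is then O(1) on average.
set_a = set(a)

# One entry per matching tuple in b (duplicates kept, order of b preserved).
matches = [uid for uid, _ in b if uid in set_a]

# Unique matching ids (a set, so the order is not guaranteed).
unique_matches = {uid for uid, _ in b if uid in set_a}

print(matches)
print(sorted(unique_matches))
```

Overall this is O(len(a) + len(b)) on average, rather than the O(len(a) * len(b)) of the nested loops.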
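You can drop the explicit inner loop by checking whether the value is in the list a directly. Note that `in` on a list is a linear scan, so this is still O(len(a) * len(b)) overall rather than O(1) per lookup; converting a to a set first is what makes each check O(1) on average: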
result = []
for uid, epoch_time in b:
    if uid in a:
        result.append(uid)
If you don't want duplicate values, add a condition that the uid must not only be in a but also not already exist in result (this second check is again a linear scan of a list):
result = []
for uid, epoch_time in b:
    if uid in a and uid not in result:
        result.append(uid)
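If the order of first appearance in b matters and duplicates should be dropped, one possible sketch keeps a list for the output and uses sets only for the membership checks. The sample data and the helper name `seen` are assumptions for illustration, not part of the original answer.

```python
# Sample data (assumed); the second tuple for uid 3 exercises the duplicate check.
a = [3, 7, 89, 1]
b = [(2, 't1'), (3, 't2'), (2, 't3'), (89, 't4'), (3, 't5')]

set_a = set(a)   # O(1) average lookups instead of scanning the list a
seen = set()     # ids already appended to result (hypothetical helper name)
result = []

for uid, _ in b:
    if uid in set_a and uid not in seen:
        seen.add(uid)
        result.append(uid)

print(result)
```

Both checks are set lookups, so the loop runs in roughly O(len(b)) after the one-off cost of building set_a.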
You can reorganize b as a dictionary as you mentioned, then check if each user_id from a is available in the dictionary.
a = [3, 7, 89, 1]
b = [(2,'t1'),(3,'t2'),(2,'t3'),(89,'t4')]
dic = {k: v for k, v in b}
result = [x for x in a if dic.get(x)]
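dic.get(x) returns None if x is not a key, so missing ids are filtered out. Note that this test also drops ids whose stored value is falsy (for example an epoch time of 0); checking `x in dic` avoids that. Since keys are unique, the dictionary keeps only the last epoch time for each user_id.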
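To illustrate the difference between testing the looked-up value and testing key membership, here is a small sketch. The integer epoch times are made-up sample values, chosen so that one of them is 0.

```python
# Sample data (assumed); user 3 has an epoch time of 0, which is falsy.
a = [3, 7, 89, 1]
b = [(3, 0), (89, 1700000000), (2, 1700000050)]

dic = dict(b)  # for repeated user_ids, the last pair would win

# Truthiness of the value: user 3 is dropped because its value is 0.
with_get = [x for x in a if dic.get(x)]

# Key membership: independent of what the value is.
with_in = [x for x in a if x in dic]

print(with_get)
print(with_in)
```

If only membership is needed, a set of the ids carries the same information without storing the values.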
I would use sets, since you are just discarding the epoch time.
a = [3, 7, 89, 1, ....]
b = [(2,t1),(3,t2),(2,t3),(89,t4), ....]
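def fn(a, b):
    a = set(a)
    # Split the pairs into user_ids and epoch times; the times are discarded.
    b_uid, trash = zip(*b)
    b_uid = set(b_uid)
    # Intersect with the set of ids from b, not with the list of tuples.
    return a.intersection(b_uid)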
This has all the speed of a dictionary without dealing with the values. Also, change the return type to whatever you want (wrap it in a list if that is what you want back).
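A minimal runnable sketch of the set-intersection idea, using a set comprehension in place of zip so that an empty b also works (unpacking the result of zip(*b) into two names fails when b is empty). The function name `common_user_ids` and the sample data are assumptions for illustration.

```python
def common_user_ids(a, b):
    """Return the set of user_ids from a that appear as the first item of a pair in b."""
    # Collect the ids from b, ignoring the epoch times.
    b_uids = {uid for uid, _ in b}
    # & is the set intersection operator.
    return set(a) & b_uids


# Sample data (assumed); 't1'..'t4' stand in for real epoch times.
a = [3, 7, 89, 1]
b = [(2, 't1'), (3, 't2'), (2, 't3'), (89, 't4')]

common = common_user_ids(a, b)
print(sorted(common))
```

Wrap the result in list() or sorted() if a list is what you want back.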