I need to check whether all the items in one list are also in another list. Both lists contain file paths.
list1 = ['a/b/c/file1.txt', 'b/c/d/file2.txt']
list2 = ['a/b/c/file1.txt', 'b/c/d/file2.txt', 'd/f/g/test4.txt', 'd/k/test5.txt']
I tried something like:
len1 = len(list1)
len2 = len(list2)
res = list(set(list2) - set(list1))
len3 = len(res)
if len2 - len1 == len3:
    print("List2 contains all the items in list1")
But it's not an optimal solution: I have lists of 50k+ items. I think a good solution could involve creating a hash table, but I don't know exactly how to build one. Any suggestions are welcome.
Python sets are based on hashing, hence you cannot put unhashable objects inside sets. Rather than calculating lengths, directly perform a set difference:
>>> list1 = ['a/b/c/file1.txt', 'b/c/d/file2.txt']
>>> list2 = ['a/b/c/file1.txt', 'b/c/d/file2.txt', 'd/f/g/test4.txt', 'd/k/test5.txt']
>>> if not (set(list1) - set(list2)):  # an empty set (falsy) means all items are contained
...     print("List2 contains all the items in list1")
...
List2 contains all the items in list1
Here is the breakdown:
>>> difference = set(list1) - set(list2)
>>> difference
set()
>>> bool(difference)
False
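If you also want to know which paths are missing, the same set difference can be wrapped in a small helper. This is a minimal sketch; missing_items is a hypothetical name used for illustration, not a library function:

```python
def missing_items(needles, haystack):
    """Hypothetical helper: return the items of needles that are not in haystack."""
    # Building each set is linear; each membership test inside the
    # difference is an average O(1) hash lookup.
    return set(needles) - set(haystack)

list1 = ['a/b/c/file1.txt', 'b/c/d/file2.txt']
list2 = ['a/b/c/file1.txt', 'b/c/d/file2.txt', 'd/f/g/test4.txt', 'd/k/test5.txt']

missing = missing_items(list1, list2)
if not missing:
    print("List2 contains all the items in list1")
else:
    print("Missing from list2:", sorted(missing))
```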
"I think a good solution can be by creating a hash table, but I don't know exactly how I could build it."
Sets are already implemented using hash tables, so you are already doing that.
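To make the hash-table idea explicit, one option is to build a set from the larger list once and then look up each item of the smaller list. all() stops at the first item that is not found, so it can finish early when the answer is negative. A sketch, with contains_all as a hypothetical helper name:

```python
def contains_all(needles, haystack):
    """Hypothetical helper: True if every item of needles occurs in haystack."""
    lookup = set(haystack)  # the hash table, built once in O(len(haystack))
    # Each `in` check is an average O(1) hash lookup; all() short-circuits
    # on the first missing item and never builds a set from needles.
    return all(item in lookup for item in needles)

list1 = ['a/b/c/file1.txt', 'b/c/d/file2.txt']
list2 = ['a/b/c/file1.txt', 'b/c/d/file2.txt', 'd/f/g/test4.txt', 'd/k/test5.txt']
print(contains_all(list1, list2))  # True
```

Building the lookup set is still linear in the size of list2, so the worst case matches the set-difference version; the difference is that no second set is built for list1.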
Supposing you don't have (or don't care about) duplicates, you could try:
list1 = [1,2,3]
list2 = [1,2,3,4]
set(list1).issubset(list2)
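If duplicates do matter (for example, list1 names the same path twice and list2 must contain it twice as well), a set discards that information. One option is collections.Counter, which is itself a dict-based hash table mapping each item to its count. A sketch, with contains_all_with_counts as a hypothetical helper name:

```python
from collections import Counter

def contains_all_with_counts(needles, haystack):
    """Hypothetical helper: True if haystack has at least as many copies
    of every item as needles does."""
    # Counter subtraction keeps only positive counts, so an empty result
    # means no item of needles is under-represented in haystack.
    return not (Counter(needles) - Counter(haystack))

print(set([1, 2, 2]).issubset([1, 2, 3]))                 # True: sets ignore duplicates
print(contains_all_with_counts([1, 2, 2], [1, 2, 3]))     # False: haystack has only one 2
print(contains_all_with_counts([1, 2, 2], [2, 1, 2, 4]))  # True
```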
Notice that there's no need to convert list2 to a set yourself: issubset() accepts any iterable.
EDIT: both your solution and mine are O(n) on average, and it won't get faster than that. However, your solution could skip some unnecessary operations, such as converting the difference res into a list just to get its size.
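For completeness, the length comparison from the question can be kept without the intermediate list, as long as it compares set sizes rather than list sizes: len(set2 - set1) == len(set2) - len(set1) holds exactly when set1 is a subset of set2. With list lengths, duplicates can make the check give the wrong answer. A sketch, with contains_all_by_length as a hypothetical helper name:

```python
def contains_all_by_length(list1, list2):
    """Hypothetical helper: the question's length check, done on sets."""
    set1, set2 = set(list1), set(list2)
    # len() works on the set directly, so no list(...) conversion is needed.
    # The equality holds exactly when set1 is a subset of set2.
    return len(set2 - set1) == len(set2) - len(set1)

dupes1 = ['a/x.txt', 'a/x.txt']
dupes2 = ['a/x.txt', 'b/y.txt']
print(contains_all_by_length(dupes1, dupes2))                       # True
print(len(dupes2) - len(dupes1) == len(set(dupes2) - set(dupes1)))  # False: list lengths mislead
print(set(dupes1) <= set(dupes2))                                   # True: same answer via the subset operator
```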