
Fastest way to compare a list to a dict of lists

So I have 2 lists:

list1 = ['abc', 'efg', 'hijk'] #list of strings

list2 = ['lmno', 'pqrs'] #also a list of strings

Then I have a dict, which is usually fairly large: there are only ~100 keys, but the lists hold a few hundred thousand string values.

d = {'abc': ['lmno'], 'efg': ['lmno', 'pqrs']}

So I need to loop over every item of list1 paired with every item of list2, for example:

for i1 in list1:
   for i2 in list2:
      print(i1, i2)
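For reference, the standard library's itertools.product yields the same (i1, i2) pairs in the same order as the nested loops. This is a minimal sketch with the example lists; it does not reduce the number of iterations, so on its own it is not a speed-up.

```python
from itertools import product

list1 = ['abc', 'efg', 'hijk']
list2 = ['lmno', 'pqrs']

# product(list1, list2) yields (i1, i2) with list1 as the outer loop,
# exactly like the nested for-loops above
pairs = list(product(list1, list2))
for i1, i2 in pairs:
    print(i1, i2)
```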

Then I compare each pair against the dict:

for i1 in list1:
    for i2 in list2:
        if i1.lower() in d:
            if i2 in d[i1.lower()]:
                continue  # ignore
            else:
                pass  # process data

Currently my code looks like the above, but it is very slow when the dict is large. Is there a faster way to do this?
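Most of the cost in the loop above is the test i2 in d[...], which scans a list element by element. The sketch below times that test against the same values stored in a set, using hypothetical synthetic data (one key holding 100,000 strings) rather than the real dict; absolute timings will vary by machine.

```python
import timeit

# Hypothetical data: a single key whose list holds 100,000 strings, standing in
# for the "few hundred thousand values" described in the question
values = [f'value{n}' for n in range(100_000)]
as_list = {'abc': values}
as_set = {'abc': set(values)}

# A string that is not present is the worst case for a list: every element is compared
probe = 'not-present'

# Time 20 membership tests against each container
list_time = timeit.timeit(lambda: probe in as_list['abc'], number=20)
set_time = timeit.timeit(lambda: probe in as_set['abc'], number=20)
print(f'list: {list_time:.4f}s  set: {set_time:.6f}s')
```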

Swap the second and third lines so that you don't iterate over list2 when i1.lower() is not in d.

for i1 in list1:
    if i1.lower() in d:
        for i2 in list2:
            if i2 in d[i1.lower()]:
                continue  # ignore
            else:
                pass  # process data
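A minimal sketch, assuming "process data" can be replaced by collecting the (i1, i2) pairs in a list, that checks the swapped loop order gives the same result as the question's order. The function names original_order and swapped_order are hypothetical; the swapped version also computes i1.lower() once per key instead of twice per pair.

```python
def original_order(list1, list2, d):
    """Question's loop order: list2 is walked even for keys missing from d."""
    out = []
    for i1 in list1:
        for i2 in list2:
            if i1.lower() in d:
                if i2 not in d[i1.lower()]:
                    out.append((i1, i2))  # stands in for "process data"
    return out


def swapped_order(list1, list2, d):
    """Key check hoisted out of the inner loop; lower() computed once per i1."""
    out = []
    for i1 in list1:
        key = i1.lower()
        if key not in d:
            continue  # skip list2 entirely for keys that are not in d
        for i2 in list2:
            if i2 not in d[key]:
                out.append((i1, i2))  # stands in for "process data"
    return out


list1 = ['abc', 'efg', 'hijk']
list2 = ['lmno', 'pqrs']
d = {'abc': ['lmno'], 'efg': ['lmno', 'pqrs']}

print(swapped_order(list1, list2, d))  # [('abc', 'pqrs')]
```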

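Also, as @aran-fey mentioned, convert your d to a dict of sets first, so that each membership test on d[...] is a hash lookup (average constant time) instead of a linear scan through the list: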

d = {k: set(v) for k, v in d.items()}
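The conversion is a one-time cost proportional to the total number of values, and it pays off because each set is then probed once per item of list2. A small check with the example data that converting to sets does not change any membership answer (duplicates inside a list collapse, which is harmless for in tests):

```python
d = {'abc': ['lmno'], 'efg': ['lmno', 'pqrs']}

# One-time conversion: each list becomes a set, built once before the loops
d_sets = {k: set(v) for k, v in d.items()}

# Membership answers are unchanged; only the lookup cost differs
# (a list is scanned element by element, a set is probed by hash)
candidates = ['lmno', 'pqrs', 'xyz']
same = all((s in d[k]) == (s in d_sets[k]) for k in d for s in candidates)
print(same)  # True
```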

Even further (thanks to @AlexHall):

d = {k: set(v) for k, v in d.items()}
set2 = set(list2)

for i1 in list1:
    key = i1.lower()
    if key in d:  # keys missing from d are skipped, as in the question
        for i2 in set2 - d[key]:  # only the values not already in d[key]
            pass  # process data
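A minimal sketch, assuming each "process data" step can be replaced by collecting (i1, i2) pairs into a set, that checks the set-difference version against plain nested loops. The function names are hypothetical. Because set2 is a set, duplicate entries in list2 are collapsed and the iteration order is arbitrary, so the results are compared as sets.

```python
def pairs_nested(list1, list2, d):
    """Reference version: nested loops with the key check hoisted."""
    return {(i1, i2)
            for i1 in list1 if i1.lower() in d
            for i2 in list2 if i2 not in d[i1.lower()]}


def pairs_set_difference(list1, list2, d_sets):
    """One set difference per key replaces the Python-level inner loop."""
    set2 = set(list2)
    result = set()
    for i1 in list1:
        key = i1.lower()
        if key in d_sets:
            for i2 in set2 - d_sets[key]:
                result.add((i1, i2))  # stands in for "process data"
    return result


list1 = ['abc', 'efg', 'hijk']
list2 = ['lmno', 'pqrs']
d = {'abc': ['lmno'], 'efg': ['lmno', 'pqrs']}
d_sets = {k: set(v) for k, v in d.items()}

print(pairs_set_difference(list1, list2, d_sets))  # {('abc', 'pqrs')}
```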

I guess you have two lists: one contains the keys and the other the values. Check whether the key is in the dict before iterating over the values; that makes this more efficient.

for i1 in list1:
    if i1.lower() in d:
        for i2 in list2:
            if i2 in d[i1.lower()]:
                continue  # ignore
            else:
                pass  # process data

This may not be the fastest; you would have to measure it. But it is neater.

from operator import itemgetter

keys_to_check = [
    'abc', 'efg', 'hijk'
]

strings_to_check = [
    'lmno', 'pqrs'
]

d = {
    'abc': ['lmno'],
    'efg': ['lmno', 'pqrs']
}

# Lower-case the keys and keep only those present in the dictionary
present = [key.lower() for key in keys_to_check if key.lower() in d]

# itemgetter needs at least one key and returns a bare value (not a tuple)
# when given exactly one, so normalize the result to a tuple of lists
if not present:
    selected = ()
elif len(present) == 1:
    selected = (d[present[0]],)
else:
    selected = itemgetter(*present)(d)

for value in selected:
    # Checks if any of the strings within value is in strings_to_check;
    # if so, ignore that value
    if any(strng in strings_to_check for strng in value):
        continue
    else:
        pass  # process data
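The reason the original itemgetter version needs special cases is that operator.itemgetter changes its return type with the number of keys. A short demonstration with the example dict:

```python
from operator import itemgetter

d = {'abc': ['lmno'], 'efg': ['lmno', 'pqrs']}

# With two or more keys, itemgetter returns a tuple of values
two = itemgetter('abc', 'efg')(d)
print(two)  # (['lmno'], ['lmno', 'pqrs'])

# With exactly one key, it returns the bare value, not a 1-tuple,
# so "for value in values(d)" would walk the strings inside that list
one = itemgetter('abc')(d)
print(one)  # ['lmno']

# With no keys at all, constructing the getter raises TypeError
try:
    itemgetter()
except TypeError as exc:
    print('TypeError:', exc)
```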

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please credit the site URL or the original address. For any questions, contact: yoyou2525@163.com.

 
Guangdong ICP Record No. 18138465  © 2020-2024 STACKOOM.COM