Fastest way to compare a list to a dict of lists

I have two lists:

list1 = ['abc', 'efg', 'hijk'] #list of strings

list2 = ['lmno', 'pqrs'] #also a list of strings

Then I have a dict that is usually fairly large: only about 100 keys, but a few hundred thousand strings in total populating the value lists.

d = {'abc': ['lmno'], 'efg': ['lmno', 'pqrs']}

I need to loop over each item of list1 and each item of list2:

Example:

for i1 in list1:
   for i2 in list2:
      print(i1, i2)

Then I compare the data against the dict:

for i1 in list1:
    for i2 in list2:
        if i1.lower() in d:
            if i2 in d[i1.lower()]:
                continue  # ignore
            else:
                pass  # process data

Currently my code looks like the above, but it is very slow when the dict is large. Is there a faster way to do this?

Swap the second and third lines so that you don't iterate over list2 when i1.lower() is not in d:

for i1 in list1:
    if i1.lower() in d:
        for i2 in list2:
            if i2 in d[i1.lower()]:
                continue  # ignore
            else:
                pass  # process data
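The swapped loop still lowercases the key and looks it up in the dict on every inner iteration. A minimal sketch of hoisting that work out of the inner loop follows; the function name `pairs_to_process` and the choice to collect pairs in a list are assumptions made for illustration, not part of the original answer.

```python
list1 = ['abc', 'efg', 'hijk']
list2 = ['lmno', 'pqrs']
d = {'abc': ['lmno'], 'efg': ['lmno', 'pqrs']}

def pairs_to_process(keys, strings, lookup):
    """Return (key, string) pairs whose string is not listed under the key."""
    result = []
    for key in keys:
        # Lowercase once and fetch the value list once per key
        known = lookup.get(key.lower())
        if known is None:
            continue  # key not in the dict: skip the inner loop entirely
        for s in strings:
            if s not in known:
                result.append((key, s))
    return result

print(pairs_to_process(list1, list2, d))  # [('abc', 'pqrs')]
```

With the sample data, 'hijk' is skipped because it is not a key, and only 'abc' is missing one of the strings.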

Also, as @aran-fey mentioned, convert d to a dict of sets first, so that each membership test is a hash lookup instead of a linear scan of a list:

d = {k: set(v) for k, v in d.items()}
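To see why the conversion matters, here is a rough timing sketch. The data is made up (one hypothetical value list of 200,000 strings, roughly the scale described in the question), and the exact numbers will vary by machine.

```python
import timeit

# Hypothetical data: a single value list with 200,000 strings
big_list = [f"value{i}" for i in range(200_000)]
big_set = set(big_list)
needle = "value199999"  # worst case for the list: the last element

def in_list():
    return needle in big_list  # scans elements one by one, O(n)

def in_set():
    return needle in big_set   # hash lookup, O(1) on average

list_time = timeit.timeit(in_list, number=100)
set_time = timeit.timeit(in_set, number=100)
print(f"list: {list_time:.4f}s  set: {set_time:.6f}s")
```

Building the sets costs one pass over the data, so it pays off when the dict is queried many times, as in the nested loops above.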

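Going even further (thanks to @AlexHall):

d = {k: set(v) for k, v in d.items()}
# Lowercasing list2 assumes the strings stored in d are lowercase;
# use set(list2) instead to compare them exactly as the question does
set2 = {i2.lower() for i2 in list2}

for i1 in list1:
    # Keys missing from d get an empty set, so every string in set2 is
    # processed for them (the nested-loop versions skip such keys)
    for i2 in set2 - d.get(i1.lower(), set()):
        pass  # process data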
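A self-contained sketch of the same set-difference idea, written to keep the question's behaviour: it skips keys that are absent from the dict and compares the list2 strings without lowercasing them. The function name `unmatched` and the dict-of-sorted-lists return shape are assumptions chosen for illustration.

```python
list1 = ['abc', 'efg', 'hijk']
list2 = ['lmno', 'pqrs']
d = {'abc': ['lmno'], 'efg': ['lmno', 'pqrs']}

def unmatched(keys, strings, lookup):
    """Map each key found in lookup to the strings not listed under it."""
    set_lookup = {k: set(v) for k, v in lookup.items()}  # one-time conversion
    candidates = set(strings)
    result = {}
    for key in keys:
        known = set_lookup.get(key.lower())
        if known is None:
            continue  # key not in the dict: nothing to process
        # Set difference replaces the inner loop; sorted() only makes
        # the output order deterministic
        result[key] = sorted(candidates - known)
    return result

print(unmatched(list1, list2, d))  # {'abc': ['pqrs'], 'efg': []}
```

Sets are unordered, so if the processing order of list2 matters, iterate over list2 and test membership in the set instead of taking the difference.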

I guess you have two lists, one containing the keys and the other the values. Check that the key is in the dict before iterating over the values, which makes this more efficient.

for i1 in list1:
    if i1.lower() in d:
        for i2 in list2:
            if i2 in d[i1.lower()]:
                continue  # ignore
            else:
                pass  # process data

This may not be the fastest; you would have to measure it. But it is neater.

from operator import itemgetter

keys_to_check = [
    'abc', 'efg', 'hijk'
]

strings_to_check = [
    'lmno', 'pqrs'
]

d = {
    'abc': ['lmno'],
    'efg': ['lmno', 'pqrs']
}

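# Keys from keys_to_check that exist in the dictionary (lowercased to match)
present_keys = [key.lower() for key in keys_to_check if key.lower() in d]

# itemgetter(*keys) raises TypeError with no keys and returns a bare value
# (not a tuple) with exactly one key, so handle those cases separately
if len(present_keys) > 1:
    values = itemgetter(*present_keys)(d)
else:
    values = tuple(d[key] for key in present_keys)

for value in values:
    # Skips the whole value list if any of its strings is in strings_to_check;
    # note this decides per key, not per (key, string) pair as in the question
    if any(strng in strings_to_check for strng in value):
        continue
    else:
        pass  # process data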
