Fastest way to compare a list and a dict of lists

Question

I have two lists:

list1 = ['abc', 'efg', 'hijk'] #list of strings

list2 = ['lmno', 'pqrs'] #also a list of strings

Then I have a dict that is usually fairly large: it has only about 100 keys, but a few hundred thousand strings populate the value lists.

d = {'abc': ['lmno'], 'efg': ['lmno', 'pqrs']}

I need to loop through each item of list1 and each item of list2:

Example:

for i1 in list1:
   for i2 in list2:
      print(i1, i2)

Then I compare the data against the dict:

for i1 in list1:
    for i2 in list2:
        if i1.lower() in d:
            if i2 in d[i1.lower()]:
                continue  # ignore
            else:
                pass  # process data

Currently my code looks like the above, but it is very slow when the dict is large. Is there a faster way to do this?

Answer 1

Swap the second and third lines so that you don't iterate over list2 when i1.lower() is not in d.

for i1 in list1:
    if i1.lower() in d:
        for i2 in list2:
            if i2 in d[i1.lower()]:
                continue  # ignore
            else:
                pass  # process data

Also, as @aran-fey mentioned, convert d to a dict of sets first, so that each membership test is a hash lookup instead of a linear scan through a list:

d = {k: set(v) for k, v in d.items()}
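
To see why the conversion matters, here is a rough timing sketch. The data is made up for illustration (100,000 synthetic strings and a hypothetical `needle`), and the numbers will vary by machine, so treat it as a way to measure rather than a result.

```python
import timeit

# Synthetic data, assumed for illustration only
values = [str(n) for n in range(100_000)]
as_list = values
as_set = set(values)
needle = "99999"  # last element: the worst case for a list scan

# Time 100 membership tests against each container
list_time = timeit.timeit(lambda: needle in as_list, number=100)
set_time = timeit.timeit(lambda: needle in as_set, number=100)

print(f"list: {list_time:.4f}s  set: {set_time:.6f}s")
```

The one-off cost of building the sets is paid once, while the membership test runs once per (i1, i2) pair.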

Going even further (thanks to @AlexHall):

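d = {k: set(v) for k, v in d.items()}
# Note: this lowercases list2, which the loops above did not do
set2 = {i2.lower() for i2 in list2}

for i1 in list1:
    # Note: a key missing from d now yields every item of set2
    for i2 in set2 - d.get(i1.lower(), set()):
        pass  # process data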
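
One thing to check with the set-difference version: `d.get(key, set())` processes every item for a key that is missing from d, whereas the loops in the question do nothing for such keys. The sketch below keeps the question's behaviour. The function name `pairs_to_process` is hypothetical, list2 is compared as-is (not lowercased), and "processing" is assumed to mean collecting the pair.

```python
def pairs_to_process(list1, list2, d):
    """Return the (i1, i2) pairs that the question's loops would process."""
    sets = {k: set(v) for k, v in d.items()}  # one-off conversion
    set2 = set(list2)
    pairs = []
    for i1 in list1:
        seen = sets.get(i1.lower())
        if seen is None:
            continue  # key not in d: the original loops do nothing here
        # sorted() only makes the output order deterministic
        for i2 in sorted(set2 - seen):
            pairs.append((i1, i2))
    return pairs


list1 = ['abc', 'efg', 'hijk']
list2 = ['lmno', 'pqrs']
d = {'abc': ['lmno'], 'efg': ['lmno', 'pqrs']}
print(pairs_to_process(list1, list2, d))
```

If items for missing keys should be processed too, replace the `continue` branch with an empty set, which matches the `d.get(..., set())` form.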

Answer 2

I guess you have two lists, one containing the keys and the other the values. Check whether the key is in the dict before iterating over the values, which makes this more efficient.

for i1 in list1:
    if i1.lower() in d:
        for i2 in list2:
            if i2 in d[i1.lower()]:
                continue  # ignore
            else:
                pass  # process data
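
The same idea can be taken one small step further by lowercasing and looking up each key only once. This is a minimal sketch, not the answerer's code: the name `unmatched_pairs` is hypothetical, and "processing" is assumed to mean collecting the pair.

```python
def unmatched_pairs(list1, list2, d):
    """Collect (i1, i2) pairs where i1 is a key of d but i2 is not in its list."""
    result = []
    for i1 in list1:
        allowed = d.get(i1.lower())  # single lookup per i1
        if allowed is None:
            continue  # key absent: skip the inner loop entirely
        for i2 in list2:
            if i2 not in allowed:
                result.append((i1, i2))
    return result


list1 = ['abc', 'efg', 'hijk']
list2 = ['lmno', 'pqrs']
d = {'abc': ['lmno'], 'efg': ['lmno', 'pqrs']}
print(unmatched_pairs(list1, list2, d))
```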

Answer 3

This may not be the fastest; you would have to measure it. But it is neater.

from operator import itemgetter

keys_to_check = [
    'abc', 'efg', 'hijk'
]

strings_to_check = [
    'lmno', 'pqrs'
]

d = {
    'abc': ['lmno'],
    'efg': ['lmno', 'pqrs']
}

# Build a getter for the lowercased keys that are present in d.
# Caveat: itemgetter needs at least one key, and with exactly one key
# it returns the bare value rather than a tuple.
values = itemgetter(*(key.lower() for key in keys_to_check if key.lower() in d))

for value in values(d):
    # If any string in this value list is in strings_to_check, ignore the value
    if any(strng in strings_to_check for strng in value):
        continue
    pass  # process data
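
The itemgetter approach has an edge case worth checking: with a single key it returns the value itself instead of a tuple, and with no keys it cannot be constructed at all. The sketch below shows the difference and a list-comprehension alternative; the helper name `values_for` is hypothetical.

```python
from operator import itemgetter

d = {'abc': ['lmno'], 'efg': ['lmno', 'pqrs']}

two = itemgetter('abc', 'efg')(d)  # a tuple of two lists
one = itemgetter('abc')(d)         # the list itself, not a 1-tuple


def values_for(keys, mapping):
    """Return the value lists for the lowercased keys present in mapping."""
    return [mapping[k.lower()] for k in keys if k.lower() in mapping]


print(two)
print(one)
print(values_for(['ABC', 'hijk'], d))
```

The comprehension always returns a list of value lists, so the loop that follows behaves the same for zero, one, or many matching keys.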

Answer 1
3 2019-09-03 08:03:18

Answer 2
0 2019-09-03 08:15:27

Answer 3
0 2019-09-03 08:23:19
