Remove values from a dictionary using a list of words (Python)

Question

Let's say I have a list of words:

 nottastyfruits = ['grape', 'orange', 'durian', 'pear']

 fruitGroup = {'001': ['grape','apple', 'jackfruit', 'orange', 'Longan'],
               '002': ['apple', 'watermelon', 'pear']}

I want to go through all the keys in the dictionary and remove every word that appears in the nottastyfruits list from each key's list of values.

My current code is:

finalfruits = {}
for key, value in fruitGroup.items():
    fruits = []
    for fruit in value:
        if fruit not in nottastyfruits:
            fruits.append(fruit)
    finalfruits[key] = (fruits)

This takes a long time to run on large inputs, such as when preprocessing a large body of text. Is there a faster, more efficient way to do this?

Thank you for your time.

Answer 1

You should make a set out of your list of unwanted fruits to speed up the lookups, then use a dictionary comprehension:

nottastyfruits = {'grape', 'orange', 'durian', 'pear'}

fruitGroup = {'001': ['grape','apple', 'jackfruit', 'orange', 'Longan'],
              '002': ['apple', 'watermelon', 'pear']}

# Python 3: print() is a function and dict.iteritems() is now dict.items()
print({k: [i for i in v if i not in nottastyfruits] for k, v in fruitGroup.items()})

>>> {'001': ['apple', 'jackfruit', 'Longan'], '002': ['apple', 'watermelon']}

Answer 2

Flattening the nested loops into a dictionary comprehension removes some of the overhead of the explicit for loop and its repeated append calls.

Making nottastyfruits a set will decrease lookup time:

nottastyfruits = set(nottastyfruits)
finalfruits = {k: [f for f in v if f not in nottastyfruits] for k, v in fruitGroup.items()}
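
To illustrate why the set helps, here is a minimal sketch that times membership tests against a list and against a set. The synthetic word list (10,000 made-up names), the probe word, and the repetition count are all assumptions chosen for illustration; absolute timings will vary by machine.

```python
from timeit import timeit

# Hypothetical blocklist, much larger than the four fruits in the question.
words = ["fruit{}".format(i) for i in range(10000)]
as_list = words
as_set = set(words)

probe = "fruit9999"  # worst case for the list: the last element

# A list is scanned element by element; a set uses a hash lookup.
list_time = timeit(lambda: probe in as_list, number=1000)
set_time = timeit(lambda: probe in as_set, number=1000)

print("list: {:.4f}s  set: {:.4f}s".format(list_time, set_time))
```

With only four words, as in the question, the difference is likely to be negligible; the gap should only become noticeable as the list of unwanted words grows.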

Answer 3

One low-hanging fruit, if you will, is to make nottastyfruits a set. Also, you can use comprehensions to squeeze out some more performance.

In [1]: fruitGroup = {'001': ['grape','apple', 'jackfruit', 'orange', 'Longan'],
   ...:                '002': ['apple', 'watermelon', 'pear']
   ...:               }

In [2]: nottastyfruit = {'grape', 'orange', 'durian', 'pear'}

In [3]: finalfruits = {k:[f for f in v if f not in nottastyfruit] for k,v in fruitGroup.items()}

In [4]: finalfruits
Out[4]: {'001': ['apple', 'jackfruit', 'Longan'], '002': ['apple', 'watermelon']}

Answer 4

Since both nottastyfruits and the lists in the dictionary are flat lists, you can use sets to get the difference between the two.

nottastyfruits = {'orange', 'pear', 'grape', 'durian'}
fruitGroup = {'001': ['grape','apple', 'jackfruit', 'orange', 'Longan'], '002': ['apple', 'watermelon', 'pear']}

# Python 3: use items() instead of iteritems(); reassigning existing keys while iterating is safe
for key, value in fruitGroup.items():
    fruitGroup[key] = list(set(value).difference(nottastyfruits))

# e.g. {'001': ['jackfruit', 'apple', 'Longan'], '002': ['watermelon', 'apple']}
# (the order within each list may vary, because sets are unordered)
print(fruitGroup)
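
One caveat with the set-difference approach is that it does not keep the original order of the fruits and it collapses duplicates. The sketch below contrasts it with a list comprehension; the `basket` list, with 'apple' deliberately repeated, is a made-up example and not data from the question.

```python
not_tasty = {'grape', 'orange', 'durian', 'pear'}
# Hypothetical input in which 'apple' appears twice.
basket = ['grape', 'apple', 'jackfruit', 'apple', 'orange', 'Longan']

# List comprehension: keeps the original order and any duplicates.
kept_by_comprehension = [f for f in basket if f not in not_tasty]

# Set difference: duplicates collapse and the order is not guaranteed.
kept_by_difference = list(set(basket).difference(not_tasty))

print(kept_by_comprehension)       # ['apple', 'jackfruit', 'apple', 'Longan']
print(sorted(kept_by_difference))  # ['Longan', 'apple', 'jackfruit']
```

If order or repeated words matter (as they often do in text preprocessing), the comprehension is the safer choice.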

Answer 5

Below is a benchmark of the different proposed solutions, plus a solution based on the filter() function:

from timeit import timeit


nottastyfruits = ['grape', 'orange', 'durian', 'pear']

fruitGroup = {'001': ['grape','apple', 'jackfruit', 'orange', 'Longan'],
              '002': ['apple', 'watermelon', 'pear']}


def fruit_filter_original(fruit_groups, not_tasty_fruits):
    final_fruits = {}
    for key, value in fruit_groups.items():
        fruits = []
        for fruit in value:
            if fruit not in not_tasty_fruits:
                fruits.append(fruit)
        final_fruits[key] = (fruits)
    return final_fruits


def fruit_filter_comprehension(fruit_groups, not_tasty_fruits):
    return {group: [fruit for fruit in fruits
                         if fruit not in not_tasty_fruits]
            for group, fruits in fruit_groups.items()}


def fruit_filter_set_comprehension(fruit_groups, not_tasty_fruits):
    not_tasty_fruits = set(not_tasty_fruits)
    return {group: [fruit for fruit in fruits
                         if fruit not in not_tasty_fruits]
            for group, fruits in fruit_groups.items()}


def fruit_filter_set(fruit_groups, not_tasty_fruits):
    return {group: list(set(fruits).difference(not_tasty_fruits))
            for group, fruits in fruit_groups.items()}


def fruit_filter_filter(fruit_groups, not_tasty_fruits):
    # list() is needed on Python 3, where filter() returns a lazy iterator
    return {group: list(filter(lambda fruit: fruit not in not_tasty_fruits, fruits))
            for group, fruits in fruit_groups.items()}


print(fruit_filter_original(fruitGroup, nottastyfruits))
print(fruit_filter_comprehension(fruitGroup, nottastyfruits))
print(fruit_filter_set_comprehension(fruitGroup, nottastyfruits))
print(fruit_filter_set(fruitGroup, nottastyfruits))
print(fruit_filter_filter(fruitGroup, nottastyfruits))


print(timeit("fruit_filter_original(fruitGroup, nottastyfruits)", number=100000,
      setup="from __main__ import fruit_filter_original, fruitGroup, nottastyfruits"))
print(timeit("fruit_filter_comprehension(fruitGroup, nottastyfruits)", number=100000,
      setup="from __main__ import fruit_filter_comprehension, fruitGroup, nottastyfruits"))
print(timeit("fruit_filter_set_comprehension(fruitGroup, nottastyfruits)", number=100000,
      setup="from __main__ import fruit_filter_set_comprehension, fruitGroup, nottastyfruits"))
print(timeit("fruit_filter_set(fruitGroup, nottastyfruits)", number=100000,
      setup="from __main__ import fruit_filter_set, fruitGroup, nottastyfruits"))
print(timeit("fruit_filter_filter(fruitGroup, nottastyfruits)", number=100000,
      setup="from __main__ import fruit_filter_filter, fruitGroup, nottastyfruits"))

We can see that the solutions are not equal in terms of performance:

{'001': ['apple', 'jackfruit', 'Longan'], '002': ['apple', 'watermelon']}
{'001': ['apple', 'jackfruit', 'Longan'], '002': ['apple', 'watermelon']}
{'001': ['apple', 'jackfruit', 'Longan'], '002': ['apple', 'watermelon']}
{'001': ['jackfruit', 'apple', 'Longan'], '002': ['watermelon', 'apple']}
{'001': ['apple', 'jackfruit', 'Longan'], '002': ['apple', 'watermelon']}
2.57386991159  # fruit_filter_original
2.36822144247  # fruit_filter_comprehension
2.46125930873  # fruit_filter_set_comprehension
4.09036626702  # fruit_filter_set
3.76554637862  # fruit_filter_filter

The comprehension-based solution is the fastest, but it is not a very significant improvement over the original code (with the given data, at least). The set comprehension solution is also a small improvement. The solutions based on the filter() function and on set difference are noticeably slower.

Conclusion: if you are looking for performance, the solutions from Moses Koledoye and juanpa.arrivillaga seem to be better. However, the results could be different with bigger data, so it is a good idea to run the test with real data.
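
As a rough sketch of what such a test on bigger data might look like, the snippet below compares the list-based and set-based comprehensions on synthetic input. The vocabulary, the sizes (5,000 words, 1,000 blocked, 20 groups of 200 words), and the helper names `filter_with_list` and `filter_with_set` are assumptions made for illustration; substitute your real data before drawing conclusions.

```python
import random
from timeit import timeit

random.seed(0)  # reproducible synthetic data

# Hypothetical vocabulary and sizes; replace with real data.
vocabulary = ["word{}".format(i) for i in range(5000)]
not_tasty_list = random.sample(vocabulary, 1000)
groups = {"{:03d}".format(k): [random.choice(vocabulary) for _ in range(200)]
          for k in range(20)}


def filter_with_list(groups, blocked):
    # Membership test scans the blocked list for every word.
    return {k: [w for w in v if w not in blocked] for k, v in groups.items()}


def filter_with_set(groups, blocked):
    blocked = set(blocked)  # one-time conversion, then hash lookups
    return {k: [w for w in v if w not in blocked] for k, v in groups.items()}


# Both variants must agree before their timings are worth comparing.
assert filter_with_list(groups, not_tasty_list) == filter_with_set(groups, not_tasty_list)

print("list:", timeit(lambda: filter_with_list(groups, not_tasty_list), number=3))
print("set: ", timeit(lambda: filter_with_set(groups, not_tasty_list), number=3))
```

With a blocklist of this size, the set-based variant would be expected to pull ahead, since the one-time cost of building the set is spread over many lookups.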

Answer 1
3 2016-10-20 09:14:33

Answer 2
3 accepted 2016-10-20 09:14:44

Answer 3
2 2016-10-20 09:17:10

Answer 4
1 2016-10-20 09:28:45

Answer 5
1 2016-10-20 10:03:58
