Removing values from a dictionary using a list of words in Python

Question

Suppose I have a list of words and a dictionary of lists:

nottastyfruits = ['grape', 'orange', 'durian', 'pear']

fruitGroup = {'001': ['grape','apple', 'jackfruit', 'orange', 'Longan'],
              '002': ['apple', 'watermelon', 'pear']}

I want to go through every key in the dictionary and remove the words that appear in the nottastyfruits list from its values.

My current code is:

finalfruits = {}
for key, value in fruitGroup.items():
    fruits = []
    for fruit in value:
        if fruit not in nottastyfruits:
            fruits.append(fruit)
    finalfruits[key] = (fruits)

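This takes a long time on large text data (for example, when preprocessing a large text). Is there a more efficient, faster way to do this?

Thank you for your time.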

Answer 1

You should turn the list of fruits into a set to speed up lookups, then use a dictionary comprehension:

nottastyfruits = set(['grape', 'orange', 'durian', 'pear'])

fruitGroup = {'001': ['grape','apple', 'jackfruit', 'orange', 'Longan'],
           '002': ['apple', 'watermelon', 'pear']}

print({k: [i for i in v if i not in nottastyfruits] for k, v in fruitGroup.items()})

# Output (key order may vary on Python versions before 3.7):
# {'001': ['apple', 'jackfruit', 'Longan'], '002': ['apple', 'watermelon']}
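
To illustrate why the set helps: `in` on a list compares element by element, while `in` on a set is a hash lookup. The following is only a rough sketch; the 10,000-word blacklist and the names `words_list`, `words_set` and `probe` are made up for illustration and are not from the question.

```python
from timeit import timeit

# Hypothetical blacklist, much larger than the four fruits in the question.
words_list = ['word%d' % i for i in range(10000)]
words_set = set(words_list)

# Worst case for the list: the word is absent, so every element is compared.
probe = 'apple'

list_time = timeit(lambda: probe in words_list, number=1000)
set_time = timeit(lambda: probe in words_set, number=1000)

print('list lookup: %.4f s' % list_time)
print('set lookup:  %.4f s' % set_time)
```

With only four unwanted words, as in the question, the difference is small; it matters once the list of unwanted words grows.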

Answer 2

Flattening the code into a dictionary comprehension removes the overhead of the explicit for loops.

Making nottastyfruits a set reduces the lookup time:

nottastyfruits = set(nottastyfruits)
finalfruits = {k: [f for f in v if f not in nottastyfruits] for k, v in fruitGroup.items()}

Answer 3

One piece of low-hanging fruit, if you will, is to make nottastyfruits a set. You can also use comprehensions to squeeze out some extra performance.

In [1]: fruitGroup = {'001': ['grape','apple', 'jackfruit', 'orange', 'Longan'],
   ...:                '002': ['apple', 'watermelon', 'pear']
   ...:               }

In [2]: nottastyfruit = {'grape', 'orange', 'durian', 'pear'}

In [3]: finalfruits = {k:[f for f in v if f not in nottastyfruit] for k,v in fruitGroup.items()}

In [4]: finalfruits
Out[4]: {'001': ['apple', 'jackfruit', 'Longan'], '002': ['apple', 'watermelon']}

Answer 4

Since nottastyfruits and the lists in the dictionary are both flat lists, you can use sets to get the difference between the two.

nottastyfruits = set(['orange', 'pear', 'grape', 'durian'])
fruitGroup = {'001': ['grape','apple', 'jackfruit', 'orange', 'Longan'], '002': ['apple', 'watermelon', 'pear'] }

for key, value in fruitGroup.items():
    fruitGroup[key] = list(set(value).difference(nottastyfruits))

# Prints e.g. {'001': ['jackfruit', 'apple', 'Longan'], '002': ['watermelon', 'apple']}
# (the order of the words inside each list is not guaranteed)
print(fruitGroup)
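
One thing to keep in mind with the set-difference approach is that a set keeps neither the original order nor repeated words. A small sketch of the difference, using a hypothetical list `basket` that contains 'apple' twice (this list is not from the question):

```python
nottastyfruits = {'grape', 'orange', 'durian', 'pear'}

# Hypothetical input with a repeated word.
basket = ['grape', 'apple', 'jackfruit', 'apple', 'orange', 'Longan']

# Set difference: duplicates collapse and the order is arbitrary.
via_set = list(set(basket).difference(nottastyfruits))

# List comprehension: order and duplicates are preserved.
via_comprehension = [f for f in basket if f not in nottastyfruits]

print(sorted(via_set))      # ['Longan', 'apple', 'jackfruit']
print(via_comprehension)    # ['apple', 'jackfruit', 'apple', 'Longan']
```

If the lists represent text (where word order and repetition usually matter), the comprehension with a set lookup is probably the safer choice.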

Answer 5

Here is a benchmark of the various proposed solutions, plus one based on the filter() function:

from timeit import timeit


nottastyfruits = ['grape', 'orange', 'durian', 'pear']

fruitGroup = {'001': ['grape','apple', 'jackfruit', 'orange', 'Longan'],
              '002': ['apple', 'watermelon', 'pear']}


def fruit_filter_original(fruit_groups, not_tasty_fruits):
    final_fruits = {}
    for key, value in fruit_groups.items():
        fruits = []
        for fruit in value:
            if fruit not in not_tasty_fruits:
                fruits.append(fruit)
        final_fruits[key] = (fruits)
    return final_fruits


def fruit_filter_comprehension(fruit_groups, not_tasty_fruits):
    return {group: [fruit for fruit in fruits
                         if fruit not in not_tasty_fruits]
            for group, fruits in fruit_groups.items()}


def fruit_filter_set_comprehension(fruit_groups, not_tasty_fruits):
    not_tasty_fruits = set(not_tasty_fruits)
    return {group: [fruit for fruit in fruits
                         if fruit not in not_tasty_fruits]
            for group, fruits in fruit_groups.items()}


def fruit_filter_set(fruit_groups, not_tasty_fruits):
    return {group: list(set(fruits).difference(not_tasty_fruits))
            for group, fruits in fruit_groups.items()}


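def fruit_filter_filter(fruit_groups, not_tasty_fruits):
    # list() forces evaluation on Python 3, where filter() is lazy;
    # on Python 2, filter() already returns a list.
    return {group: list(filter(lambda fruit: fruit not in not_tasty_fruits, fruits))
            for group, fruits in fruit_groups.items()}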


print(fruit_filter_original(fruitGroup, nottastyfruits))
print(fruit_filter_comprehension(fruitGroup, nottastyfruits))
print(fruit_filter_set_comprehension(fruitGroup, nottastyfruits))
print(fruit_filter_set(fruitGroup, nottastyfruits))
print(fruit_filter_filter(fruitGroup, nottastyfruits))


print(timeit("fruit_filter_original(fruitGroup, nottastyfruits)", number=100000,
      setup="from __main__ import fruit_filter_original, fruitGroup, nottastyfruits"))
print(timeit("fruit_filter_comprehension(fruitGroup, nottastyfruits)", number=100000,
      setup="from __main__ import fruit_filter_comprehension, fruitGroup, nottastyfruits"))
print(timeit("fruit_filter_set_comprehension(fruitGroup, nottastyfruits)", number=100000,
      setup="from __main__ import fruit_filter_set_comprehension, fruitGroup, nottastyfruits"))
print(timeit("fruit_filter_set(fruitGroup, nottastyfruits)", number=100000,
      setup="from __main__ import fruit_filter_set, fruitGroup, nottastyfruits"))
print(timeit("fruit_filter_filter(fruitGroup, nottastyfruits)", number=100000,
      setup="from __main__ import fruit_filter_filter, fruitGroup, nottastyfruits"))

As we can see, the solutions do not all perform the same:

{'001': ['apple', 'jackfruit', 'Longan'], '002': ['apple', 'watermelon']}
{'001': ['apple', 'jackfruit', 'Longan'], '002': ['apple', 'watermelon']}
{'001': ['apple', 'jackfruit', 'Longan'], '002': ['apple', 'watermelon']}
{'001': ['jackfruit', 'apple', 'Longan'], '002': ['watermelon', 'apple']}
{'001': ['apple', 'jackfruit', 'Longan'], '002': ['apple', 'watermelon']}
2.57386991159  # fruit_filter_original
2.36822144247  # fruit_filter_comprehension
2.46125930873  # fruit_filter_set_comprehension
4.09036626702  # fruit_filter_set
3.76554637862  # fruit_filter_filter

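Compared with the original code, the comprehension-based solution is better, but the improvement is not dramatic (at least with the given data). The set-plus-comprehension solution is also slightly faster than the original. The solutions based on the filter function and on set difference are noticeably slower...

Conclusion: if you are looking for performance, the solutions by Moses Koledoye and juanpa.arrivillaga appear to be the better ones. However, these results may differ for larger data, so testing on real data is probably a good idea.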
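
As a rough way to try this on larger input, the sketch below builds synthetic data and times the plain-list and set variants of the comprehension. The sizes, the seed and the names `vocabulary`, `unwanted`, `groups`, `filter_with_list` and `filter_with_set` are all assumptions chosen for illustration; they do not come from the question, and the timings will depend on the machine and on the real data.

```python
import random
from timeit import timeit

random.seed(0)  # fixed seed so the synthetic data is reproducible

# Synthetic data (assumed sizes): 5,000 distinct words, 500 of them unwanted,
# and 50 groups of 100 words each.
vocabulary = ['w%d' % i for i in range(5000)]
unwanted = random.sample(vocabulary, 500)
groups = {'%03d' % n: [random.choice(vocabulary) for _ in range(100)]
          for n in range(50)}


def filter_with_list(groups, unwanted):
    # Each "not in" scans the list of unwanted words.
    return {k: [w for w in v if w not in unwanted] for k, v in groups.items()}


def filter_with_set(groups, unwanted):
    unwanted = set(unwanted)  # built once, before the comprehension
    return {k: [w for w in v if w not in unwanted] for k, v in groups.items()}


# Both variants must agree before their timings are worth comparing.
assert filter_with_list(groups, unwanted) == filter_with_set(groups, unwanted)

print(timeit(lambda: filter_with_list(groups, unwanted), number=10))
print(timeit(lambda: filter_with_set(groups, unwanted), number=10))
```

Because each lookup in a list is linear in the length of the list, the set variant should pull further ahead as the list of unwanted words grows, whereas with only four unwanted words the cost of building the set can cancel out the gain.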

5 answers

Answer 1
3 2016-10-20 09:14:33

Answer 2
3 (accepted) 2016-10-20 09:14:44

Answer 3
2 2016-10-20 09:17:10

Answer 4
1 2016-10-20 09:28:45

Answer 5
1 2016-10-20 10:03:58
