Removing values in a dictionary with a list of words in Python
Suppose I have a list of words:
nottastyfruits = ['grape', 'orange', 'durian', 'pear']
fruitGroup = {'001': ['grape', 'apple', 'jackfruit', 'orange', 'Longan'],
              '002': ['apple', 'watermelon', 'pear']}
I want to go through every key in the dictionary and remove from its list any word that appears in nottastyfruits.
My current code is:
finalfruits = {}
for key, value in fruitGroup.items():
    fruits = []
    for fruit in value:
        if fruit not in nottastyfruits:
            fruits.append(fruit)
    finalfruits[key] = (fruits)
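This takes a long time on large text data (for example, when preprocessing a large corpus). Is there a faster, more efficient way to do this?
Thanks for your time.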
You should turn the list of fruits into a set to speed up lookups, and then use a dictionary comprehension:
nottastyfruits = set(['grape', 'orange', 'durian', 'pear'])
fruitGroup = {'001': ['grape', 'apple', 'jackfruit', 'orange', 'Longan'],
              '002': ['apple', 'watermelon', 'pear']}
print({k: [i for i in v if i not in nottastyfruits] for k, v in fruitGroup.items()})
# {'001': ['apple', 'jackfruit', 'Longan'], '002': ['apple', 'watermelon']}
Flattening the code into a dictionary comprehension removes the overhead of the explicit for loops.
Converting nottastyfruits to a set reduces the lookup time:
nottastyfruits = set(nottastyfruits)
finalfruits = {k: [f for f in v if f not in nottastyfruits] for k, v in fruitGroup.items()}
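If the goal is to remove the words from the existing dictionary rather than build a new one, the same comprehension can be assigned back into each list. This is only a sketch of that variation, not something proposed in the thread; it assumes the lists in fruitGroup may be modified in place.

```python
nottastyfruits = {'grape', 'orange', 'durian', 'pear'}
fruitGroup = {'001': ['grape', 'apple', 'jackfruit', 'orange', 'Longan'],
              '002': ['apple', 'watermelon', 'pear']}

for fruits in fruitGroup.values():
    # Slice assignment replaces the contents of the existing list object,
    # so any other reference to the same list sees the filtered result.
    fruits[:] = [f for f in fruits if f not in nottastyfruits]

print(fruitGroup)
# {'001': ['apple', 'jackfruit', 'Longan'], '002': ['apple', 'watermelon']}
```

The dictionary itself keeps the same keys while it is being iterated, so this is safe; only the list contents change.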
A low-hanging fruit, if you will, is to make nottastyfruits a set. You can also squeeze out some extra performance by using comprehensions.
In [1]: fruitGroup = {'001': ['grape','apple', 'jackfruit', 'orange', 'Longan'],
...: '002': ['apple', 'watermelon', 'pear']
...: }
In [2]: nottastyfruit = {'grape', 'orange', 'durian', 'pear'}
In [3]: finalfruits = {k:[f for f in v if f not in nottastyfruit] for k,v in fruitGroup.items()}
In [4]: finalfruits
Out[4]: {'001': ['apple', 'jackfruit', 'Longan'], '002': ['apple', 'watermelon']}
Since nottastyfruits and the lists in the dictionary are all flat lists, you can use sets to get the difference between them.
nottastyfruits = set(['orange', 'pear', 'grape', 'durian'])
fruitGroup = {'001': ['grape', 'apple', 'jackfruit', 'orange', 'Longan'], '002': ['apple', 'watermelon', 'pear']}
for key, value in fruitGroup.items():
    fruitGroup[key] = list(set(value).difference(nottastyfruits))
print(fruitGroup)  # e.g. {'001': ['jackfruit', 'apple', 'Longan'], '002': ['watermelon', 'apple']} (order within each list may vary)
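One caveat with the set-difference approach is that converting each list to a set discards both the original order and any duplicate entries. The sketch below illustrates this; the basket list is hypothetical sample data, not from the question.

```python
not_tasty = {'grape', 'pear'}
basket = ['pear', 'apple', 'kiwi', 'apple', 'grape']  # hypothetical data with a duplicate

# The comprehension keeps the original order and the duplicates.
by_comprehension = [f for f in basket if f not in not_tasty]

# The set difference keeps one copy of each fruit, in arbitrary order.
by_set_difference = list(set(basket).difference(not_tasty))

print(by_comprehension)           # ['apple', 'kiwi', 'apple']
print(sorted(by_set_difference))  # ['apple', 'kiwi']
```

If order or repeated words matter, as they often do in text preprocessing, the comprehension-based versions are probably the safer choice.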
Here is a benchmark of the various proposed solutions, plus one based on the filter() function:
from timeit import timeit

nottastyfruits = ['grape', 'orange', 'durian', 'pear']
fruitGroup = {'001': ['grape', 'apple', 'jackfruit', 'orange', 'Longan'],
              '002': ['apple', 'watermelon', 'pear']}

def fruit_filter_original(fruit_groups, not_tasty_fruits):
    final_fruits = {}
    for key, value in fruit_groups.items():
        fruits = []
        for fruit in value:
            if fruit not in not_tasty_fruits:
                fruits.append(fruit)
        final_fruits[key] = (fruits)
    return final_fruits

def fruit_filter_comprehension(fruit_groups, not_tasty_fruits):
    return {group: [fruit for fruit in fruits
                    if fruit not in not_tasty_fruits]
            for group, fruits in fruit_groups.items()}

def fruit_filter_set_comprehension(fruit_groups, not_tasty_fruits):
    not_tasty_fruits = set(not_tasty_fruits)
    return {group: [fruit for fruit in fruits
                    if fruit not in not_tasty_fruits]
            for group, fruits in fruit_groups.items()}

def fruit_filter_set(fruit_groups, not_tasty_fruits):
    return {group: list(set(fruits).difference(not_tasty_fruits))
            for group, fruits in fruit_groups.items()}

def fruit_filter_filter(fruit_groups, not_tasty_fruits):
    # list() materializes the result on Python 3, where filter() returns a lazy iterator
    return {group: list(filter(lambda fruit: fruit not in not_tasty_fruits, fruits))
            for group, fruits in fruit_groups.items()}

print(fruit_filter_original(fruitGroup, nottastyfruits))
print(fruit_filter_comprehension(fruitGroup, nottastyfruits))
print(fruit_filter_set_comprehension(fruitGroup, nottastyfruits))
print(fruit_filter_set(fruitGroup, nottastyfruits))
print(fruit_filter_filter(fruitGroup, nottastyfruits))

print(timeit("fruit_filter_original(fruitGroup, nottastyfruits)", number=100000,
             setup="from __main__ import fruit_filter_original, fruitGroup, nottastyfruits"))
print(timeit("fruit_filter_comprehension(fruitGroup, nottastyfruits)", number=100000,
             setup="from __main__ import fruit_filter_comprehension, fruitGroup, nottastyfruits"))
print(timeit("fruit_filter_set_comprehension(fruitGroup, nottastyfruits)", number=100000,
             setup="from __main__ import fruit_filter_set_comprehension, fruitGroup, nottastyfruits"))
print(timeit("fruit_filter_set(fruitGroup, nottastyfruits)", number=100000,
             setup="from __main__ import fruit_filter_set, fruitGroup, nottastyfruits"))
print(timeit("fruit_filter_filter(fruitGroup, nottastyfruits)", number=100000,
             setup="from __main__ import fruit_filter_filter, fruitGroup, nottastyfruits"))
As we can see, the solutions do not all perform equally (timings are in seconds for 100,000 calls):
{'001': ['apple', 'jackfruit', 'Longan'], '002': ['apple', 'watermelon']}
{'001': ['apple', 'jackfruit', 'Longan'], '002': ['apple', 'watermelon']}
{'001': ['apple', 'jackfruit', 'Longan'], '002': ['apple', 'watermelon']}
{'001': ['jackfruit', 'apple', 'Longan'], '002': ['watermelon', 'apple']}
{'001': ['apple', 'jackfruit', 'Longan'], '002': ['apple', 'watermelon']}
2.57386991159 # fruit_filter_original
2.36822144247 # fruit_filter_comprehension
2.46125930873 # fruit_filter_set_comprehension
4.09036626702 # fruit_filter_set
3.76554637862 # fruit_filter_filter
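Compared with the original code, the comprehension-based solution is faster (about 2.37 s versus 2.57 s, roughly 8%), but the improvement is not dramatic, at least with the given data. The set-plus-comprehension solution is also slightly faster than the original (about 2.46 s). The solutions based on filter() and on set difference are noticeably slower (about 3.77 s and 4.09 s).
Conclusion: if you are looking for performance, the solutions by Moses Koledoye and juanpa.arrivillaga appear to be the better ones. These results may well differ with larger data, though, so it is probably a good idea to test on your real data.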
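To check how the list-versus-set choice behaves on larger inputs, a comparison along the following lines could be run. This is only a sketch on synthetic data: the sizes, the random_word helper, and the filter_groups function are assumptions made for illustration, and the timings will depend on your machine and data.

```python
import random
import string
from timeit import timeit

random.seed(0)  # fixed seed so the synthetic data is reproducible

def random_word(length=6):
    """Return a random lowercase word (hypothetical helper for test data)."""
    return ''.join(random.choice(string.ascii_lowercase) for _ in range(length))

# Hypothetical sizes; adjust them to resemble your real data.
vocabulary = [random_word() for _ in range(5000)]
unwanted_list = random.sample(vocabulary, 200)
unwanted_set = set(unwanted_list)
groups = {str(i): random.choices(vocabulary, k=1000) for i in range(10)}

def filter_groups(groups, unwanted):
    """Dictionary comprehension from the answers; 'unwanted' may be a list or a set."""
    return {key: [w for w in words if w not in unwanted]
            for key, words in groups.items()}

# Both containers must give the same result; only the lookup cost differs.
assert filter_groups(groups, unwanted_list) == filter_groups(groups, unwanted_set)

list_time = timeit(lambda: filter_groups(groups, unwanted_list), number=5)
set_time = timeit(lambda: filter_groups(groups, unwanted_set), number=5)
print(f"list lookup: {list_time:.3f}s")
print(f"set lookup:  {set_time:.3f}s")
```

With a four-item exclusion list, as in the question, scanning the list is cheap, which may explain why the set version showed no gain in the benchmark above. As the exclusion list grows, each list lookup scans more entries, while set lookups stay roughly constant on average.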