
Iterating through Dictionary backwards in Python

I have a long dictionary and I want to remove every entry whose value is a list with only one element. For example:

    wordDict={'aardvark':['animal','shell'], 'bat':['animal', 'wings'], 
              'computer':['technology'], 'donut':['food','sweet']}

I want to remove the 'computer' entry because its list has only one element. I started by iterating through wordDict and putting each entry's value into a separate list, so that it looks like this:

    wordList=[['animal','shell'],['animal','wings'],['technology'],['food','sweet']]

and then iterating through that list backwards, checking whether the length of each element is greater than 1. I go backwards because going forwards causes the indices to shift as I delete.

So in wordList, ['technology'] gets removed and this is what is left:

    wordList=[['animal','shell'],['animal','wings'],['food','sweet']]
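
For reference, the approach described above can be sketched roughly as follows. This is a minimal reconstruction, assuming the list is built from `wordDict.values()`; the loop and its variable names are illustrative, not the original code.

```python
wordDict = {'aardvark': ['animal', 'shell'], 'bat': ['animal', 'wings'],
            'computer': ['technology'], 'donut': ['food', 'sweet']}

# Copy every value list into a separate list (assumed to be how wordList was built).
wordList = list(wordDict.values())

# Walk the indices from last to first, so a deletion never shifts
# the position of an item that has not been visited yet.
for i in range(len(wordList) - 1, -1, -1):
    if len(wordList[i]) <= 1:
        del wordList[i]

print(wordList)
```

Each `del wordList[i]` has to shift the items behind position `i`, which is part of why this gets slow on large inputs.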

The problem is that as wordDict becomes large (100k+ words), it takes a long time to copy wordDict into a list and then iterate through that list, and I want to make it more efficient.

I was thinking about iterating through the dictionary backwards, checking whether each entry has more than one word, and removing the entry if it doesn't. What needs to be returned at the end is a list, not a dictionary, so the index doesn't matter in the end; I only used it for sorting purposes.

Is there a way to do this?

You can drop the elements you don't want and create a new dictionary with a dictionary comprehension, like this:

>>> {word: items for word, items in wordDict.items() if len(items) > 1}
{'aardvark': ['animal', 'shell'],
 'bat': ['animal', 'wings'],
 'donut': ['food', 'sweet']}

This iterates through wordDict and checks whether the length of each value is greater than 1. If it is, the entry is included in the new dictionary being constructed; otherwise it is left out.

The first option, as thefourtheye suggested, is to rebuild the dictionary, keeping only the entries whose list has more than one element:

new_dict = {key: value for key, value in wordDict.items() if len(value) > 1}  # on Python 2, use .iteritems()

The second option is to iterate over the dictionary and remove the items in place, but that is not as efficient as the first option.

If you only need the list in the end, you can do:

wordList = list(filter(lambda x: len(x) > 1, wordDict.values()))

It's not necessary to create a temporary dictionary...

Edit: An alternative (actually clearer and faster than the above) is:

wordList = list(value for value in wordDict.values() if len(value) > 1)

Bonus: if you want to filter out empty values, you can just do:

wordList = list(filter(bool, wordDict.values()))

Edit: the alternative here too (it looks a little strange, but it is correct):

wordList = list(value for value in wordDict.values() if value)

The truth value of empty lists (and dicts, etc.) is False.
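
A small sketch of that truthiness rule in action; the `'empty'` key is a made-up entry added only for illustration:

```python
# Hypothetical data: 'empty' is not part of the question's dictionary.
wordDict = {'aardvark': ['animal', 'shell'], 'empty': [], 'computer': ['technology']}

# An empty list is falsy; a non-empty list is truthy.
print(bool([]), bool(['technology']))

# Keep only the values that are non-empty.
wordList = [value for value in wordDict.values() if value]
print(wordList)
```

Note that this only drops empty lists; a one-element list such as `['technology']` is still truthy and stays in the result.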

I think it's faster to keep track of the elements with length 1 as you go. When you insert an element of length 1 into the dictionary, or perform an operation that leaves an element with length 1, put that element's key in a list such as "singles". Then, whenever you want, remove all elements with length 1 using the keys in "singles". This eliminates the need to traverse every element of the dictionary.

For example, when inserting into the dictionary:

def insert(wordDict, key, element, singles):
    wordDict[key] = element
    if len(element) == 1:
        singles.append(key)

And when performing an operation on an element that may change its length:

def some_operation(key, element, singles):
    # Do something.
    if len(element) == 1:
        singles.append(key)

At the end, when you want to delete all elements with length 1:

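def delete_singles(wordDict, singles):
    for k in singles:
        # A key may be recorded twice, or its list may have grown since; re-check first.
        if k in wordDict and len(wordDict[k]) == 1:
            del wordDict[k]
    singles.clear()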

Now just do all inserts and modifications with these functions and use delete_singles() to do the deletion. I hope it works fast!
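
A possible end-to-end usage, as a self-contained sketch. The functions are repeated here so the snippet runs on its own; the re-check inside `delete_singles` and the final `singles.clear()` are defensive assumptions on my part, to cope with keys that were recorded twice or whose list has since grown.

```python
def insert(wordDict, key, element, singles):
    wordDict[key] = element
    if len(element) == 1:
        singles.append(key)  # remember this key as a deletion candidate

def delete_singles(wordDict, singles):
    for k in singles:
        # Only delete keys that still exist and still hold a one-element list.
        if k in wordDict and len(wordDict[k]) == 1:
            del wordDict[k]
    singles.clear()

wordDict = {}
singles = []
insert(wordDict, 'aardvark', ['animal', 'shell'], singles)
insert(wordDict, 'bat', ['animal', 'wings'], singles)
insert(wordDict, 'computer', ['technology'], singles)
insert(wordDict, 'donut', ['food', 'sweet'], singles)

delete_singles(wordDict, singles)  # touches only the recorded keys
wordList = list(wordDict.values())
print(wordList)
```

The trade-off is a little bookkeeping on every insert in exchange for a deletion pass that looks only at the recorded keys instead of the whole dictionary.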

One may modify the length of a list while iterating over it, because it can be done sensibly by iterating backwards. But as you noticed, it is also slow: O(nk), where n is the length of the list and k is the number of items deleted.

One may not add or remove keys of a dict while iterating over it, because that could trigger a rebuild of the internal hash table that the iteration is based on. One must instead make a separate collection of keys to iterate through.

wordDict={'aardvark':['animal','shell'], 'bat':['animal', 'wings'], 
              'computer':['technology'], 'donut':['food','sweet']}

for key in list(wordDict.keys()):
    if len(wordDict[key]) <= 1:
        del wordDict[key]

print(wordDict)

prints

{'aardvark': ['animal', 'shell'], 'bat': ['animal', 'wings'], 'donut': ['food', 'sweet']}
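
For contrast, here is a sketch of what happens without the `list(...)` copy. In CPython, deleting a key while looping directly over the dict is expected to raise a `RuntimeError`; the `error` variable is only there to capture the message for illustration.

```python
wordDict = {'aardvark': ['animal', 'shell'], 'bat': ['animal', 'wings'],
            'computer': ['technology'], 'donut': ['food', 'sweet']}

error = None
try:
    # Iterating over the dict itself (no snapshot of the keys) while deleting from it.
    for key in wordDict:
        if len(wordDict[key]) <= 1:
            del wordDict[key]
except RuntimeError as exc:
    error = str(exc)

print(error)
```

Taking a snapshot with `list(wordDict.keys())` avoids this, because the loop then walks a separate list that the deletions do not affect.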
