

Random order of returned values in Python dictionary

I don't understand this and it's going to bother me until I do.

This Python code counts the number of times each character appears in the 'message' variable:

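message = 'Some random string of words'

dictionary = {}

# Count how often each (upper-cased) character occurs in message
for character in message.upper():
    dictionary.setdefault(character, 0)  # start the count at 0 the first time a character is seen
    dictionary[character] = dictionary[character] + 1

print(dictionary)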

If you run this multiple times, you will notice the counts are returned in a seemingly random order each time. Why is this? I would think that the loop should start at the beginning of the character string each time and return the values in a consistent order... but they don't. Is there some element of randomness in the setdefault(), print(), or upper() methods that affects the order in which the string is processed?

Because of two things:

  • Dictionaries "aren't ordered". You do of course get some order, but it depends, among other things, on the hash values of the keys.
  • You use (single-character) strings as keys, and string hashes are randomized. If you do print(hash(message)) or even just print(hash('c')), you'll see that the result differs from one run to the next as well.

So since the order depends on the hashes, and the hashes change from one run to the next, of course you can get different orders.
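Comparing runs requires fresh interpreters, so here is a minimal sketch (assuming Python 3.7+ for subprocess.run with capture_output) that starts new Python processes and prints hash('c') together with the iteration order of a set built from the question's upper-cased message. A set is used instead of a dict because sets still order their elements by hash, whereas dicts keep insertion order from Python 3.7 on. The helper name run_fresh and the snippet are illustrative; PYTHONHASHSEED is the environment variable CPython reads to pin the string-hash seed (0 disables randomization).

```python
import os
import subprocess
import sys

# Code executed in each brand-new interpreter: print hash('c') and the
# iteration order of a set made from the question's upper-cased message.
SNIPPET = "print(hash('c')); print(''.join(set('SOME RANDOM STRING OF WORDS')))"


def run_fresh(seed=None):
    """Run SNIPPET in a new Python process and return its output lines.

    seed=None leaves PYTHONHASHSEED unset, so the child picks a random
    hash seed at startup; an integer pins the seed to that value.
    """
    env = dict(os.environ)
    env.pop("PYTHONHASHSEED", None)
    if seed is not None:
        env["PYTHONHASHSEED"] = str(seed)
    result = subprocess.run([sys.executable, "-c", SNIPPET], env=env,
                            capture_output=True, text=True, check=True)
    return result.stdout.splitlines()


if __name__ == "__main__":
    for _ in range(3):
        print(run_fresh())     # random seed: usually differs on every call
    print(run_fresh(seed=0))   # PYTHONHASHSEED=0 disables randomization...
    print(run_fresh(seed=0))   # ...so this line repeats the previous one
```

With the seed unset, both the hash and the set order will usually change from call to call; with the same PYTHONHASHSEED they repeat exactly.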

On the other hand, if you repeat it in the same run, you'll likely get the same order:

message = 'Some random string of words'
for _ in range(10):
    dictionary = {}  # start from an empty dict on every pass
    for character in message:
        dictionary.setdefault(character, 0)
        dictionary[character] = dictionary[character] + 1
    print(dictionary)

I just ran that and it printed the exact same order all ten times, as expected. Then I ran it again, and it printed a different order, but again the same one all ten times. As expected.

dicts are inherently unordered.

From the Python docs:

Keys and values are iterated over in an arbitrary order which is non-random, varies across Python implementations, and depends on the dictionary's history of insertions and deletions.

EDIT

An alternative to your code that correctly accomplishes your goal is to use an OrderedCounter, a small class that combines Counter and OrderedDict:

from collections import Counter, OrderedDict

class OrderedCounter(Counter, OrderedDict):
    'Counter that remembers the order elements are first encountered'

    def __repr__(self):
        return '%s(%r)' % (self.__class__.__name__, OrderedDict(self))

    def __reduce__(self):
        return self.__class__, (OrderedDict(self),)

message = 'Some random string of words'
print(OrderedCounter(message.upper()))
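On Python 3.7 and later, where dicts remember insertion order, a plain Counter (a dict subclass) already iterates in first-encounter order, so the extra class may not be needed there. A minimal sketch, assuming Python 3.7+; note that printing a Counter lists items by count (most_common order), so iterate or convert it to see the first-encounter order:

```python
from collections import Counter

message = 'Some random string of words'
counts = Counter(message.upper())

# Iteration follows first-encounter order on Python 3.7+ ...
print(list(counts.items()))

# ... whereas print(counts) shows the items sorted by count instead.
print(counts)
```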

This happens for security reasons. When you're writing any application where an external user can provide data that ends up in a dictionary, you need to make sure they can't predict the result of hashing. If they can, they can choose inputs so that every new entry they provide hashes to the same bin. When they do that, your "amortized O(1)" retrievals end up taking O(n) instead, because every get() from the dictionary lands in the same bin and has to traverse all the items in it (and the total cost may be higher still once other processing of the request is taken into account).

Have a look at https://131002.net/siphash/siphashdos_appsec12_slides.pdf for more info.

Many language runtimes, Python included (string hash randomization has been enabled by default since Python 3.3), prevent this by generating a random number at startup and using it as the hash seed, rather than starting from some predefined number like 0.
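To make the hash-flooding argument concrete, here is a small sketch using a hypothetical Key class whose __hash__ can be forced to a constant, which simulates an attacker who makes every key land in the same bin. It counts the __eq__ calls that dict insertion triggers; in CPython the colliding case needs roughly n²/2 comparisons for n keys, while distinct hashes need almost none. The class and helper names are made up for illustration.

```python
class Key:
    """Hypothetical key type that counts how often dict compares keys."""

    comparisons = 0

    def __init__(self, value, constant_hash):
        self.value = value
        self.constant_hash = constant_hash

    def __hash__(self):
        # A constant hash simulates an attacker who makes every key collide.
        return 42 if self.constant_hash else hash(self.value)

    def __eq__(self, other):
        Key.comparisons += 1
        return isinstance(other, Key) and self.value == other.value


def count_comparisons(n, constant_hash):
    """Insert n distinct keys into a dict and return the number of __eq__ calls."""
    Key.comparisons = 0
    d = {}
    for i in range(n):
        d[Key(i, constant_hash)] = i
    return Key.comparisons


if __name__ == "__main__":
    n = 500
    print("distinct hashes: ", count_comparisons(n, constant_hash=False))
    print("colliding hashes:", count_comparisons(n, constant_hash=True))
```

Each new colliding key has to be compared against every key already in its bin, which is exactly the linear-time behavior a predictable hash function would let an attacker trigger.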

The way dict is implemented is designed to make lookups quick and efficient, even as the dict grows. Under the hood, this means the key order may change: a key's position in the table depends on its hash, and when the table is resized, the keys are re-inserted at new positions.
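The effect of hash-based placement and resizing on iteration order can be seen in a toy table. This is a hypothetical, heavily simplified sketch (ToyHashTable is an invented name): it uses linear probing and iterates in slot order, loosely like CPython dicts before 3.6, whereas real CPython uses a perturbed probe sequence and different growth rules. Small ints are used as keys because their hash is the int itself, which makes the slots easy to follow.

```python
class ToyHashTable:
    """Hypothetical open-addressing hash table, for illustration only."""

    def __init__(self, size=8):
        self.slots = [None] * size  # each slot holds None or a (key, value) pair
        self.count = 0

    def _find_slot(self, key):
        # Start at hash(key) % size and step forward until we find the key
        # or an empty slot (linear probing).
        i = hash(key) % len(self.slots)
        while self.slots[i] is not None and self.slots[i][0] != key:
            i = (i + 1) % len(self.slots)
        return i

    def __setitem__(self, key, value):
        i = self._find_slot(key)
        if self.slots[i] is None:  # inserting a new key
            # Grow when the table would become more than 2/3 full.
            if (self.count + 1) * 3 > len(self.slots) * 2:
                self._resize(len(self.slots) * 4)
                i = self._find_slot(key)
            self.count += 1
        self.slots[i] = (key, value)

    def _resize(self, new_size):
        # Re-insert every entry into a bigger table: positions now depend on
        # hash(key) % new_size, so the iteration order can change.
        old_entries = [entry for entry in self.slots if entry is not None]
        self.slots = [None] * new_size
        self.count = 0
        for key, value in old_entries:
            self[key] = value

    def keys(self):
        # Iteration simply walks the slots from index 0 upwards.
        return [entry[0] for entry in self.slots if entry is not None]


table = ToyHashTable()
for key in [9, 1, 2, 3, 4]:
    table[key] = str(key)
print(table.keys())  # [9, 1, 2, 3, 4]: 9 and 1 both map to slot 1, and 9 got there first
table[5] = '5'       # the sixth key triggers a resize from 8 to 32 slots
print(table.keys())  # [1, 2, 3, 4, 5, 9]: 9 now lives in slot 9
```

Nothing about the keys changed, yet 9 moved from first to last purely because the table grew.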

If the order of the keys is important to you, try using an OrderedDict from collections.
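A minimal sketch of the question's counting loop rewritten with OrderedDict, which remembers the order in which keys were first inserted on every Python version, independent of the keys' hash values:

```python
from collections import OrderedDict

message = 'Some random string of words'

# Same loop as in the question; only the container type changes.
dictionary = OrderedDict()
for character in message.upper():
    dictionary.setdefault(character, 0)  # first sighting: insert with count 0
    dictionary[character] += 1

print(dictionary)
```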

Since Python 3.7, dictionaries are insertion ordered (documentation):

Dictionaries preserve insertion order. Note that updating a key does not affect the order. Keys added after deletion are inserted at the end.

So the behavior you were expecting in the question is now the actual behavior.
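The two rules quoted from the documentation can be checked directly. A short sketch, assuming Python 3.7 or later:

```python
d = {}
d['a'] = 1
d['b'] = 2
d['c'] = 3

d['a'] = 100     # updating an existing key keeps its position
print(list(d))   # ['a', 'b', 'c']

del d['b']
d['b'] = 2       # a key re-added after deletion goes to the end
print(list(d))   # ['a', 'c', 'b']
```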
