
random.choice not random

I'm using Python 2.5 on Linux, in multiple parallel FCGI processes. I use

    chars = string.ascii_letters + string.digits
    cookie = ''.join([random.choice(chars) for x in range(32)])

to generate distinct cookies. Assuming that the RNG is seeded from /dev/urandom, and that the sequence of random numbers comes from the Mersenne Twister, I would expect that there is practically zero chance of collision.

However, I do see regular collisions, even though only a few (<100) users are logged in at any time.

Why are the random numbers not more random?

It shouldn't be generating duplicates.

import random
chars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
def gen():
    return ''.join([random.choice(chars) for x in range(32)])

test = [gen() for i in range(100000)]
print len(test), len(set(test)) # 100000 100000

The chance of duplicates is significant with chars = "ab": 126 duplicates in 1000000 iterations. It's nonexistent with 62 characters.
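As a rough sanity check on those numbers, the expected number of colliding pairs among n uniformly random strings drawn from N possible strings is about n(n-1)/(2N). A minimal sketch, written for Python 3; the function name is made up for illustration:

```python
def expected_colliding_pairs(n, alphabet_size, length=32):
    """Expected number of equal pairs among n uniformly random strings."""
    total = alphabet_size ** length          # number of possible strings
    return n * (n - 1) / (2.0 * total)

# Two-letter alphabet: on the order of 100, in line with the 126 observed.
print(expected_colliding_pairs(1000000, 2))
# 62-character alphabet: vanishingly small.
print(expected_colliding_pairs(1000000, 62))
```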

That said, this isn't a good way to generate cookies, because session cookies need to be unpredictable, to avoid attacks involving stealing other people's session cookies. The Mersenne Twister is not designed for generating secure random numbers. This is what I do:

import os, hashlib
def gen():
    return hashlib.sha1(os.urandom(512)).hexdigest()

test = [gen() for i in range(100000)]
print len(test), len(set(test))

... which should be very secure (which is to say, it is difficult to take a string of session cookies and guess other existing session cookies from them).
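If the 62-character alphabet from the question should be kept, one option is random.SystemRandom, which draws from os.urandom instead of the Mersenne Twister. This is only a sketch and not the code above; CHARS and gen_cookie are illustrative names:

```python
import random
import string

# SystemRandom uses the operating system's randomness source (os.urandom).
_sysrand = random.SystemRandom()
CHARS = string.ascii_letters + string.digits

def gen_cookie(length=32):
    """Return a cookie of `length` characters drawn from the OS randomness source."""
    return "".join(_sysrand.choice(CHARS) for _ in range(length))

print(gen_cookie())
```

Because SystemRandom keeps no internal state of its own, it cannot be duplicated by a fork in the way a seeded generator can.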

This is definitely not a normal collision scenario:

  • 32 characters with 62 options per character is equivalent to about 190 bits (log2(62) * 32)
  • According to the birthday paradox, you should see a collision naturally about once every 2**95 cookies, which means never
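The two figures above can be reproduced with a few lines. This is a sketch written for Python 3, using the standard birthday approximation sqrt(2 * ln 2 * N) for the number of draws that gives even odds of a collision; the function names are illustrative:

```python
import math

ALPHABET_SIZE = 62
LENGTH = 32

def entropy_bits(alphabet_size=ALPHABET_SIZE, length=LENGTH):
    """Bits of entropy in a uniformly random string."""
    return length * math.log2(alphabet_size)

def cookies_for_even_odds(alphabet_size=ALPHABET_SIZE, length=LENGTH):
    """Approximate number of cookies needed for a 50% chance of any collision."""
    total = float(alphabet_size ** length)
    return math.sqrt(2 * math.log(2) * total)

print(round(entropy_bits(), 1))                        # about 190.5 bits
print(round(math.log2(cookies_for_even_odds()), 1))    # about 95.5, i.e. roughly 2**95 cookies
```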

Could this be a concurrency issue?

  • If so, use a different random.Random instance for each thread
  • You can save these instances in thread-local storage (threading.local())
  • On Linux, Python should seed them using os.urandom() - not system time - so you should get different streams for each thread.
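A minimal sketch of the per-thread generator idea from the list above, assuming a threaded server; thread_rng and make_cookie are hypothetical helper names, not part of any library:

```python
import random
import string
import threading

_local = threading.local()
CHARS = string.ascii_letters + string.digits

def thread_rng():
    """Return this thread's own random.Random, creating it on first use."""
    rng = getattr(_local, "rng", None)
    if rng is None:
        rng = random.Random()    # seeded from os.urandom when available
        _local.rng = rng
    return rng

def make_cookie(length=32):
    rng = thread_rng()
    return "".join(rng.choice(CHARS) for _ in range(length))

print(make_cookie())
```

Note that this only separates threads within one process; it does not help if whole processes end up sharing generator state.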
  1. I don't know how your FCGI processes are being spawned, but is it possible that it's using fork() after the Python interpreter has started (and the random module has been imported by something), hence effectively seeding two processes' random._inst instances from the same source?

  2. Maybe put some debugging in to check that it is correctly seeding from urandom, and not falling back to the less rigorous time-based seed?
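The fork() hypothesis in point 1 can be illustrated with a small experiment. This is a sketch for a Unix system and Python 3, not a diagnosis of the FCGI setup in the question. It uses its own random.Random instance because, as far as I know, Python 3.7 and later reseed the module-level generator in the child after os.fork(), which Python 2.5 did not do. The function name is hypothetical:

```python
import os
import random

def first_draws_after_fork(reseed_in_child, n_children=2):
    """Fork n_children processes and return the first number each one draws."""
    rng = random.Random()            # seeded once, before any fork
    draws = []
    for _ in range(n_children):
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:                 # child: inherits a copy of rng's state
            try:
                os.close(read_fd)
                if reseed_in_child:
                    rng.seed()       # fresh seed from os.urandom
                os.write(write_fd, str(rng.getrandbits(64)).encode("ascii"))
            finally:
                os._exit(0)
        os.close(write_fd)           # parent: read the child's draw
        with os.fdopen(read_fd) as pipe:
            draws.append(int(pipe.read()))
        os.waitpid(pid, 0)
    return draws

print(first_draws_after_fork(reseed_in_child=False))   # children repeat each other
print(first_draws_after_fork(reseed_in_child=True))    # children differ
```

If forked workers turn out to be the cause, reseeding in each worker right after it starts (or drawing from os.urandom directly) would avoid the shared state.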

ETA re comment: man! That's me stumped then; if the RNG always has different state at startup I can't see how you could possibly get collisions. Weird. You would have to put in a lot of state logging to investigate the particular cases which result in collisions, I guess, which sounds like a lot of work trawling through logs. Could it be (1a) that the FCGI server usually doesn't fork, but occasionally does (maybe under load, or something)?

Or (3) some higher-level problem such as a broken HTTP proxy passing the same Set-Cookie to multiple clients?

I had to erase my original answer, which suggested that the generator is not seeded from /dev/urandom, since its source (for Python 3.x) clearly says that it is:

def seed(self, a=None):
    """Initialize internal state from hashable object.

    None or no argument seeds from current time or from an operating
    system specific randomness source if available.

    If a is not None or an int or long, hash(a) is used instead.
    """

    if a is None:
        try:
            a = int(_hexlify(_urandom(16)), 16)
        except NotImplementedError:
            import time
            a = int(time.time() * 256) # use fractional seconds

    super().seed(a)
    self.gauss_next = None

I therefore humbly accept that there are mysteries in the world that I may not be able to decipher.

To avoid the problem, you can use a sequence of cookies that are guaranteed to be different (e.g. you can use a set). Each time you give a cookie to someone, you take it from the sequence and add another one to it. Another option is to generate a UUID and use that as a cookie.
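For the UUID option, a minimal sketch using the standard uuid module; uuid4() generates a random UUID, and gen_cookie is an illustrative name:

```python
import uuid

def gen_cookie():
    """Return a random UUID as 32 hexadecimal characters."""
    return uuid.uuid4().hex

print(gen_cookie())
```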

Another way to avoid the problem could be to hold a private key, and use a checksum (e.g. MD5) of the private key with a counter value joined to it. The probability of collisions will then be very low. To be safer, add a few more variables to the checksum, like the current time, the IP address of the user, ...
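A sketch of the key-plus-counter idea, with two substitutions that are my own assumptions rather than what the answer specifies: HMAC-SHA256 in place of a plain MD5 checksum, and the process ID mixed in so that parallel worker processes sharing a key do not hash the same counter value. SECRET_KEY and next_cookie are hypothetical names, and thread safety is not handled explicitly:

```python
import hashlib
import hmac
import itertools
import os

SECRET_KEY = os.urandom(32)       # assumption: created once and kept private
_counter = itertools.count()      # yields 0, 1, 2, ... within this process

def next_cookie():
    """Return a keyed hash over this process's ID and the next counter value."""
    message = "{}:{}".format(os.getpid(), next(_counter)).encode("ascii")
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()

print(next_cookie())
```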

Libraries to generate cookies exist. Any WSGI implementation probably contains a cookie generator.

If you're only interested in how random your strings are, you could generate a file with, say, one million cookies and perform randomness checks on that file. This, however, is not what I would recommend.
