How to quickly generate a list of all pairs from a large set of numbers?

I create a list with the numbers from 0 to 131071:

x = list(range(131072))

Then I build all pairs, except for pairs of a number with itself:

pairs = []
append_pairs = pairs.append  # bind the method once to skip repeated attribute lookups
for i in range(len(x)):
    for j in range(len(x)):
        if x[i] != x[j]:
            append_pairs([x[i], x[j]])

which gives:

pairs = [[0, 1], [0, 2], [0, 3], ... [131071, 131070]]

But written this way it takes a very long time. Can it be done faster?

You can use itertools.combinations, although for this many pairs that will probably also take a while:

import itertools as it

n = 131072
pairs = it.combinations(range(n), 2)

Note that the code above does not give you a list of all pairs but a lazy iterator over them:

>>> pairs
<itertools.combinations at 0x7fb939a72a48>
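Because the object is lazy, pairs can be consumed a few at a time without ever building the full list. A minimal sketch using itertools.islice (the variable name first_five is only illustrative):

```python
import itertools as it

n = 131072
pairs = it.combinations(range(n), 2)

# Take only the first five pairs; nothing else is computed or stored.
first_five = list(it.islice(pairs, 5))
print(first_five)  # [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5)]

# The iterator remembers its position, so the next pair continues from there.
print(next(pairs))  # (0, 6)
```

If each pair only needs to be processed once, looping over the iterator directly avoids the memory cost of a list altogether.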

You can get the list using

pairs = list(it.combinations(range(n), 2))

Using numpy is probably faster:

import numpy as np

pairs = np.transpose(np.triu_indices(n, 1))
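To see what the numpy version returns, it can help to run it on a small n first. The sketch below uses n = 4 purely for illustration; the result is a 2-column array with one pair per row:

```python
import numpy as np

n = 4  # deliberately small so the result can be printed

# triu_indices(n, 1) returns the row and column indices strictly above the
# diagonal of an n x n matrix, i.e. every (i, j) with i < j.
pairs = np.transpose(np.triu_indices(n, 1))

print(pairs.tolist())
# [[0, 1], [0, 2], [0, 3], [1, 2], [1, 3], [2, 3]]

print(pairs.shape)  # (6, 2), that is n * (n - 1) / 2 rows
```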

However, the number of pairs you want to generate is enormous, and you cannot hold them all in memory (unless you have a very powerful machine). In particular, you get n * (n - 1) / 2 pairs, which is about 8.6 billion for n = 131072. If you store each number as an 8-byte integer, every pair takes 16 bytes, so you're looking at roughly 137 GB of memory; even with 4-byte integers it is just under 70 GB.
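The memory estimate can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: the helper name pair_storage_bytes is made up for illustration, and it counts raw integer storage only (a Python list of tuples would need considerably more).

```python
def pair_storage_bytes(n, bytes_per_number):
    """Raw bytes needed to store all unordered pairs (i < j) of n items."""
    num_pairs = n * (n - 1) // 2
    return num_pairs * 2 * bytes_per_number  # two numbers per pair


n = 131072
print(n * (n - 1) // 2)                           # 8589869056 pairs
print(round(pair_storage_bytes(n, 8) / 1e9, 1))   # 137.4 (GB, 8-byte integers)
print(round(pair_storage_bytes(n, 4) / 1e9, 1))   # 68.7 (GB, 4-byte integers)
```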

For n = 5000:

  • Itertools: 818 ms ± 15.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
  • Numpy: 254 ms ± 30.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
  • Original method: 3.72 s ± 72.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
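The figures above have the format of IPython's %timeit output. A rough way to run a comparable measurement with only the standard library is sketched below; the function names are made up for illustration, the demo call uses a smaller n to stay quick, and absolute numbers will differ from machine to machine.

```python
import itertools as it
import timeit

import numpy as np


def pairs_itertools(n):
    return list(it.combinations(range(n), 2))


def pairs_numpy(n):
    return np.transpose(np.triu_indices(n, 1))


def pairs_nested_loops(n):
    # The question's method. It builds ordered pairs, so it produces
    # twice as many entries as the two functions above.
    x = list(range(n))
    pairs = []
    for i in range(len(x)):
        for j in range(len(x)):
            if x[i] != x[j]:
                pairs.append([x[i], x[j]])
    return pairs


def time_methods(n, repeat=3):
    """Return the best wall-clock time in seconds for each method."""
    methods = {
        "itertools": pairs_itertools,
        "numpy": pairs_numpy,
        "nested loops": pairs_nested_loops,
    }
    return {
        name: min(timeit.repeat(lambda f=func: f(n), number=1, repeat=repeat))
        for name, func in methods.items()
    }


for name, seconds in time_methods(500).items():
    print(f"{name}: {seconds * 1000:.1f} ms")
```

Setting n to 5000 in the call should give numbers comparable in spirit to the ones listed, though it takes correspondingly longer to run.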

Note: because more built-in code is available for that case, I have generated distinct (unordered) pairs, so each pair appears once rather than in both orders as in the question.
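If both orders are needed, as in the question's output, itertools.permutations with length 2 should produce them in the same order as the nested loops. A minimal sketch, where ordered_pairs is just an illustrative wrapper name; note that it yields tuples rather than lists:

```python
import itertools as it


def ordered_pairs(n):
    """Lazily yield every (i, j) with i != j for i, j in range(n)."""
    return it.permutations(range(n), 2)


print(list(ordered_pairs(3)))
# [(0, 1), (0, 2), (1, 0), (1, 2), (2, 0), (2, 1)]
```

This gives n * (n - 1) pairs, twice as many as combinations, so the memory concern applies even more strongly if the result is turned into a list.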
