Generate random combinations of two arrays without repetition using NumPy

Question

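Given two arrays, e.g. [0,0,0] and [1,1,1], it is already clear (see here) how to generate all combinations, i.e. [[0,0,0],[0,0,1],[0,1,0],[0,1,1],[1,0,0],[1,0,1],[1,1,0],[1,1,1]]. As far as I know, itertools (combinations or product) and numpy.meshgrid are the most common ways to do this.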

However, I couldn't find any discussion of how to generate such combinations randomly, without repetition.

A simple solution could be to generate all combinations and then randomly pick some of them. For example:

import numpy as np

# Three random combinations of [0,0,0] and [1,1,1]
comb = np.array(np.meshgrid([0,1],[0,1],[0,1])).T.reshape(-1,3)  # all 2**3 = 8 combinations, one per row
result = comb[np.random.choice(len(comb),3,replace=False),:]     # 3 distinct rows, chosen at random

However, this is not feasible when the number of combinations is too large.
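To put a rough number on "too large": with m groups of two elements there are 2**m combinations, so the meshgrid table has 2**m rows and m columns. The helper `full_table_bytes` below is hypothetical (not part of the question) and assumes 8-byte integers; for m = 30 it works out to roughly 258 GB.

```python
def full_table_bytes(m, itemsize=8):
    """Bytes needed to store every combination of m two-element groups.

    Assumes one row per combination (2**m rows), m columns, and
    itemsize bytes per entry (8 for int64).
    """
    n_rows = 2 ** m
    return n_rows * m * itemsize


for m in (10, 20, 30):
    print(m, full_table_bytes(m) / 1e9, "GB")
```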

Is there a way to generate random combinations without replacement in Python (possibly using NumPy) without generating all the combinations?

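EDIT: As you can see in the accepted answer, we also get, for free, a technique for generating random binary vectors without repetition, which is just a one-liner (described in the Bonus section).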

Answer 1

Here's a vectorized approach that doesn't generate all the combinations:

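import numpy as np

def unique_combs(A, N):
    # A : 2D input array with each row representing one group (two elements per row)
    # N : No. of combinations needed (at most 2**m)
    m,n = A.shape
    # N distinct integers in [0, 2**m), each encoding one combination
    dec_idx = np.random.choice(2**m,N,replace=False)
    # bit j of each integer selects column 0 or 1 of group j
    idx = ((dec_idx[:,None] & (1 << np.arange(m)))!=0).astype(int)
    return A[np.arange(m),idx]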

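Note that this assumes we are working with the same number of elements per group; more precisely, every group must hold exactly two elements, because each combination is encoded as one bit per group.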
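A quick property check, added here rather than taken from the answer: requesting N = 2**m combinations should return every combination exactly once, and column j should only ever contain elements of group j. The function is repeated (minus the unused `n`) so the snippet runs on its own.

```python
import numpy as np

def unique_combs(A, N):
    # Same logic as the answer's function; A must have exactly two columns.
    m = A.shape[0]
    dec_idx = np.random.choice(2**m, N, replace=False)  # N distinct codes in [0, 2**m)
    idx = ((dec_idx[:, None] & (1 << np.arange(m))) != 0).astype(int)
    return A[np.arange(m), idx]

A = np.array([[4, 2], [3, 5], [8, 6]])
out = unique_combs(A, 2**3)  # request every possible combination

rows = {tuple(r) for r in out}
print(len(rows))  # 8: no combination repeats
# Column j only ever contains elements of group j.
print(all(set(out[:, j]) <= set(A[j]) for j in range(A.shape[0])))  # True
```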

Explanation

To explain it a bit, let's say the groups are stored in a 2D array:

In [44]: A
Out[44]: 
array([[4, 2],   <-- group #1
       [3, 5],   <-- group #2
       [8, 6]])  <-- group #3

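We have two elements per group. Let's say we are looking for 4 unique combinations: N = 4. Since we pick one number from each of these three groups, there are 2**3 = 8 unique combinations in total.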

Let's use np.random.choice(8, N, replace=False) to generate N unique numbers in the range [0, 8):

In [86]: dec_idx = np.random.choice(8,N,replace=False)

In [87]: dec_idx
Out[87]: array([2, 3, 7, 0])
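A hedged side note on scale: as far as I know, the legacy np.random.choice(..., replace=False) used above permutes the entire population internally, so for large m it still allocates 2**m integers. NumPy's newer Generator API (numpy.random.default_rng, NumPy 1.17+) samples without replacement using a set-based method when N is small relative to the population. The sketch below assumes that API; m = 40 and N = 5 are illustrative values, and the seed is only there to make runs repeatable.

```python
import numpy as np

# Assumption: NumPy >= 1.17, which provides numpy.random.default_rng.
rng = np.random.default_rng(seed=0)

m, N = 40, 5
# Draw N distinct integers from [0, 2**m) without materialising that range.
dec_idx = rng.choice(2**m, size=N, replace=False)

# Decode each integer into m bits, exactly as in the answer.
idx = ((dec_idx[:, None] & (1 << np.arange(m))) != 0).astype(int)
print(dec_idx, idx.shape)
```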

Then, convert them to their binary equivalents, which we will later use to index into each row of A:

In [88]: idx = ((dec_idx[:,None] & (1 << np.arange(3)))!=0).astype(int)

In [89]: idx
Out[89]: 
array([[0, 1, 0],
       [1, 1, 0],
       [1, 1, 1],
       [0, 0, 0]])
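One way to convince yourself that column j of idx holds bit j (weight 2**j) is to reverse the conversion: a dot product with the same powers of two rebuilds the original integers. This round-trip check is an addition, not part of the answer.

```python
import numpy as np

m = 3
dec_idx = np.array([2, 3, 7, 0])
idx = ((dec_idx[:, None] & (1 << np.arange(m))) != 0).astype(int)

# Row [0, 1, 0] -> 0*1 + 1*2 + 0*4 = 2, and so on for every row.
recovered = idx @ (1 << np.arange(m))
print(recovered)  # [2 3 7 0]
```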

Finally, with fancy indexing, we pick those elements out of A:

In [90]: A[np.arange(3),idx]
Out[90]: 
array([[4, 5, 8],
       [2, 5, 8],
       [2, 5, 6],
       [4, 3, 8]])
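The fancy-indexing step can be read as out[i, j] = A[j, idx[i, j]]: np.arange(m) has shape (m,) and broadcasts against idx of shape (N, m). The sketch below compares it with an explicit double loop written only for illustration.

```python
import numpy as np

A = np.array([[4, 2], [3, 5], [8, 6]])
idx = np.array([[0, 1, 0],
                [1, 1, 0],
                [1, 1, 1],
                [0, 0, 0]])
m = A.shape[0]

# Vectorized: row indices 0..m-1 broadcast across all N rows of idx.
fancy = A[np.arange(m), idx]

# Explicit loops doing the same lookup, one element at a time.
loops = np.array([[A[j, idx[i, j]] for j in range(m)] for i in range(len(idx))])

print(np.array_equal(fancy, loops))  # True
```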

Sample run

In [80]: # Original code that generates all combs
    ...: comb = np.array(np.meshgrid([4,2],[3,5],[8,6])).T.reshape(-1,3)
    ...: result = comb[np.random.choice(len(comb),4,replace=False),:]
    ...: 

In [81]: A = np.array([[4,2],[3,5],[8,6]]) # 2D array of groups

In [82]: unique_combs(A, 3) # 3 combinations
Out[82]: 
array([[2, 3, 8],
       [4, 3, 6],
       [2, 3, 6]])

In [83]: unique_combs(A, 4) # 4 combinations
Out[83]: 
array([[2, 3, 8],
       [4, 3, 6],
       [2, 5, 6],
       [4, 5, 8]])
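For a small case like this one, the sampled rows can be checked against the full enumeration from the question: every sampled row should appear in the meshgrid table, with no duplicates. This cross-check is an addition; the sampling code is inlined from the answer so the snippet is self-contained.

```python
import numpy as np

A = np.array([[4, 2], [3, 5], [8, 6]])
m = A.shape[0]

# Full enumeration, as in the question (fine here because m is tiny).
comb = np.array(np.meshgrid(*A)).T.reshape(-1, m)
all_rows = {tuple(r) for r in comb}

# The answer's bit-decoding approach, inlined.
dec_idx = np.random.choice(2**m, 4, replace=False)
idx = ((dec_idx[:, None] & (1 << np.arange(m))) != 0).astype(int)
sample = A[np.arange(m), idx]

sample_rows = {tuple(r) for r in sample}
print(len(all_rows), len(sample_rows), sample_rows <= all_rows)  # 8 4 True
```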

Bonus section

About ((dec_idx[:,None] & (1 << np.arange(m)))!=0).astype(int):

That step basically converts decimal numbers to binary numbers. Let's break it down into smaller steps to take a closer look.

1) Input array of decimal numbers:

In [18]: dec_idx
Out[18]: array([7, 6, 4, 0])

2) Convert to 2D by inserting a new axis with None/np.newaxis:

In [19]: dec_idx[:,None]
Out[19]: 
array([[7],
       [6],
       [4],
       [0]])
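The reason for the extra axis is broadcasting: a column of shape (4, 1) combined with a 1D array of shape (3,) stretches to (4, 3), one row per input number and one column per bit. A minimal shape check:

```python
import numpy as np

dec_idx = np.array([7, 6, 4, 0])
m = 3

col = dec_idx[:, None]        # same as dec_idx[:, np.newaxis]
powers = 1 << np.arange(m)    # 1D array of length m

print(dec_idx.shape, col.shape, powers.shape)   # (4,) (4, 1) (3,)
# Broadcasting (4, 1) against (3,) yields a (4, 3) result.
print((col & powers).shape)                     # (4, 3)
```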

3) Let's assume m = 3, i.e. we want to convert each number to its 3-digit binary equivalent.

We create a range array of powers of 2 using the bit-shift operation:

In [16]: (1 << np.arange(m))
Out[16]: array([1, 2, 4])

Alternatively, the explicit way would be:

In [20]: 2**np.arange(m)
Out[20]: array([1, 2, 4])

4) Now, the crux of that cryptic step. We perform a broadcasted bitwise AND between the 2D dec_idx and the powers-of-2 range array.

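Consider the first element of dec_idx: 7. We bitwise-AND 7 against 1, 2 and 4. Think of it as a filtering process: we filter 7 at each of the binary positions 1, 2 and 4, since they represent the three binary digits. Similarly, we do this for all elements of dec_idx in a vectorized manner with broadcasting.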
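The filtering for a single value can be written with plain Python integers; each AND keeps the weight if that bit is set and yields 0 otherwise. This scalar version is only an illustration of what the broadcasted expression does per element.

```python
weights = [1, 2, 4]  # 2**0, 2**1, 2**2 for m = 3

# 7 is 111 in binary, so every bit survives the filter.
filtered_7 = [7 & w for w in weights]
print(filtered_7)  # [1, 2, 4]

# 6 is 110 in binary, so the lowest bit is filtered out.
filtered_6 = [6 & w for w in weights]
print(filtered_6)  # [0, 2, 4]
```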

Thus, we get bitwise AND results like these:

In [43]: (dec_idx[:,None] & (1 << np.arange(m)))
Out[43]: 
array([[1, 2, 4],
       [0, 2, 4],
       [0, 0, 4],
       [0, 0, 0]])

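The filtered numbers obtained this way are either 0 or the powers-of-2 range array numbers themselves. So, to get the binary equivalents, we just need to treat all non-zeros as 1s and zeros as 0s.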

In [44]: ((dec_idx[:,None] & (1 << np.arange(m)))!=0)
Out[44]: 
array([[ True,  True,  True],
       [False,  True,  True],
       [False, False,  True],
       [False, False, False]], dtype=bool)

In [45]: ((dec_idx[:,None] & (1 << np.arange(m)))!=0).astype(int)
Out[45]: 
array([[1, 1, 1],
       [0, 1, 1],
       [0, 0, 1],
       [0, 0, 0]])
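An equivalent formulation, not taken from the answer, shifts bit j down to position 0 and masks it with & 1, which produces the 0/1 values directly without the != 0 and astype steps. Both forms give the same matrix:

```python
import numpy as np

dec_idx = np.array([7, 6, 4, 0])
m = 3

# Answer's form: AND with powers of two, then test for non-zero.
via_and = ((dec_idx[:, None] & (1 << np.arange(m))) != 0).astype(int)

# Alternative: right-shift by j, then keep only the lowest bit.
via_shift = (dec_idx[:, None] >> np.arange(m)) & 1

print(np.array_equal(via_and, via_shift))  # True
```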

Thus, we have the binary numbers with the MSB on the right (and the LSB on the left).
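To relate this to the usual notation, np.binary_repr writes the MSB first, so each row here is that string reversed. A small comparison sketch:

```python
import numpy as np

m = 3
dec_idx = [7, 6, 4, 0]
for d in dec_idx:
    bits = ((d & (1 << np.arange(m))) != 0).astype(int)  # LSB first, as in the answer
    text = np.binary_repr(d, width=m)                      # conventional MSB-first string
    print(d, "".join(map(str, bits)), text)
```

For 6 this prints 011 next to 110: the same bits in opposite order.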

(Answer 1: score 6, accepted, 2016-12-19 07:05:39)
