Using Numpy to generate random combinations of two arrays without repetition

Given two arrays, for example [0,0,0] and [1,1,1], it is already clear (see here) how to generate all the combinations, i.e., [[0,0,0],[0,0,1],[0,1,0],[0,1,1],[1,0,0],[1,0,1],[1,1,0],[1,1,1]]. As far as I know, itertools (combinations or product) and numpy.meshgrid are the most common ways.

However, I couldn't find any discussion of how to generate these combinations randomly, without repetitions.

An easy solution could be to generate all the combinations and then choose some of them randomly. For example:

import numpy as np

# Three random combinations of [0,0,0] and [1,1,1]
comb = np.array(np.meshgrid([0,1],[0,1],[0,1])).T.reshape(-1,3)
result = comb[np.random.choice(len(comb),3,replace=False),:]

However, this is infeasible when the number of combinations is too big.
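To get a feel for "too big", here is a rough estimate of the memory the full table would need. It assumes m groups of two elements each and 8-byte integers; the helper name full_table_bytes is made up for this illustration and counts only the final table, not any intermediate arrays.

```python
def full_table_bytes(m, itemsize=8):
    # 2**m combinations (rows), m entries per row, itemsize bytes per entry.
    # Hypothetical helper, only meant to show how quickly the table grows.
    return (2 ** m) * m * itemsize


# 3 groups: 8 rows * 3 columns * 8 bytes
print(full_table_bytes(3))             # 192
# 30 groups: size of the table in GiB
print(full_table_bytes(30) // 2**30)   # 240
```

So with only 30 two-element groups, the full table alone would already take about 240 GiB.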

Is there a way to generate random combinations without repetition in Python (possibly with Numpy) without generating all the combinations?

EDIT: Note that the accepted answer also gives us, for free, a technique to generate random binary vectors without repetition, which is just a single line (described in the Bonus section).

Here's a vectorized approach without generating all combinations -

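def unique_combs(A, N):
    # A : 2D input array with each row representing one group
    # N : number of combinations needed
    m,n = A.shape
    # N distinct integers in [0, 2**m); each one encodes one combination
    dec_idx = np.random.choice(2**m,N,replace=False)
    # Bit j of each integer selects column 0 or 1 of group j
    idx = ((dec_idx[:,None] & (1 << np.arange(m)))!=0).astype(int)
    # Pick A[j, idx[i, j]] for every combination i and group j
    return A[np.arange(m),idx]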

Please note that this assumes every group has the same number of elements. As written, the 2**m range and the bit test require exactly two elements per group.
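If the groups have n elements each instead of two, the same idea should carry over by reading the random integers in base n rather than base 2. The following is only a sketch of that generalization, not part of the original answer: the name unique_combs_base_n and the rng parameter are made up here, and it assumes n**m fits in a 64-bit integer and that the elements within each group are distinct.

```python
import numpy as np


def unique_combs_base_n(A, N, rng=None):
    # Hypothetical generalization of unique_combs to n elements per group.
    # A : 2D array, one group per row, n elements per group
    # N : number of distinct combinations wanted (N <= n**m)
    A = np.asarray(A)
    m, n = A.shape
    rng = np.random.default_rng() if rng is None else rng
    # N distinct codes in [0, n**m); assumes n**m fits in int64
    codes = rng.choice(n ** m, size=N, replace=False)
    # Digit j of each code in base n picks the column for group j
    digits = (codes[:, None] // n ** np.arange(m)) % n
    return A[np.arange(m), digits]


A3 = np.array([[4, 2, 9],
               [3, 5, 1]])
print(unique_combs_base_n(A3, 4))
```

With n = 2 the digit extraction reduces to the bit test used in unique_combs.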

Explanation

To explain it a bit, let's say the groups are stored in a 2D array -

In [44]: A
Out[44]: 
array([[4, 2],   <-- group #1
       [3, 5],   <-- group #2
       [8, 6]])  <-- group #3

We have two elements per group. Let's say we are looking for 4 unique group combinations: N = 4. Choosing one of the two numbers from each of those three groups gives a total of 2**3 = 8 unique combinations.

Let's generate N unique numbers in that interval of 8 using np.random.choice(8, N, replace=False) -

In [86]: dec_idx = np.random.choice(8,N,replace=False)

In [87]: dec_idx
Out[87]: array([2, 3, 7, 0])
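One caveat for large m: as far as I know, the legacy np.random.choice with replace=False shuffles the whole index range internally, so the draw itself can need memory proportional to 2**m. A possible workaround, offered as an assumption rather than as part of the answer, is Python's random.sample on a range object, which does not materialize the range. The helper name draw_codes is hypothetical, and m is kept below 63 so the codes fit in int64 for the later bit operations.

```python
import random

import numpy as np


def draw_codes(m, N, seed=None):
    # Hypothetical helper: N distinct integers in [0, 2**m),
    # drawn without building the full range in memory.
    if not 0 < m < 63:
        raise ValueError("m must be between 1 and 62 to fit in int64")
    rnd = random.Random(seed)
    return np.array(rnd.sample(range(2 ** m), N), dtype=np.int64)


# Five distinct codes out of 2**40 possibilities
print(draw_codes(40, 5))
```

The returned array can be used in place of dec_idx in the steps that follow.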

Then, convert those to their binary equivalents, since we need them later to index into each row of A -

In [88]: idx = ((dec_idx[:,None] & (1 << np.arange(3)))!=0).astype(int)

In [89]: idx
Out[89]: 
array([[0, 1, 0],
       [1, 1, 0],
       [1, 1, 1],
       [0, 0, 0]])

Finally, with fancy-indexing, we pick those elements from A -

In [90]: A[np.arange(3),idx]
Out[90]: 
array([[4, 5, 8],
       [2, 5, 8],
       [2, 5, 6],
       [4, 3, 8]])

Sample run

In [80]: # Original code that generates all combs
    ...: comb = np.array(np.meshgrid([4,2],[3,5],[8,6])).T.reshape(-1,3)
    ...: result = comb[np.random.choice(len(comb),4,replace=False),:]
    ...: 

In [81]: A = np.array([[4,2],[3,5],[8,6]]) # 2D array of groups

In [82]: unique_combs(A, 3) # 3 combinations
Out[82]: 
array([[2, 3, 8],
       [4, 3, 6],
       [2, 3, 6]])

In [83]: unique_combs(A, 4) # 4 combinations
Out[83]: 
array([[2, 3, 8],
       [4, 3, 6],
       [2, 5, 6],
       [4, 5, 8]])
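Since the output is random, a quick property check can be more convincing than comparing printed arrays. The sketch below repeats unique_combs so that it runs on its own; check_unique_combs is a hypothetical helper that verifies the shape, that no row is repeated, and that column j only contains elements of group j.

```python
import numpy as np


def unique_combs(A, N):
    # Same function as in the answer, repeated so this block is self-contained
    m, n = A.shape
    dec_idx = np.random.choice(2**m, N, replace=False)
    idx = ((dec_idx[:, None] & (1 << np.arange(m))) != 0).astype(int)
    return A[np.arange(m), idx]


def check_unique_combs(A, N):
    # Hypothetical helper: True if the result looks like N distinct,
    # valid combinations of the groups in A.
    out = unique_combs(A, N)
    right_shape = out.shape == (N, A.shape[0])
    no_repeats = len(np.unique(out, axis=0)) == N
    valid = all(np.isin(out[:, j], A[j]).all() for j in range(A.shape[0]))
    return right_shape and no_repeats and valid


A = np.array([[4, 2], [3, 5], [8, 6]])
print(check_unique_combs(A, 4))
print(check_unique_combs(A, 8))  # asking for all 8 combinations
```

The no-repeats check relies on the two elements within each group being different, as they are in this example.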

Bonus section

Explanation of ((dec_idx[:,None] & (1 << np.arange(m)))!=0).astype(int) :

That step is basically converting decimal numbers to their binary equivalents. Let's break it down into smaller steps for a closer look.

1) Input array of decimal numbers -

In [18]: dec_idx
Out[18]: array([7, 6, 4, 0])

2) Convert to 2D by inserting a new axis with None/np.newaxis -

In [19]: dec_idx[:,None]
Out[19]: 
array([[7],
       [6],
       [4],
       [0]])

3) Let's assume m = 3, i.e. we want to convert to 3-digit binary equivalents.

We create an array of powers of two with a bit-shift operation -

In [16]: (1 << np.arange(m))
Out[16]: array([1, 2, 4])

Alternatively, an explicit way would be -

In [20]: 2**np.arange(m)
Out[20]: array([1, 2, 4])

4) Now, the crux of that cryptic step. We perform a broadcasted bitwise AND between the 2D dec_idx and the powers-of-two array.

Consider the first element of dec_idx: 7. We are performing a bitwise AND of 7 against 1, 2, 4. Think of it as a filtering process: we filter 7 at each of the place values 1, 2, 4, which represent the three binary digits. Similarly, we do this for all elements of dec_idx in a vectorized manner with broadcasting.

Thus, we would get the bitwise AND results like so -

In [43]: (dec_idx[:,None] & (1 << np.arange(m)))
Out[43]: 
array([[1, 2, 4],
       [0, 2, 4],
       [0, 0, 4],
       [0, 0, 0]])

The filtered numbers thus obtained are either 0 or the corresponding powers of two. So, to get the binary equivalents, we just need to treat all non-zeros as 1s and zeros as 0s.

In [44]: ((dec_idx[:,None] & (1 << np.arange(m)))!=0)
Out[44]: 
array([[ True,  True,  True],
       [False,  True,  True],
       [False, False,  True],
       [False, False, False]], dtype=bool)

In [45]: ((dec_idx[:,None] & (1 << np.arange(m)))!=0).astype(int)
Out[45]: 
array([[1, 1, 1],
       [0, 1, 1],
       [0, 0, 1],
       [0, 0, 0]])

Thus, we have the binary numbers with the MSBs to the right.
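Putting the bonus steps together gives the one-line way to draw random binary vectors without repetition mentioned in the question's edit. A minimal sketch follows; the function names to_bits and random_binary_vectors and the msb_first flag are made up for this illustration. Reversing the columns is one way to get the more familiar MSB-on-the-left layout.

```python
import numpy as np


def to_bits(dec_idx, m, msb_first=False):
    # Expand each integer into m binary digits, LSB first by default
    bits = ((dec_idx[:, None] & (1 << np.arange(m))) != 0).astype(int)
    # Reverse the columns to put the most significant bit on the left
    return bits[:, ::-1] if msb_first else bits


def random_binary_vectors(m, N):
    # N distinct integers in [0, 2**m) give N distinct binary vectors
    return to_bits(np.random.choice(2**m, N, replace=False), m)


dec_idx = np.array([7, 6, 4, 0])
print(to_bits(dec_idx, 3))
# [[1 1 1]
#  [0 1 1]
#  [0 0 1]
#  [0 0 0]]
print(to_bits(dec_idx, 3, msb_first=True))
# [[1 1 1]
#  [1 1 0]
#  [1 0 0]
#  [0 0 0]]
print(random_binary_vectors(4, 5))  # 5 distinct random 4-bit vectors
```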
