Selecting random elements from each column of a numpy array

I have an n-row, m-column numpy array and would like to create a new k x m array by selecting k random elements from each column of the array. I wrote the following Python function to do this, but would like to implement something more efficient and faster:

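import random
import numpy as np

def sample_array_cols(MyMatrix, nelements):
    vmat = []
    TempMat = MyMatrix.T  # rows of the transpose are the original columns
    for v in TempMat:
        v = np.ndarray.tolist(v)
        subv = random.sample(v, nelements)  # nelements values, without replacement
        vmat = vmat + [subv]
    return(np.array(vmat).T)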

One question is whether there's a way to loop over each column without transposing the array (and then transposing it back). More importantly, is there some way to map the random sample onto each column that would be faster than a for loop over all columns? I don't have much experience with numpy objects, but I would guess there should be something analogous to apply/mapply in R that would work?
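On the apply/mapply point: NumPy's closest analogue to R's apply is np.apply_along_axis, which calls a 1-D function on every column (axis=0) without an explicit transpose. A minimal sketch, assuming a numpy.random.Generator is available; sample_cols_apply is a hypothetical helper name. It still loops in Python internally, so it tidies the code rather than making it substantially faster.

```python
import numpy as np

def sample_cols_apply(matrix, k, rng=None):
    """Hypothetical helper: k elements per column, without replacement."""
    rng = np.random.default_rng() if rng is None else rng
    # apply_along_axis hands each column (a 1-D slice along axis 0) to the
    # lambda and stacks the length-k results back along axis 0 -> shape (k, m).
    return np.apply_along_axis(
        lambda col: rng.choice(col, size=k, replace=False), 0, matrix
    )

matrix = np.arange(4 * 3).reshape(4, 3)
sample = sample_cols_apply(matrix, 2, np.random.default_rng(0))
print(sample.shape)  # (2, 3)
```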

To sample each column without replacement, just like your original solution:

import numpy as np

matrix = np.arange(4*3).reshape(4,3)
matrix

Output:

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11]])
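
k = 2
# argsort of uniform random values gives an independent random permutation of
# the row indices in each column; its first k rows are k distinct picks per column.
np.take_along_axis(matrix, np.random.rand(*matrix.shape).argsort(axis=0)[:k], axis=0)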

Output (one possible result; the sample is random):

array([[ 9,  1,  2],
       [ 3,  4, 11]])
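On NumPy 1.20 or later, a Generator can shuffle every column independently in a single call with permuted(..., axis=0); keeping the first k rows then gives the same kind of without-replacement sample as the argsort trick. This swaps in a different API from the one used above, and sample_cols_permuted is a hypothetical name used only for this sketch.

```python
import numpy as np

def sample_cols_permuted(matrix, k, rng=None):
    """Hypothetical helper: k distinct rows per column (no replacement)."""
    rng = np.random.default_rng() if rng is None else rng
    # axis=0: each column is shuffled independently of the others
    # (Generator.permuted needs NumPy >= 1.20); returns a new array,
    # so the input matrix is left unchanged.
    return rng.permuted(matrix, axis=0)[:k]

matrix = np.arange(4 * 3).reshape(4, 3)
print(sample_cols_permuted(matrix, 2, np.random.default_rng(42)))
```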

One alternative is to randomly generate the indices first, and then use take_along_axis to map them onto the original array. Note that randint can return the same row more than once, so unlike your original solution this samples with replacement:

import numpy as np

arr = np.random.randn(1000, 5000)  # arbitrary
k = 10  # arbitrary
n, m = arr.shape
# One row index per output cell; idx must have shape (k, m) so it
# matches arr on every axis except axis 0.
idx = np.random.randint(0, n, (k, m))
new = np.take_along_axis(arr, idx, axis=0)

Output (shape):

In [215]: new.shape
Out[215]: (10, 5000)  # (k x m)
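The answers so far differ in one respect worth making explicit: randint indices can repeat (sampling with replacement), while the argsort permutation cannot. A sketch that puts both behind one flag, assuming a numpy.random.Generator; sample_columns and its replace parameter are hypothetical names, not an existing NumPy API.

```python
import numpy as np

def sample_columns(arr, k, replace=False, rng=None):
    """Hypothetical helper: draw k elements from each column of a 2-D array."""
    rng = np.random.default_rng() if rng is None else rng
    n, m = arr.shape
    if replace:
        # Independent row indices for every output cell; repeats are possible.
        idx = rng.integers(0, n, size=(k, m))
    else:
        if k > n:
            raise ValueError("k cannot exceed the number of rows without replacement")
        # argsort of uniform noise is a random permutation of 0..n-1 in each
        # column; its first k rows are k distinct indices per column.
        idx = rng.random((n, m)).argsort(axis=0)[:k]
    # idx has shape (k, m); the result holds arr[idx[i, j], j].
    return np.take_along_axis(arr, idx, axis=0)

arr = np.random.default_rng(0).standard_normal((1000, 50))
print(sample_columns(arr, 10).shape)                # (10, 50)
print(sample_columns(arr, 10, replace=True).shape)  # (10, 50)
```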

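I would:

  1. Pre-allocate the result array and fill in the columns, and
  2. Use numpy integer-array indexing.

import numpy

def sample_array_cols(matrix, n_result):
    (n, m) = matrix.shape
    # Pre-allocate the (n_result, m) result; numpy.array([n_result, m]) would
    # instead create a 1-D array holding those two numbers.
    vmat = numpy.empty((n_result, m), dtype=matrix.dtype)
    for c in range(m):
        # randint draws row indices WITH replacement; use
        # numpy.random.choice(n, n_result, replace=False) to match random.sample.
        random_indices = numpy.random.randint(0, n, n_result)
        vmat[:, c] = matrix[random_indices, c]
    return vmat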

It's not fully vectorized, but it's better than building up a list, and the code reads just like your description.
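To check whether vectorizing pays off on your own data, a rough timing sketch with the standard-library timeit module can help. It compares a pre-allocated per-column loop (using Generator.choice without replacement, in the spirit of the answer above) against the argsort + take_along_axis version. The array size, function names, and repeat count are arbitrary assumptions, and no particular speedup is implied: results depend on the machine and on n, m, and k.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
arr = rng.standard_normal((1000, 2000))  # arbitrary n x m
k = 10

def loop_version(a, k):
    # Pre-allocate the (k, m) result and fill one column at a time.
    n, m = a.shape
    out = np.empty((k, m), dtype=a.dtype)
    for c in range(m):
        out[:, c] = a[rng.choice(n, size=k, replace=False), c]
    return out

def vectorized_version(a, k):
    # One random permutation per column via argsort, then a single gather.
    idx = rng.random(a.shape).argsort(axis=0)[:k]
    return np.take_along_axis(a, idx, axis=0)

for fn in (loop_version, vectorized_version):
    seconds = timeit.timeit(lambda: fn(arr, k), number=5)
    print(f"{fn.__name__}: {seconds / 5:.4f} s per call")
```

One design note: the argsort version sorts all n rows of every column even when k is small, so its cost grows with n rather than k; the loop version only draws k indices per column but pays Python overhead once per column.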

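Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.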

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM