Most efficient way to randomize a matrix in R or in Python
I'm working with a fairly large numeric matrix M in R (11,000 rows by 20 columns). On this matrix I'm running a lot of correlation tests with cor.test(M[i,], M[j,], method='spearman'), where i and j are two rows of the matrix (all possible combinations are tested).
The problem, as you know, is that I'm running too many tests for the p-values returned by this test to be reliable.
My strategy to overcome this limitation is to generate a new probability distribution by bootstrapping my matrix M: I would like to generate 100 random matrices from M, run the multiple correlations on these matrices, and choose the right p-value cut-off to get an FDR of 5%.
My question is:
Thank you in advance for all the useful answers you can provide.
In Python, the random module provides the function random.sample(). If you store M as a list of rows, randomly sampling n rows from M without replacement looks like this:
import random
M_sample = random.sample(M, n)
However, for bootstrapping you probably want random sampling with replacement. To do this, you can use numpy.random.choice(). It only accepts a one-dimensional input, so draw row indices and then index into M:
import numpy
indices = numpy.random.choice(len(M), n, replace=True)
M_sample = numpy.asarray(M)[indices]
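If M is kept as a plain list of rows, the standard library can also sample with replacement through random.choices() (Python 3.6+). The following is only a minimal sketch of building the 100 resampled matrices mentioned in the question. The function name bootstrap_matrices, its parameters, and the toy matrix are made up for illustration, and each resampled matrix is assumed to have as many rows as M.

```python
import random

def bootstrap_matrices(M, n_matrices=100, seed=None):
    """Return n_matrices resampled copies of M (a list of rows).

    Each copy has len(M) rows drawn from M with replacement.
    The rows are shared references to the rows of M, not deep copies.
    """
    rng = random.Random(seed)  # private generator, reproducible when seeded
    n_rows = len(M)
    # random.choices samples with replacement; k is the number of draws
    return [rng.choices(M, k=n_rows) for _ in range(n_matrices)]

# Toy stand-in for the real 11000 x 20 matrix (hypothetical data)
M = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
samples = bootstrap_matrices(M, n_matrices=100, seed=0)
print(len(samples), len(samples[0]))  # 100 4
```

Because the rows are not copied, modifying a row in one resampled matrix would also change it in M; copy the rows first if they need to be edited.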
In R, we use sample() to randomly choose the row indices and then use row indexing to take those rows from the matrix. Randomly sampling n rows from matrix M without replacement is done as follows:
indices <- sample(nrow(M), n, replace = FALSE)
M_sample <- M[indices, ]
To sample with replacement instead, replace the first line with this:
indices <- sample(nrow(M), n, replace = TRUE)
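The index-based approach translates directly to NumPy, where all the index draws can be made in a single call instead of a loop. This is a rough sketch rather than a tuned implementation: the function name bootstrap_rows, its parameters, and the toy matrix are assumptions for illustration, and it relies on numpy.random.default_rng() (NumPy 1.17 or later).

```python
import numpy as np

def bootstrap_rows(M, n_matrices=100, seed=None):
    """Resample the rows of a 2-D array with replacement, n_matrices times.

    Returns an array of shape (n_matrices, n_rows, n_cols).
    """
    M = np.asarray(M)
    rng = np.random.default_rng(seed)
    n_rows = M.shape[0]
    # One row of indices per resampled matrix, each index in [0, n_rows)
    indices = rng.integers(0, n_rows, size=(n_matrices, n_rows))
    # Fancy indexing picks the rows for every matrix in a single step
    return M[indices]

# Toy stand-in for the real 11000 x 20 matrix (hypothetical data)
M = np.arange(12, dtype=float).reshape(4, 3)
samples = bootstrap_rows(M, n_matrices=100, seed=0)
print(samples.shape)  # (100, 4, 3)
```

For the matrix in the question, 100 resampled copies of an 11000 x 20 float64 array take about 176 MB, so holding them all in memory at once should be feasible on most machines; otherwise draw one set of indices per iteration.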