Randomly selecting a subset of a large number of samples in Matlab
I would like to create a subset of n samples from a large set of N samples, n << N. I usually use the randperm function in Matlab and take the first n indices. However, since the data can be very large, randperm gives me an out-of-memory error.
I would like suggestions on how to select a small subset from a large data set without using the randperm function in Matlab.
Thank you.
You could try using single() to halve the data size:
http://www.mathworks.de/help/matlab/ref/single.html
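As a rough illustration of the saving, the sketch below converts a hypothetical double array to single and compares the memory each one uses. The array `data` and its size are assumptions made for the example, not part of the question.

```matlab
% Hypothetical double-precision data; replace with your own array.
data = rand(1e6, 1);          % double: 8 bytes per element
dataSingle = single(data);    % single: 4 bytes per element

% whos returns a struct whose bytes field holds the memory used.
infoDouble = whos('data');
infoSingle = whos('dataSingle');
fprintf('double: %d bytes, single: %d bytes\n', ...
    infoDouble.bytes, infoSingle.bytes);
```

Note that single represents integers exactly only up to 2^24, so this helps with the data itself but is not a safe way to store indices when N is large.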
randi gives uniformly distributed random integers (the indices may repeat, i.e. this samples with replacement):
Ind = randi(N,[n 1]);
Observation = data(Ind);
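Because randi draws with replacement, Ind may contain repeated indices. If a subset without repeats is needed, one possible sketch is to oversample and keep the first n distinct values. The oversampling factor of 2 and the example values of N and n are assumptions for illustration.

```matlab
N = 1e9;    % population size (example value)
n = 100;    % subset size, n << N

Ind = [];
while numel(Ind) < n
    % Draw more candidates than needed, then drop repeats,
    % keeping the order in which values first appeared.
    Ind = unique([Ind; randi(N, [2*n 1])], 'stable');
end
Ind = Ind(1:n);   % n distinct indices in 1..N
% Observation = data(Ind);   % assuming data has N elements
```

Only O(n) memory is used here, so the size of N does not matter as long as n stays small.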
You could also use datasample:
Observation = datasample(data,n,'Replace',false);
As @Shai mentioned, another option would be randsample:
Observation = data(randsample(N,n));
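A related option, if the MATLAB release supports it, is the two-argument form of randperm, which returns n unique integers from 1..N directly instead of a full permutation of length N. Whether this avoids the memory problem may depend on the release, so treat the sketch below as something to try rather than a guaranteed fix. The values of N and n are example assumptions.

```matlab
N = 1e8;   % population size (example value)
n = 100;   % subset size, n << N

% Two-argument randperm: n unique integers chosen from 1..N.
Ind = randperm(N, n);

% Comparable call with randsample (needs the Statistics Toolbox):
% Ind = randsample(N, n);

% Observation = data(Ind);   % assuming data has N elements
```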
If n is much smaller than N, a rejection method is efficient: generate a sample with possible repetitions using randi, check whether there are repetitions (which is not very likely), and if so, repeat:
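N = 10000;   % population size
n = 100;     % subset size, n << N
repeat = true;
while repeat
    sample = randi(N,1,n);   % draw n indices, repetitions possible
    % compare every pair of entries; a column sum > 1 means a repeated value
    repeat = any(sum(bsxfun(@eq, sample, sample.'))>1);
end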
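How often the loop repeats can be estimated with the birthday-problem approximation p ≈ 1 - exp(-n(n-1)/(2N)). For N = 10000 and n = 100 this is about 0.39, so the loop usually finishes in one or two passes. The sketch below prints that estimate and uses unique for the repetition check, which avoids building the n-by-n comparison matrix. The names pRepeat and attempts are introduced here for illustration only.

```matlab
N = 10000;   % population size
n = 100;     % subset size, n << N

% Birthday-problem approximation of the chance that one draw has a repeat.
pRepeat = 1 - exp(-n*(n-1)/(2*N));
fprintf('Approximate probability of a repeat per draw: %.2f\n', pRepeat);

attempts = 0;
repeat = true;
while repeat
    sample = randi(N, 1, n);   % draw n indices, repetitions possible
    attempts = attempts + 1;
    % A repeat exists exactly when there are fewer than n distinct values.
    repeat = numel(unique(sample)) < n;
end
fprintf('Accepted a sample after %d attempt(s)\n', attempts);
```

The estimate grows quickly as n approaches sqrt(N), so this approach is best kept for cases where n*n is small compared with N.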