Randomly selecting a subset of a large number of samples in Matlab

I would like to create a subset of n samples from a large set of N samples, with n << N. I usually call Matlab's randperm function and take the first n indices. However, because the data can be very large, randperm fails with an out-of-memory error.

I would appreciate suggestions on how to select a small subset from a very large data set without using the randperm function in Matlab.

Thank you.

You could try using single() to halve the data size:

http://www.mathworks.de/help/matlab/ref/single.html

randi returns uniformly distributed random integers (drawn with replacement, so indices may repeat):

Ind = randi(N,[n 1]);
Observation = data(Ind); 

You could also use datasample:

Observation = datasample(data,n,'Replace',false);

As @Shai mentioned, another option is randsample:

Observation = data(randsample(N,n));

If n is much smaller than N, a rejection method is efficient: generate a sample with possible repetitions using randi, check whether it contains repetitions (unlikely when n is small compared with sqrt(N)), and if it does, draw again:

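N = 10000;   % population size
n = 100;     % subset size
repeat = true;
while repeat
    sample = randi(N,1,n);   % draw n indices, repetitions possible
    % compare every pair; a column sum above 1 means a value occurs more than once
    repeat = any(sum(bsxfun(@eq, sample, sample.'))>1);
end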
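
The pairwise comparison above builds an n-by-n matrix on every pass and discards the whole draw when a single repetition occurs. A possible variant, sketched here as an illustration rather than as part of the original answer, keeps the indices already drawn and only tops up the missing ones. It assumes a Matlab release in which unique accepts the 'stable' option; the value of N and the variable name idx are illustrative.

```matlab
N = 1e9;            % population size (illustrative value)
n = 100;            % subset size, n << N
idx = zeros(1, 0);  % indices accepted so far (empty row vector)
while numel(idx) < n
    % draw only the missing indices, then drop duplicates
    % while preserving the order in which values were drawn
    idx = unique([idx, randi(N, 1, n - numel(idx))], 'stable');
end
```

Memory use here grows with n rather than N, and idx can then be used as data(idx) in the same way as the samples in the answers above.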
