
MATLAB: sample from population randomly many times?

I am aware of MATLAB's datasample, which lets you draw k samples from a given population. Suppose population=[1,2,3,4] and I want to sample uniformly, with replacement, k=5 times from it. Then:

datasample(population,k)
ans =
   1     3     2     4     1

Now, I want to repeat the above experiment N=10000 times without using a for loop. I tried doing:

datasample(repmat(population,N,1),5,2)

But the output I get is (just a short excerpt below):

 1     3     2     1     3
 1     3     2     1     3
 1     3     2     1     3
 1     3     2     1     3
 1     3     2     1     3
 1     3     2     1     3
 1     3     2     1     3
 1     3     2     1     3
 1     3     2     1     3

Every row (the result of one experiment) is the same! But obviously they should be different... It's as though some random seed is not updating between rows. How can I fix this? Or is there some other method I could use that avoids a for loop? Thanks!

You seem to be misunderstanding how datasample works. According to the documentation, when you give it a matrix it samples whole slices of that matrix along the dimension you specify: rows by default, or columns when the third argument is 2, as in your call. So datasample(repmat(population,N,1),5,2) draws 5 column indices once and applies those same indices to every row. Because repmat made all N rows identical, every row of the output is identical too. The draw is random, but there is only one draw, which is why you are getting that "error".
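A small check makes this concrete. With the third argument set to 2, as in the question, the draw is over columns. The sketch below assumes the Statistics and Machine Learning Toolbox is available (datasample lives there) and uses datasample's second output, the indices it drew. The small N and the variable names sameColumns and identicalRows are only for illustration.

```matlab
population = [1 2 3 4];
N = 6;   % a small number of trials, just for illustration
k = 5;   % samples per trial

data = repmat(population, N, 1);     % N identical rows

% Sample k columns (dim = 2); idx holds the column indices that were drawn
[y, idx] = datasample(data, k, 2);

% The same column indices are applied to every row ...
sameColumns = isequal(y, data(:, idx));

% ... so every row of y is a copy of the first one
identicalRows = isequal(y, repmat(y(1, :), N, 1));
```

Both flags should come out true: one set of column indices is drawn, not one set per row.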

As such, I wouldn't use datasample here if your intention is to avoid looping. You can use datasample, but you'd have to loop over each call, and you explicitly said that this is not what you want.

What I would recommend is to first create your population vector with whatever you want in it, then generate a random index matrix where each value is an integer from 1 up to the number of elements in population. The matrix has one row per trial and one column per sample. Once you have it, simply use it to index into your vector to get the desired sampling matrix. To generate this random index matrix, randi is a fine choice.

Something like this comes to mind:

N = 10000; %// Number of trials
M = 5; %// Number of samples per trial
population = 1:4; %// Population vector

%// Generate random indices
ind = randi(numel(population), N, M);

%// Get the stuff
out = population(ind);

Here are the first 10 rows of the output:

>> out(1:10,:)

ans =

     4     3     1     4     2
     4     4     1     3     4
     3     2     2     2     3
     1     4     2     2     2
     1     2     3     4     2
     2     2     3     2     1
     4     1     3     2     4
     1     4     1     3     1
     1     1     2     4     4
     1     2     4     2     1

I think the above does what you want. Also keep in mind that the above code generalizes to any population vector you want. You simply have to change the vector and it will work as advertised.
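If you want to convince yourself that the rows really are independent uniform draws, one rough sanity check is to count how often each index was drawn. This is only a sketch: counts and freq are names introduced here, and the check is informal rather than a statistical test.

```matlab
N = 10000;          % Number of trials
M = 5;              % Number of samples per trial
population = 1:4;   % Population vector

ind = randi(numel(population), N, M);   % N-by-M matrix of random indices
out = population(ind);                  % N-by-M matrix of sampled values

% Count how often each index was drawn across all trials
counts = accumarray(ind(:), 1, [numel(population) 1]);

% Relative frequencies; each should be close to 1/numel(population)
freq = counts.' / numel(ind);
```

Counting the indices rather than the values keeps the check valid for any population vector, not just 1:4.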

With the third argument set to 2, datasample interprets each column of your data as one element of your population and samples among the columns.

To fix this you could call datasample N times in a loop. Instead, I would use randi:

population(randi(numel(population),N,5))

Assuming your population is always 1:p, you could simplify to:

randi(p,N,5)
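To see that the two forms agree when the population is 1:p, you can reset the random number generator to the same seed before each call. A minimal sketch, assuming a MATLAB version that has rng; the seed value 0 and the names a, b and same are arbitrary.

```matlab
p = 4;
N = 10000;
population = 1:p;

rng(0);   % fix the seed
a = population(randi(numel(population), N, 5));

rng(0);   % same seed again, so randi produces the same indices
b = randi(p, N, 5);

% For population = 1:p, indexing maps each index to itself
same = isequal(a, b);
```

For any other population vector the indexing step is still needed, because the indices and the values no longer coincide.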

OK, so both of the current answers say not to use datasample and to use randi instead. However, I have a solution for you with datasample and arrayfun.

>> population = [1 2 3 4];
>> k = 5; % Number of samples per trial
>> n = 1000; % Number of times to execute datasample(population, k)
>> s = arrayfun(@(x) datasample(population, x), k*ones(n, 1), 'UniformOutput', false);
>> s = cell2mat(s);
>> s(1:5,:)
ans =

     1     4     1     4     4
     4     1     2     2     4
     2     4     1     2     1
     1     4     3     3     1
     4     3     2     3     2

We need 'UniformOutput', false with arrayfun because each call returns a 1-by-k vector rather than a scalar. The cell2mat call then stacks the resulting n-by-1 cell array into an n-by-k matrix. Note that arrayfun still calls datasample once per trial, so it hides the loop rather than removing it.
