
Simulate 5000 samples of size 5 from a normal distribution with mean 5 and standard deviation 3

I am trying to simulate 5000 samples of size 5 from a normal distribution with mean 5 and standard deviation 3. I then want to compute the mean of each sample and make a histogram of the sample means.

My current code does not give me an error, but I don't think it's right:

nrSamples = 5000
e <- list(mode="vector",length=nrSamples)
for (i in 1:nrSamples) {
  e[[i]] <- rnorm(n = 5, mean = 5, sd = 3)
}

sample_means <- matrix(NA, 5000,1)
for (i in 1:5000){
  sample_means[i] <- mean(e[[i]])
}

Any idea on how to tackle this? I am very very new to R!

You can actually do this without for loops. replicate can be used to create the 5000 samples. Then use sapply to return the mean of each sample. Wrap the sapply call in hist() to get the histogram of means.

dat = replicate(5000, rnorm(5,5,3), simplify=FALSE)

hist(sapply(dat, mean))

Or, if you want to save the means:

sample.means = sapply(dat,mean)
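As a sanity check on the result: the mean of 5 independent N(5, 3^2) draws is itself normal with mean 5 and standard deviation 3/sqrt(5), about 1.34, so the histogram of sample.means should be centred on 5 with roughly that spread. A minimal sketch that overlays this theoretical density; the seed and the number of histogram breaks are arbitrary choices, not part of the answer above.

```r
set.seed(42)  # arbitrary seed, only so the run is reproducible

dat <- replicate(5000, rnorm(5, 5, 3), simplify = FALSE)
sample.means <- sapply(dat, mean)

# Theory: the mean of 5 iid N(5, 3^2) draws is N(5, (3 / sqrt(5))^2)
theoretical_sd <- 3 / sqrt(5)

mean(sample.means)  # should be close to 5
sd(sample.means)    # should be close to 1.34

# Plot on the density scale so the normal curve can be overlaid
hist(sample.means, breaks = 50, freq = FALSE,
     main = "Means of 5000 samples of size 5", xlab = "Sample mean")
curve(dnorm(x, mean = 5, sd = theoretical_sd), add = TRUE, lwd = 2)
```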

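I think your code is giving valid results. list(mode="vector",length=nrSamples) isn't doing what I think you intended: it creates a list of just two elements, named mode and length, rather than an empty list with nrSamples slots (run it in the console and see what happens). It works out anyway because the loop overwrites those first two elements and then grows the list to nrSamples elements.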
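To see that difference concretely, here is a short sketch; the names e_bad and e_good are illustrative only and do not come from the question.

```r
nrSamples <- 5000

# What the question's code creates: a list with just two elements,
# named "mode" and "length", holding the values "vector" and 5000
e_bad <- list(mode = "vector", length = nrSamples)
length(e_bad)   # 2
names(e_bad)    # "mode" "length"

# What was probably intended: an empty list with nrSamples slots
e_good <- vector("list", nrSamples)
length(e_good)  # 5000

# The loop still works on e_bad because [[<- grows the list as needed:
# elements 1 and 2 are overwritten, elements 3..5000 are appended
for (i in 1:nrSamples) {
  e_bad[[i]] <- rnorm(n = 5, mean = 5, sd = 3)
}
length(e_bad)   # 5000
```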

Although there's no need to use loops here, just for illustration here are two modified versions of your code using loops:

nrSamples <- 5000  # as in the question

# 1. Store random samples in a list
e <- vector("list", nrSamples)
for (i in 1:nrSamples) {
  e[[i]] <- rnorm(n = 5, mean = 5, sd = 3)
}

sample_means = rep(NA, nrSamples)
for (i in 1:nrSamples){
  sample_means[i] <- mean(e[[i]])
}

# 2. Store random samples in a matrix (one sample per column)
e <- matrix(NA, nrow = 5, ncol = nrSamples)
for (i in 1:nrSamples) {
  e[,i] <- rnorm(n = 5, mean = 5, sd = 3)
}

sample_means = rep(NA, nrSamples)
for (i in 1:nrSamples){
  sample_means[i] <- mean(e[, i])
}
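With the second version's layout (one sample per column), base R's colMeans() can replace the second loop entirely. A minimal sketch; no seed is needed because it only compares two ways of averaging the same matrix, and loop_means is an illustrative name.

```r
nrSamples <- 5000

# Same layout as version 2: one sample of size 5 per column
e <- matrix(NA_real_, nrow = 5, ncol = nrSamples)
for (i in 1:nrSamples) {
  e[, i] <- rnorm(n = 5, mean = 5, sd = 3)
}

# colMeans() computes every column mean in one vectorised call
sample_means <- colMeans(e)

# Same values as the explicit loop (up to floating-point tolerance)
loop_means <- rep(NA, nrSamples)
for (i in 1:nrSamples) {
  loop_means[i] <- mean(e[, i])
}
all.equal(sample_means, loop_means)  # TRUE
```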

You don't need a list in this case. It is a common mistake of new R users to use lists excessively.

observations <- matrix(rnorm(25000, mean=5, sd=3), 5000, 5)
means <- rowMeans(observations)

Now means is a vector of 5000 elements.
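One detail worth knowing about this approach: matrix() fills column by column, so each row does not hold 5 consecutive draws. Because all 25000 draws are iid, that does not matter here. The sketch below (seed 1 is arbitrary; obs_byrow and one_at_a_time are illustrative names) shows that with byrow = TRUE the row means should agree with drawing the samples one at a time, assuming R's default RNG settings, under which rnorm(25000) produces the same stream as 5000 successive rnorm(5) calls.

```r
set.seed(1)  # arbitrary seed so the versions below can be compared

# Column-wise fill: row i holds draws i, 5000 + i, 10000 + i, ...
# That is fine because all 25000 draws are iid.
observations <- matrix(rnorm(25000, mean = 5, sd = 3), 5000, 5)
means <- rowMeans(observations)

# With byrow = TRUE, each row instead holds 5 consecutive draws
set.seed(1)
obs_byrow <- matrix(rnorm(25000, mean = 5, sd = 3), 5000, 5, byrow = TRUE)

# Drawing the 5000 samples one at a time with the same seed
set.seed(1)
one_at_a_time <- replicate(5000, mean(rnorm(5, mean = 5, sd = 3)))

all.equal(rowMeans(obs_byrow), one_at_a_time)

hist(means)
```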

Your code is fine (see below), but I would suggest you try the following:

yourlist <- lapply(1:nrSamples, function(x) rnorm(n=5, mean = 5, sd = 3 ))
yourmeans <- sapply(yourlist, mean)

Here, for each element of the sequence 1, 2, 3, ..., nrSamples that I supply as the first argument, lapply calls the function with that element as its argument (i.e. x). The function I supplied does not depend on x, however, so it is simply evaluated 5000 times, and the outputs are collected in a list (this is what lapply does). It is an easy way to avoid loops in situations like these. Needless to say, you could also just run

yourmeans <- sapply(1:nrSamples, function(x) mean(rnorm(n=5, mean = 5, sd = 3)))

The latter, however, stores only the means and not the samples themselves, which may not be what you want. Also note that I use sapply so that it returns a vector, which you can then use to plot your histogram, e.g. with hist(yourmeans).
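For comparison, a sketch of the one-liner next to two base-R alternatives: replicate(), which repeats an expression without needing a dummy argument, and vapply(), which additionally checks that each call returns a single number. The seed 7 is arbitrary and is reset before each version only so the three results can be compared; m_sapply, m_replicate and m_vapply are illustrative names.

```r
nrSamples <- 5000

# Each call draws a fresh sample and keeps only its mean; x is ignored
set.seed(7)
m_sapply <- sapply(1:nrSamples, function(x) mean(rnorm(n = 5, mean = 5, sd = 3)))

# replicate() expresses "evaluate this n times" directly
set.seed(7)
m_replicate <- replicate(nrSamples, mean(rnorm(n = 5, mean = 5, sd = 3)))

# vapply() is like sapply() but requires every result to be numeric(1)
set.seed(7)
m_vapply <- vapply(seq_len(nrSamples),
                   function(x) mean(rnorm(n = 5, mean = 5, sd = 3)),
                   numeric(1))

identical(m_sapply, m_replicate)  # TRUE: same seed, same sequence of draws
identical(m_sapply, m_vapply)     # TRUE

hist(m_sapply)
```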

To show that your code is fine, consider the following:

set.seed(1)  # arbitrary seed; reused below so both versions draw the same numbers
nrSamples = 5000
e <- list(mode="vector",length=nrSamples)
for (i in 1:nrSamples) {
  e[[i]] <- rnorm(n = 5, mean = 5, sd = 3)
}

sample_means <- matrix(NA, 5000,1)
for (i in 1:5000){
  sample_means[i] <- mean(e[[i]])
}

set.seed(1)  # reset to the same seed
yourlist <- lapply(1:nrSamples, function(x) rnorm(n=5, mean = 5, sd = 3 ))
yourmeans <- sapply(yourlist, mean)

all.equal(as.vector(sample_means), yourmeans)
[1] TRUE

Here, I set the seed of the random number generator before each version to make sure that both use the same random numbers. As you can see, your code works fine, though as others have pointed out, loops can easily be avoided.
