Here I represent a jar of marbles as a vector of color frequencies:
marbleCounts <- c(red = 5, green = 3, blue = 2)
marbleCounts
  red green  blue
    5     3     2
Now I'd like to sample 5 marbles from this jar without replacement. I can do this by expanding the vector of frequencies into a vector of individual marbles and then sampling from that:
set.seed(2019)
marbles <- rep(names(marbleCounts), times = marbleCounts)
samples <- sample(x = marbles, size = 5, replace = FALSE)
table(samples)
green   red
    2     3
This works, but it is memory-inefficient (and perhaps slow as well?) because it builds a vector with one element per marble. Is there a faster and/or more memory-efficient way to sample data like this?
I think this will work for you.
marbleCounts <- c(red = 5, green = 3, blue = 2)
# first, draw from the possible indexes (does not create the full vector)
draw <- sample.int(sum(marbleCounts), 5)
# then map each index back to its color: with breakpoints c(0, 5, 8, 10),
# indexes 1-5 are red, 6-8 are green and 9-10 are blue
items <- findInterval(draw - 1, c(0, cumsum(marbleCounts)), rightmost.closed = TRUE)
# extract the sample
obs <- names(marbleCounts)[items]
table(obs)
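A minimal sketch wrapping the steps above into a reusable function, assuming the counts are non-negative whole numbers with unique names. The name `sampleFromCounts()` is hypothetical, not part of base R. Returning a factor whose levels are all the colors keeps zero-count colors (such as blue in the question's output) visible in `table()`.

```r
# sampleFromCounts() is a hypothetical helper, not part of base R.
# counts: named vector of non-negative whole-number frequencies
# size:   number of items to draw without replacement
sampleFromCounts <- function(counts, size) {
  total <- sum(counts)
  stopifnot(size <= total)
  # Draw positions 1..total without materializing the expanded vector
  draw <- sample.int(total, size)
  # Map each position to its group via the cumulative-count breakpoints
  items <- findInterval(draw - 1, c(0, cumsum(counts)), rightmost.closed = TRUE)
  # Keep every color as a level so zero counts still appear in table()
  factor(names(counts)[items], levels = names(counts))
}

marbleCounts <- c(red = 5, green = 3, blue = 2)
set.seed(2019)
obs <- sampleFromCounts(marbleCounts, 5)
table(obs)
```

Drawing every marble (`size = sum(marbleCounts)`) should reproduce the original counts exactly, which is a quick sanity check of the index-to-color mapping.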
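At the R level, this approach never builds the expanded vector: apart from the cumulative-count breakpoints (one per color, plus one), no vector is longer than the sample size. Note that `sample.int()` may still use memory proportional to `sum(marbleCounts)` internally; for very large jars, passing `useHash = TRUE` avoids that, provided you draw at most half of the marbles.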
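If only the per-color counts are needed (as with `table()` above), a different technique avoids index vectors altogether: draw the count for each color in turn with base R's `rhyper()`. Chaining these conditional hypergeometric draws samples the multivariate hypergeometric distribution, which is what the color counts follow when sampling without replacement. This is a sketch: `sampleCounts()` is a hypothetical name, it uses memory proportional to the number of colors rather than the sample size, and it does not return the individual draws in order.

```r
# sampleCounts() is a hypothetical helper, not part of base R.
# Returns a named integer vector: how many of each color were drawn.
sampleCounts <- function(counts, size) {
  stopifnot(size <= sum(counts))
  result <- setNames(integer(length(counts)), names(counts))
  remainingTotal <- sum(counts)  # marbles of colors not yet resolved
  remainingDraws <- size         # marbles still to be drawn
  for (i in seq_along(counts)) {
    if (remainingDraws == 0) break
    # Color i vs. all marbles of the later colors
    others <- remainingTotal - counts[[i]]
    # rhyper(nn, m, n, k): successes when drawing k from m successes and n failures
    result[[i]] <- rhyper(1, counts[[i]], others, remainingDraws)
    remainingDraws <- remainingDraws - result[[i]]
    remainingTotal <- others
  }
  result
}

marbleCounts <- c(red = 5, green = 3, blue = 2)
set.seed(2019)
counts5 <- sampleCounts(marbleCounts, 5)
counts5
```

The result is comparable to `table(obs)`. Because the work happens once per color, this suits jars with huge totals but few colors.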