Efficient way to sample from a vector of counts without replacement

Question

Here I represent a jar of marbles using a vector of color frequencies:

marbleCounts <- c(red = 5, green = 3, blue = 2)
marbleCounts

red green  blue 
  5     3     2

Now, I'd like to sample 5 marbles from this vector without replacement. I can do this by expanding my vector of frequencies into a vector of marbles and then sampling from it:

set.seed(2019)
marbles <- rep(names(marbleCounts), times = marbleCounts)
samples <- sample(x = marbles, size = 5, replace = FALSE)
table(samples)

green   red 
    2     3

But this is memory-inefficient (and perhaps slow as well). Is there a faster or more memory-efficient way to sample data like this?

Answer 1

I think this will work for you.

marbleCounts <- c(red = 5, green = 3, blue = 2)

# first, draw from the possible indexes (does not create the full vector)
draw <- sample.int(sum(marbleCounts), 5)

# then assign indexes back to original group
items <- findInterval(draw-1, c(0, cumsum(marbleCounts)), rightmost.closed = TRUE)

# extract your sample
obs <- names(marbleCounts)[items]
table(obs)
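
To see why the `findInterval` step recovers the right color, it can help to map every possible index at once. The sketch below is an illustration added for that purpose, not part of the original answer; `breaks`, `all_indexes`, and `groups` are names chosen for this example.

```r
marbleCounts <- c(red = 5, green = 3, blue = 2)

# Group boundaries on a zero-based scale: red covers 0-4, green 5-7, blue 8-9
breaks <- c(0, cumsum(marbleCounts))

# Every possible marble index, 1 through 10
all_indexes <- seq_len(sum(marbleCounts))

# Shift to zero-based so each index falls in [breaks[i], breaks[i + 1])
groups <- findInterval(all_indexes - 1, breaks)

# Color owned by each index
names(marbleCounts)[groups]
```

Indexes 1 to 5 map to red, 6 to 8 to green, and 9 and 10 to blue, so each color owns exactly as many indexes as it has marbles. Drawing distinct indexes is therefore equivalent to drawing marbles without replacement.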

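This never builds the expanded vector of marbles. The only vectors your code creates are the draw (one entry per sampled marble) and the cumulative counts (one entry per color, plus a leading zero).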
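
If you need this more than once, the same steps could be wrapped in a small function. This is a minimal sketch, assuming you want the result as counts per color; `sample_counts` is a hypothetical name, not a base R function, and the use of `tabulate` to report undrawn colors as 0 is a choice made here rather than something the answer above specifies.

```r
# Hypothetical helper (not from the original answer): draws `size` items
# without replacement from a named vector of counts and returns the number
# of items drawn per name.
sample_counts <- function(counts, size) {
  total <- sum(counts)
  stopifnot(size >= 0, size <= total)

  # Draw distinct positions in 1..total without expanding the counts
  draw <- sample.int(total, size)

  # Map each position to its group via the cumulative boundaries
  groups <- findInterval(draw - 1, c(0, cumsum(counts)))

  # Tabulate over all groups so names that were not drawn show up as 0
  result <- tabulate(groups, nbins = length(counts))
  names(result) <- names(counts)
  result
}

set.seed(2019)
sample_counts(c(red = 5, green = 3, blue = 2), size = 5)
```

Unlike `table(obs)`, which omits colors that were never drawn, this version always returns one entry per color in the original order.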

Answer 1: score 4, accepted, 2019-04-02 19:27:00
