
Use sample() without replacement multiple times with increasing sample size in R

I want to take "random" samples from a vector called data, with increasing size and without replacement.

To illustrate my point, data looks, for example, like this:

data<-c("a","s","d","f","g","h","j","k","l","x","c","v","b","n","m")

What I need is to get different sample vectors with increasing sample size (starting with size = 2 and growing by 2, for example), where each vector keeps the elements of the previous one and adds new ones without any duplicates, and to store everything in a list so that the result looks something like this:

sample_1<-c("s","d")
sample_2<-c("s","d","a","f")
sample_3<-c("s","d","a","f","m","n")
sample_4<-c("s","d","a","f","m","n","l","c")
sample_5<-c("s","d","a","f","m","n","l","c","j","x")
sample_6<-c("s","d","a","f","m","n","l","c","j","x","v","k")
sample_7<-c("s","d","a","f","m","n","l","c","j","x","v","k","g","b")
sample_8<-c("s","d","a","f","m","n","l","c","j","x","v","k","g","b","h")
samples<-list(sample_1,sample_2,sample_3,sample_4,sample_5,sample_6,sample_7,sample_8)

What I have so far is:

samples <- sapply(seq(from = 2, to = length(data), by = 2),
                  function(i) sample(data, size = i, replace = FALSE),
                  simplify = FALSE, USE.NAMES = TRUE)

What does not work is increasing the sample size while keeping the samples from the previous steps, and having a last list element that contains all observations. Is something like this possible?

I'm not sure whether I understood you correctly, but perhaps you only need to shuffle the data once:

data = letters
data_random = sample(data)
sapply(seq(from=2, to=length(data), by=2),
       function (x) data_random[1:x],
       simplify = FALSE)
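With the 15-element data from the question, seq(2, 15, by = 2) stops at 14, so the last list element would not contain all observations. The following is a minimal sketch, assuming the desired sizes are 2, 4, ... with the full length appended when it is odd; nested_samples is a hypothetical helper name, not something from base R:

```r
# Hypothetical helper: shuffle once, then return growing prefixes of that shuffle.
# Assumes length(x) >= step.
nested_samples <- function(x, step = 2) {
  stopifnot(length(x) >= step)
  shuffled <- sample(x)  # one random permutation, no replacement
  # sizes step, 2*step, ..., always ending with the full length
  sizes <- unique(c(seq(from = step, to = length(x), by = step), length(x)))
  lapply(sizes, function(n) shuffled[seq_len(n)])
}

data <- c("a","s","d","f","g","h","j","k","l","x","c","v","b","n","m")
set.seed(123)  # only for reproducibility
samples <- nested_samples(data)
lengths(samples)
```

Here lengths(samples) should be 2 4 6 8 10 12 14 15. Every element extends the previous one because they are all prefixes of the same permutation.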

After your comments on another answer, I think I understand what you want to achieve. Extending my previous code, I end up with:

data<-c("a","s","d","f","g","h","j","k","l","x","c","v","b","n","m")
set.seed(123)
nbitems=ceiling(length(data)/2) # number of steps when growing by 2 (8 for 15 items)
results=vector("list",nbitems)

results[[1]] <- sample(data,2) # get first sample
for (i in 2:nbitems) { # Loop for each result
  samplesavail <- data[!data %in% results[[i-1]]] # Reduce the samples available
  results[[i]] <- c(results[[i-1]], sample( samplesavail, min( length(samplesavail), 2) ) ) # concatenate a new sample, size depends on step and remaining samples available.
}

Hope this matches your intended use:

> results
[[1]]
[1] "n" "f"

[[2]]
[1] "n" "f" "a" "g"

[[3]]
[1] "n" "f" "a" "g" "m" "v"

[[4]]
[1] "n" "f" "a" "g" "m" "v" "x" "l"

[[5]]
 [1] "n" "f" "a" "g" "m" "v" "x" "l" "b" "j"

[[6]]
 [1] "n" "f" "a" "g" "m" "v" "x" "l" "b" "j" "k" "h"

[[7]]
 [1] "n" "f" "a" "g" "m" "v" "x" "l" "b" "j" "k" "h" "d" "s"

[[8]]
 [1] "n" "f" "a" "g" "m" "v" "x" "l" "b" "j" "k" "h" "d" "s" "c"
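The same incremental loop can be wrapped in a function so the step size becomes a parameter. This is a sketch, not the answerer's code: grow_samples is a hypothetical name. It draws positions with sample.int() rather than values, because sample(x, 1) on a single number n samples from 1:n, which would break the last step if data were numeric. Tracking positions also keeps repeated values in data apart.

```r
# Hypothetical helper: grow a sample step by step, never drawing a position twice
grow_samples <- function(x, step = 2) {
  remaining <- seq_along(x)  # positions not drawn yet
  drawn <- integer(0)        # positions drawn so far, in draw order
  results <- list()
  while (length(remaining) > 0) {
    k <- min(step, length(remaining))  # the last step may be smaller
    # sample positions, not values: safe even when one position is left
    pick <- remaining[sample.int(length(remaining), k)]
    drawn <- c(drawn, pick)
    remaining <- setdiff(remaining, pick)
    results[[length(results) + 1]] <- x[drawn]
  }
  results
}

data <- c("a","s","d","f","g","h","j","k","l","x","c","v","b","n","m")
set.seed(123)  # only for reproducibility
results <- grow_samples(data)
lengths(results)
```

For the 15 letters this gives 8 list elements of sizes 2, 4, ..., 14, 15. The random draws differ from the loop above, so the letters themselves will not match the printed output.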

Previous approach: 以前的方法:

If I understood you correctly (though I'm far from sure):

data<-c("a","s","d","f","g","h","j","k","l","x","c","v","b","n","m")
set.seed(123) # fix the seed for repro of answer, remove in real case
nbitems=ceiling(length(data)/2) # number of entries we get when stepping by 2
results=vector("list",nbitems) # preallocate the list (we fill it from the end)
results[[nbitems]] = sample(data,length(data)) # shuffle the data
for (i in nbitems:2) {
  results[[i-1]] <- results[[i]][1:(length(results[[i]]) - 2)] # at each iteration, drop the last 2 entries
}

With an odd number of items, this gives a single entry as the first result (for 15 items the sizes are 1, 3, 5, ..., 15).

I just noticed this is the same idea as @sbstn's answer, but with a more complicated backward approach; I'm posting it in case it has some value.
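The backward idea can also be written without an explicit loop by counting the sizes down from length(data) and reversing them. A sketch under the same assumption of a step of 2: like the loop, it starts with a single item when the length is odd. Since both versions take prefixes of one call to sample(), with the same seed it should produce the same list as the loop.

```r
data <- c("a","s","d","f","g","h","j","k","l","x","c","v","b","n","m")
set.seed(123)  # only for reproducibility
shuffled <- sample(data)  # one random permutation
# sizes counted down from the full length in steps of 2, then reversed: 1, 3, ..., 15
sizes <- rev(seq(from = length(shuffled), to = 1, by = -2))
results <- lapply(sizes, function(n) head(shuffled, n))
lengths(results)
```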
