Repeat random sampling of rows

I have a data frame with 2 columns: column 1 holds IDs and column 2 holds the value associated with each ID (59 distinct rows in total).

Example:

     [ID] [value] 
[1]   a   164  
[2]   b   167  
[3]   c   120  
[4]   d   117  
[5]   e   106 

I am assuming that the only way to randomly sample from column 1 and keep the associated value in column 2 is to sample whole rows. I need to randomly sample 50 x 1 row, 50 x 2 rows, 50 x 3 rows, 50 x 4 rows, etc., up to 59 rows. Ideally, each sample set would be output as a data frame, so I would end up with 59 sets of randomly sampled data. Essentially, this is the same as creating random subsets of the data.

I have this code, which produces, for example, a df of 10 randomly sampled rows.

sample_df <- df[sample.int(nrow(df), size = 10, replace = TRUE), ]

The question is: how can I adjust this code so that it produces 10 random rows 50 times over? Should I be using a loop to generate all of the random samples that I need?

You can use lapply, which returns a list of data frames:

lapply(1:59, function(x) df[sample(nrow(df), size = x, replace = TRUE),])
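
The lapply call above returns one sample for each size from 1 to 59. If 50 replicates per size are wanted, as the question describes, one possible approach is to wrap the sampling in replicate. The following is only a sketch: the df built here is a hypothetical stand-in for the real 59-row data, and the names n_reps and samples are illustrative, not from the original post.

```r
# Hypothetical stand-in for the 59-row data frame from the question.
set.seed(1)  # only so this sketch is reproducible
df <- data.frame(ID = sprintf("id%02d", 1:59),
                 value = sample(100:200, 59, replace = TRUE),
                 stringsAsFactors = FALSE)

n_reps <- 50  # replicates per sample size, as asked in the question

# Outer list: one element per sample size (1 to nrow(df)).
# Inner list: n_reps data frames, each holding `size` randomly drawn rows.
samples <- lapply(seq_len(nrow(df)), function(size) {
  replicate(n_reps,
            df[sample.int(nrow(df), size = size, replace = TRUE), , drop = FALSE],
            simplify = FALSE)
})
```

With this layout, samples[[10]][[3]] would be the third replicate of the 10-row samples. replace = TRUE follows the question's own code, so a row can appear more than once in a sample; with replace = FALSE each sample would be a true subset, and the 59-row samples would simply be reorderings of the full data.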
