Repeat random sampling of rows
I have a data frame containing 2 columns: column 1 contains IDs and column 2 contains the value associated with each ID (59 rows in total).
Example:
  ID value
1  a   164
2  b   167
3  c   120
4  d   117
5  e   106
I am assuming that the only way to randomly sample from column 1 while keeping the associated value from column 2 is to sample whole rows. I need to draw 50 random samples of 1 row, 50 samples of 2 rows, 50 samples of 3 rows, 50 samples of 4 rows, and so on up to 59 rows, ideally with each sample output as a data frame. So I would end up with 59 sets of randomly sampled data (one set per sample size, each containing 50 samples). Essentially, this is the same as creating random subsets of the data.
I have this code which, for example, produces a data frame of 10 randomly sampled rows:
sample_df<-df[sample.int(nrow(df),size=10,replace=TRUE),]
The question is: how can I adjust this code so that it produces 50 samples of 10 random rows each? Should I be using a loop to generate all of the random samples that I need?
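An explicit loop is not strictly required for the fixed-size case. A minimal sketch, assuming the asker's one-liner is kept as-is: base R's replicate() re-evaluates an expression a given number of times, and simplify = FALSE keeps the results as a list of data frames. The data frame below is a hypothetical stand-in (made-up IDs and values), and the name samples_10 is illustrative only.

```r
# Hypothetical stand-in for the real 59-row data frame (IDs and values are made up)
set.seed(1)  # make the random draws reproducible
df <- data.frame(ID = sprintf("id%02d", 1:59),
                 value = round(runif(59, 100, 170)))

# replicate() evaluates the row-sampling expression 50 times;
# simplify = FALSE returns a list of 50 data frames instead of
# trying to combine them into a matrix/array
samples_10 <- replicate(50,
                        df[sample.int(nrow(df), size = 10, replace = TRUE), ],
                        simplify = FALSE)

length(samples_10)     # number of samples drawn
nrow(samples_10[[1]])  # rows in the first sample
```

Because whole rows are indexed, every sampled ID stays paired with its original value. A for loop filling a pre-allocated list would give the same result; replicate() just avoids the index bookkeeping.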
You can use lapply, which returns a list of data frames:
lapply(1:59, function(x) df[sample(nrow(df), size = x, replace = TRUE),])
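The one-liner above draws a single sample for each size. To get the 50 samples per size described in the question, one option (a sketch, not the answerer's code) is to nest replicate() inside lapply(). The data frame is again a hypothetical stand-in, n_reps and all_samples are illustrative names, and seq_len(nrow(df)) replaces the hard-coded 1:59 so the code follows the number of rows.

```r
# Hypothetical stand-in for the real data (IDs and values are made up)
set.seed(42)
df <- data.frame(ID = sprintf("id%02d", 1:59),
                 value = round(runif(59, 100, 170)))

n_reps <- 50  # number of random samples per sample size

# Outer lapply: one element per sample size 1..nrow(df)
# Inner replicate: n_reps independent row samples of that size,
# kept as a list of data frames (simplify = FALSE)
all_samples <- lapply(seq_len(nrow(df)), function(size) {
  replicate(n_reps,
            df[sample.int(nrow(df), size = size, replace = TRUE), ],
            simplify = FALSE)
})
names(all_samples) <- paste0("size_", seq_len(nrow(df)))
```

The result is a list of 59 elements, each a list of 50 data frames; for example, all_samples$size_5[[3]] is the third random sample of 5 rows. Note that replace = TRUE (as in the original code) can repeat an ID within one sample; if each subset should contain distinct IDs, use replace = FALSE, which works here because no size exceeds nrow(df).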