Loop in R: multiple samples from a single dataset

I am trying to write a simple loop in R: I have a large dataset, and I want to draw several smaller samples from it and export each one to Excel.

I thought it would work like this, but it doesn't:

 idorg <- c(1,2,3,4,5)
 x <- c(14,20,21,16,17)
 y <- c(31,21,20,50,13)
 dataset <- cbind (idorg,x,y)


 for (i in 1:4)
 {
 attempt[i] <- dataset[sample(1:nrow(dataset), 3, replace=FALSE),]
 write.table(attempt[i], "C:/Users/me/Desktop/WWD/Excel/dataset[i].xls", sep='\t')
 }

In Stata you would need to preserve and restore your data when doing a loop like this. Is that also necessary in R?

You have the following problems:

  1. attempt is not declared, so attempt[i] cannot be assigned to. Either create it before the loop as a container to fill up within the loop (a matrix, or a list with one sample per element, if you want to keep the samples), or use attempt as a plain temporary variable.
  2. The file name is taken literally; you need to use paste() or sprintf() to include the value of the variable i in the file name.

Here is a working version of the code:

idorg <- c(1,2,3,4,5)
x <- c(14,20,21,16,17)
y <- c(31,21,20,50,13)
dataset <- cbind (idorg,x,y)

for (i in 1:4)  {
  attempt <- dataset[sample(1:nrow(dataset), 3, replace=FALSE),]
  write.table(attempt, sprintf( "C:/Users/me/Desktop/WWD/Excel/dataset[%d].xls", i ), sep='\t')
}
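
If the samples should be kept rather than overwritten on each pass (the first option in point 1), one possible arrangement is to declare a list before the loop. This is only a sketch: the names samples, file_names and alt_names are hypothetical, and a list is used because each sample is itself a 3-row matrix.

```r
# Rebuild the example data from the question
idorg <- c(1, 2, 3, 4, 5)
x <- c(14, 20, 21, 16, 17)
y <- c(31, 21, 20, 50, 13)
dataset <- cbind(idorg, x, y)

# Declare the container before the loop (hypothetical name `samples`)
samples <- vector("list", 4)

for (i in 1:4) {
  # [[i]] stores a whole matrix in slot i of the list
  samples[[i]] <- dataset[sample(nrow(dataset), 3, replace = FALSE), ]
}

# sprintf() and paste0() both substitute the value of i into the name
file_names <- sprintf("dataset%d.csv", 1:4)
alt_names <- paste0("dataset", 1:4, ".csv")
```

After the loop, samples[[2]] is the second sample and file_names[2] is the name it could be written under.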

Will Excel be able to read such a tab-separated table? I'm not sure; I would write a comma-separated table instead and save it as .csv.
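
A minimal sketch of the CSV variant, assuming the files go to a temporary directory (out_dir and out_file are hypothetical names; substitute your own folder). It uses write.csv() with row.names = FALSE so that no row-number column ends up in the file.

```r
idorg <- c(1, 2, 3, 4, 5)
x <- c(14, 20, 21, 16, 17)
y <- c(31, 21, 20, 50, 13)
dataset <- cbind(idorg, x, y)

# Assumption: write to a temporary directory; substitute your own folder
out_dir <- tempdir()

for (i in 1:4) {
  attempt <- dataset[sample(nrow(dataset), 3, replace = FALSE), ]
  # file.path() joins the directory and the file name
  out_file <- file.path(out_dir, sprintf("dataset%d.csv", i))
  # row.names = FALSE omits the row-number column from the file
  write.csv(attempt, out_file, row.names = FALSE)
}
```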

Unlike Stata, you don't need to preserve and restore your data for this kind of operation in R.
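
The reason is that subsetting with [ returns a new object and leaves the original untouched. A small check along these lines illustrates it (original and unchanged are hypothetical names used only for the comparison):

```r
dataset <- cbind(idorg = c(1, 2, 3, 4, 5),
                 x = c(14, 20, 21, 16, 17),
                 y = c(31, 21, 20, 50, 13))
original <- dataset  # copy kept only for the comparison below

for (i in 1:4) {
  attempt <- dataset[sample(nrow(dataset), 3, replace = FALSE), ]
}

# Indexing returned new objects; `dataset` still has all five rows
unchanged <- identical(dataset, original)
```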

I think January's solution solves your problem, but I wanted to share an alternative: using lapply() to get a list of all the samples of the dataset:

set.seed(1) # So you can reproduce these results
temp <- setNames(lapply(1:4,
                        function(x) { 
                          x <- dataset[sample(1:nrow(dataset),
                                              3, replace = FALSE), ]; x }),
                 paste0("attempt.", 1:4))

This has created a list() named "temp" that comprises four samples. Because dataset was built with cbind(), each element is a matrix rather than a data.frame, as the [1,] row labels in the output show.

temp
# $attempt.1
#      idorg  x  y
# [1,]     2 20 21
# [2,]     5 17 13
# [3,]     4 16 50
# 
# $attempt.2
#      idorg  x  y
# [1,]     5 17 13
# [2,]     1 14 31
# [3,]     3 21 20
# 
# $attempt.3
#      idorg  x  y
# [1,]     5 17 13
# [2,]     3 21 20
# [3,]     2 20 21
# 
# $attempt.4
#      idorg  x  y
# [1,]     1 14 31
# [2,]     5 17 13 
# [3,]     4 16 50

Lists are very convenient in R. You can now use lapply() to do other fun things: for example, if you wanted the row sums of every sample, you can do lapply(temp, rowSums). Or, if you wanted to output separate CSV files (readable by Excel), you can do something like this:

lapply(names(temp), function(x) write.csv(temp[[x]],
                             file = paste0(x, ".csv")))
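
When run at the console, the call above also prints a list of NULLs, because write.csv() returns NULL. A hedged variant that hides this with invisible() and writes into a temporary directory might look as follows; out_dir, nm and check are hypothetical names, and temp is rebuilt here only to keep the sketch self-contained.

```r
dataset <- cbind(idorg = 1:5,
                 x = c(14, 20, 21, 16, 17),
                 y = c(31, 21, 20, 50, 13))
temp <- setNames(
  lapply(1:4, function(i) dataset[sample(nrow(dataset), 3), ]),
  paste0("attempt.", 1:4)
)

out_dir <- tempdir()  # assumption: stand-in for your real output folder

# write.csv() returns NULL, so invisible() hides the list of NULLs
invisible(lapply(names(temp), function(nm) {
  write.csv(temp[[nm]], file = file.path(out_dir, paste0(nm, ".csv")),
            row.names = FALSE)
}))

# Read one file back to check the round trip
check <- read.csv(file.path(out_dir, "attempt.1.csv"))
```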
