
Iterating through subsetted data in R

I am attempting an assignment for Coursera, so this is homework. I am hoping someone will explain why what I am doing does not work. I have a data frame called complete_cases, and I have to report how many records there are in specified 'sets' of observations from a much larger 'set'. The data are in the format:

              Date sulfate nitrate ID
279 2003-10-06    7.21   0.651  1
285 2003-10-12    5.99   0.428  1
291 2003-10-18    4.68   1.040  1
297 2003-10-24    3.47   0.363  1
303 2003-10-30    2.42   0.507  1
315 2003-11-11    1.43   0.474  1

And so on, for 332 different sets with IDs 1 to 332. I have already 'found' the records that are complete, and I have to return which set the data come from and how many complete records there are in each specified set (by ID). I am trying to use:

for (i in id) {
   list <- nrow(complete_cases[i])
   data <- cbind(id = i, nobs = list)
}
data

If I call the function using one set of data, it appears to work fine and gives me:

      id nobs
[1,]  1  117

But trying to apply it to id <- c(2,4,8,10,12) gives me the error:

Error in `[.data.frame`(complete_cases, i) : undefined columns selected

What I was expecting is that the iteration would return, for each id in c(2,4,8,10,12), the id together with its number of rows. Is this any clearer?

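Your problem is the way you are subsetting the data. Indexing a data frame with a single value, as in complete_cases[i], selects column i rather than the rows whose ID equals i. Your data frame has only four columns, so i = 8 fails with "undefined columns selected", and for small values of i, nrow() simply returns the row count of the whole data frame. To select the rows whose ID column matches the iterator value, you must be more specific. There are a number of ways to do this; here is one: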

complete_cases[complete_cases$ID == i, ]
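To see the difference concretely, here is a minimal sketch on a small made-up data frame. The values and the ID assignments below are hypothetical; only the column layout (Date, sulfate, nitrate, ID) follows the question. It compares single-index column selection with logical row selection, and shows two other common ways to count the matching rows.

```r
# Hypothetical toy stand-in for complete_cases; only the column layout matches the question.
complete_cases <- data.frame(
  Date    = as.Date(c("2003-10-06", "2003-10-12", "2003-10-18", "2003-10-24", "2003-10-30")),
  sulfate = c(7.21, 5.99, 4.68, 3.47, 2.42),
  nitrate = c(0.651, 0.428, 1.040, 0.363, 0.507),
  ID      = c(1, 1, 2, 2, 2)
)
i <- 2

# Single index: selects the second *column* (sulfate), so nrow() counts every row (5).
nrow(complete_cases[i])

# There is no 8th column, so this reproduces the error from the question.
msg <- tryCatch(complete_cases[8], error = conditionMessage)
msg

# Logical row index: keeps only the rows whose ID equals i (3 rows here).
nrow(complete_cases[complete_cases$ID == i, ])

# Two equivalent ways to count the same rows.
nrow(subset(complete_cases, ID == i))
sum(complete_cases$ID == i)
```

The trailing comma in complete_cases[complete_cases$ID == i, ] is what makes the logical vector select rows; without it, R treats the index as a column selector.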

You are also overwriting data on every iteration, because each pass through the loop assigns a fresh value with data <- .... My personal favorite approach, which does not require you to specify the dimensions of your final result in advance, goes like this:

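data_summary <- vector("list")   # empty list that grows as results are added
k <- 1                           # position of the next result in data_summary
for (i in id) {
   # count only the rows whose ID column equals the current id
   current_id_rowcount <- nrow(complete_cases[complete_cases$ID == i, ])
   data_summary[[k]] <- cbind(id = i, nobs = current_id_rowcount)
   k <- k + 1
}
# stack the one-row matrices into a single id/nobs matrix
final <- do.call(rbind, data_summary)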
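If you don't need an explicit loop, a vectorized alternative is possible. The sketch below is one option, not the only one: it computes one count per requested id with vapply() and builds a data frame directly, so there is no counter to maintain and nothing gets overwritten. The toy data are hypothetical; only the ID column name follows the question.

```r
# Hypothetical toy data; only the ID column name follows the question.
complete_cases <- data.frame(
  sulfate = c(7.21, 5.99, 4.68, 3.47, 2.42, 1.43),
  nitrate = c(0.651, 0.428, 1.040, 0.363, 0.507, 0.474),
  ID      = c(2, 2, 4, 8, 8, 8)
)
id <- c(2, 4, 8, 10, 12)

# One integer count per requested id; ids with no matching rows get 0.
nobs <- vapply(id, function(i) sum(complete_cases$ID == i), integer(1))

# Keep the ids in the order they were requested.
final <- data.frame(id = id, nobs = nobs)
final
```

Unlike the loop above, which produces a matrix via do.call(rbind, ...), this returns a data frame. Another option is table(factor(complete_cases$ID, levels = id)), which also reports zero counts for ids that have no rows.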
