How do I find the number of complete cases in a data frame and produce a new data frame with only subtotals for a specified value of a column using R?
Here is the function I have written to produce a data frame with the number of complete cases for one or more values of the "ID" variable. It works when I pass a single value for id. However, when I pass more than one id value, it sums the complete cases across all of them. The new data frame "out" lists each id value, with the sum of the complete cases in the corresponding column:
complete_cases <- function(directory, id = 1:332) {
  files_list <- list.files(directory, full.names = TRUE)
  dat <- data.frame()
  s <- vector()
  for (i in 1:332) {
    dat <- rbind(dat, read.csv(files_list[i]))
  }
  dat_subset <- dat[which(dat[, "ID"] %in% id), ]
  s <- sum(complete.cases(dat_subset))
  out <- data.frame(cbind(id, nobs = s))
  return(out)
}
The output for id=1:2 is:
> complete_cases("specdata",1:2)
id nobs
1 1 1158
2 2 1158
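The repeated 1158 is consistent with R's recycling rules: sum(complete.cases(dat_subset)) returns a single number, and cbind() repeats that one value for every element of id. A minimal sketch of just that step, using the total from the output above as a stand-in value rather than recomputing it from the files:

```r
id <- 1:2
s <- 1158  # a single total, standing in for sum(complete.cases(dat_subset))

# cbind() recycles the length-1 value so that every id gets the same number
m <- cbind(id, nobs = s)
print(data.frame(m))
```

To get a separate count for each id, the sum has to be computed within each id group instead of once over the whole subset.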
If I understand correctly, you want the output to be a data.frame with the number of complete cases for each of the ids passed into your function. This is one way:
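# Sample data: 5000 rows, 10 numeric columns, with up to 500 cells set to NA
mat <- matrix(rnorm(50000), nrow = 5000)
mat[cbind(sample(5000, 500, replace = TRUE), sample(10, 500, replace = TRUE))] <- NA
df <- data.frame(id = sample(332, 5000, replace = TRUE), mat)
# Count the complete rows within each id. The grouping column is given as a
# string so that plyr does not need to be attached for .() to be found.
plyr::ddply(df, "id", function(x) c(CompleteCases = sum(complete.cases(x))))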
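If you would rather not depend on plyr, the same per-id count can be sketched in base R with tapply(). This is only an illustration: the toy data frame and its column names x and y are made up so that the result can be checked by eye.

```r
# Hypothetical toy data: three ids, two measurement columns with some NAs
df_small <- data.frame(
  id = c(1, 1, 1, 2, 2, 3),
  x  = c(0.5, NA, 1.2, 3.1, 2.2, NA),
  y  = c(1, 2, NA, 4, 5, 6)
)

# TRUE for every row that has no missing values
ok <- complete.cases(df_small)

# Add up the TRUE values separately within each id
counts <- tapply(ok, df_small$id, sum)

out <- data.frame(id = as.integer(names(counts)), nobs = as.vector(counts))
print(out)
```

Here id 1 has one complete row, id 2 has two, and id 3 has none, so an id whose rows are all incomplete still appears with a count of 0.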
Adapting your code:
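complete_cases <- function(directory, id = 1:332) {
  # Read every file in the directory and stack them into one data frame
  files_list <- list.files(directory, full.names = TRUE)
  dat <- plyr::ldply(files_list, read.csv)
  # Keep only the rows for the requested ids
  dat_subset <- dat[dat$ID %in% id, ]
  # Count complete rows per ID, using the subset rather than the full data
  plyr::ddply(dat_subset, "ID", function(x) data.frame(nobs = sum(complete.cases(x))))
}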
Note that the output will have a column called ID rather than id, as in the original data. You could use plyr::rename to change it to id if needed.
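As a rough sketch of that renaming step, here it is with base R and with plyr::rename. The data frame below is a made-up stand-in for the function's output, not real results:

```r
# Hypothetical stand-in for the function's output
out <- data.frame(ID = 1:3, nobs = c(4L, 0L, 7L))

# Base R: replace the matching column name in place
renamed <- out
names(renamed)[names(renamed) == "ID"] <- "id"

# plyr equivalent, run only if plyr is installed
if (requireNamespace("plyr", quietly = TRUE)) {
  renamed_plyr <- plyr::rename(out, c(ID = "id"))
  stopifnot(identical(names(renamed_plyr), names(renamed)))
}

print(names(renamed))
```

Both approaches leave the nobs column untouched and change only the column name.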