
How do I find the number of complete cases in a data frame and produce a new data frame with only subtotals for a specified value of a column using R?

Here is the function I wrote to produce a data frame with the number of complete cases for one or more levels of the "ID" variable. It works when I pass a single id. When I pass more than one id, however, it sums the complete cases across all of them: the resulting data frame "out" lists each id, but every row shows the same overall total instead of that id's own count:

complete_cases<-function(directory,id=1:332){
files_list<-list.files(directory,full.names=TRUE)
dat<-data.frame()
s<-vector()
for(i in 1:332){
dat<-rbind(dat,read.csv(files_list[i]))
} 
dat_subset<-dat[which(dat[,"ID"]%in%id),]
s<-sum(complete.cases(dat_subset))
out<-data.frame(cbind(id,nobs=s))   
return(out)
}

The output for id=1:2 is:

> complete_cases("specdata",1:2)
id nobs
1  1 1158
2  2 1158

If I understand correctly, you want the output to be a data.frame with the number of complete cases for each of the ids passed into your function. This is one way:

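# Sample data: 5000 rows x 10 columns, with some values set to NA at random
mat <- matrix(rnorm(50000), nrow = 5000)
mat[cbind(sample(5000, 500, replace = TRUE), sample(10, 500, replace = TRUE))] <- NA
df <- data.frame(id = sample(332, 5000, replace = TRUE), mat)

# Count complete rows per id. The grouping column is given as a string
# because the .() helper is only visible after library(plyr).
plyr::ddply(df, "id", function(x) c(CompleteCases = sum(complete.cases(x))))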
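
The same per-id counting can also be sketched without plyr. The data frame below is made up for illustration (the names toy, x and y are not from the question). It shows that summing complete.cases() within each id gives one count per id, where the original function computed a single total and repeated it on every row:

```r
# Tiny hand-made data frame (hypothetical values, for illustration only)
toy <- data.frame(
  id = c(1, 1, 1, 2, 2, 3),
  x  = c(0.5, NA, 1.2, 3.1, 2.2, NA),
  y  = c(1.0, 2.0, NA, 4.0, 5.0, 6.0)
)

# complete.cases() returns one TRUE/FALSE per row; summing within each id
# gives a per-id count instead of a single grand total
counts <- tapply(complete.cases(toy), toy$id, sum)
result <- data.frame(id = as.integer(names(counts)), nobs = as.vector(counts))
print(result)
```

Here id 1 has one complete row, id 2 has two, and id 3 has none, so an id with no complete rows still appears with a count of 0.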

Adapting your code:

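complete_cases <- function(directory, id = 1:332) {
  files_list <- list.files(directory, full.names = TRUE)
  # Read every file and stack the rows into one data frame
  dat <- plyr::ldply(files_list, read.csv)
  # Keep only the requested ids
  dat_subset <- dat[dat$ID %in% id, ]
  # Count complete rows per ID on the subset, not on the full data
  plyr::ddply(dat_subset, "ID", function(x) data.frame(nobs = sum(complete.cases(x))))
}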

Note that the output column is called ID rather than id, matching the original data. You could use plyr::rename to change it to id if needed.
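
If you would rather avoid the plyr dependency, something along these lines might work in base R. This is only a sketch under assumptions: the function name complete_cases_base, the temporary demo directory and the sulfate column are all made up for the example, and every CSV is assumed to contain an ID column. It returns the ids in the order requested, in a column already named id:

```r
# Hypothetical base-R variant; assumes every CSV in `directory` has an "ID" column
complete_cases_base <- function(directory, id = 1:332) {
  files_list <- list.files(directory, pattern = "\\.csv$", full.names = TRUE)
  dat <- do.call(rbind, lapply(files_list, read.csv))
  # Count complete rows separately for each requested id, in the order given
  nobs <- vapply(
    id,
    function(i) sum(complete.cases(dat[dat$ID == i, , drop = FALSE])),
    integer(1)
  )
  data.frame(id = id, nobs = nobs)
}

# Usage on throwaway files written to a temporary directory
tmp <- file.path(tempdir(), "specdata_demo")
dir.create(tmp, showWarnings = FALSE)
write.csv(data.frame(ID = 1, sulfate = c(1.5, NA, 2.0)),
          file.path(tmp, "001.csv"), row.names = FALSE)
write.csv(data.frame(ID = 2, sulfate = c(NA, NA, 3.0)),
          file.path(tmp, "002.csv"), row.names = FALSE)
demo <- complete_cases_base(tmp, id = 2:1)
print(demo)
```

Unlike the ddply version, this keeps an id that has no matching rows and reports it with nobs equal to 0.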


 