
How do I find the number of complete cases in a data frame and produce a new data frame with only subtotals for a specified value of a column using R?

Here is the function I wrote to produce a data frame with the number of complete cases for one or more values of the "ID" variable. It works when I pass a single id. However, when I pass more than one id, it sums the complete cases across all of them, so the new data frame "out" lists each id next to the same combined total instead of its own count:

complete_cases <- function(directory, id = 1:332) {
  files_list <- list.files(directory, full.names = TRUE)
  dat <- data.frame()
  s <- vector()
  for (i in 1:332) {
    dat <- rbind(dat, read.csv(files_list[i]))
  }
  dat_subset <- dat[which(dat[, "ID"] %in% id), ]
  s <- sum(complete.cases(dat_subset))
  out <- data.frame(cbind(id, nobs = s))
  return(out)
}

The output for id=1:2 is:

> complete_cases("specdata",1:2)
id nobs
1  1 1158
2  2 1158

If I understand correctly, you want the output to be a data.frame with the number of complete cases for each of the ids passed into your function. Your version computes a single sum over the whole subset, and cbind then recycles that one number across every id. Counting within each id is one way to fix it:

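# Sample data: 5000 rows, 10 numeric columns, with up to 500 cells set to NA
mat <- matrix(rnorm(50000), nrow = 5000)
mat[cbind(sample(5000, 500, replace = TRUE), sample(10, 500, replace = TRUE))] <- NA
df <- data.frame(id = sample(332, 5000, replace = TRUE), mat)

# Count complete rows within each id. The grouping column is given as a string
# because .() is only found once plyr is attached with library(plyr).
plyr::ddply(df, "id", function(x) c(CompleteCases = sum(complete.cases(x))))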
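
For comparison, the same per-id count can be sketched in base R without plyr. This is only an illustration on a small made-up data frame (the name toy and its values are hypothetical, not from the question): flag the complete rows once, then sum the flags within each id.

```r
# Hypothetical toy data: id 1 has one incomplete row, id 2 has none
toy <- data.frame(
  id = c(1, 1, 1, 2, 2, 2),
  x  = c(0.5, NA, 1.2, 3.1, 2.2, 0.1),
  y  = c(1, 2, 3, 4, 5, 6)
)

# TRUE for every row with no missing values
ok <- complete.cases(toy)

# Sum the TRUE flags separately for each id
counts <- aggregate(ok, by = list(id = toy$id), FUN = sum)
names(counts)[2] <- "CompleteCases"
print(counts)
```

Each id gets its own row and its own count here, which is the shape the question asks for.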

Adapting your code:

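complete_cases <- function(directory, id = 1:332) {
  files_list <- list.files(directory, full.names = TRUE)
  # Read every file and stack the results into one data frame
  dat <- plyr::ldply(files_list, read.csv)
  # Keep only the requested ids
  dat_subset <- dat[which(dat$ID %in% id), ]
  # Count complete rows per ID on the subset, not on the full data
  plyr::ddply(dat_subset, "ID", function(x) data.frame(nobs = sum(complete.cases(x))))
}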

Note that the output column will be called ID rather than id, matching the original data. You could use plyr::rename to change it to id if needed.
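
A minimal sketch of that renaming step, assuming a result with the columns ID and nobs. The data frame res and its values are placeholders for illustration, not real output. The base R line works without any package, and the plyr::rename call is shown as the equivalent when plyr is installed.

```r
# Placeholder result in the shape the adapted function returns
res <- data.frame(ID = c(1, 2), nobs = c(10, 20))

# Base R: rename the column in place
renamed <- res
names(renamed)[names(renamed) == "ID"] <- "id"
print(names(renamed))

# Equivalent with plyr, run only if the package is available
if (requireNamespace("plyr", quietly = TRUE)) {
  renamed_plyr <- plyr::rename(res, c(ID = "id"))
  print(names(renamed_plyr))
}
```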


