In R: loop over multiple input files and save each results file under the name of its input file
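I am a beginner in R. I need your help to run these analyses automatically on all the files contained in the list "scores" (about 50 files), and to obtain (and save) a summary output for each input file, with each summary output named after its input file.
For now, with my code (attached), I can get the results for one input file at a time. For example, if I consider the second element of the list named "scores":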
file.names <- dir(pattern ="*.my.files") ### set the pattern name of the input files
scores<-list() ### create a variable (list) that will contain the input data
for (k in 1:length(file.names)){scores[[k]] <- read.table(file.names[k],h=T)}
### scores[[2]] is the second element of the list named "scores"
f1<-function(threshold) { glm(pheno ~ threshold + PC1 + PC2 + PC3 + PC4 +PC5 +PC6 +PC7+ PC8 + PC9 + C1+C2+C3+C4+C5+C6+C7+C8+C9, family='binomial',data=scores[[2]]) }
m1<-apply(scores[[2]][2:9],2,f1) ### scores[[2]] is the second element of the list named "scores",
### the columns from 2 to 9 corresponds to the scores at the different p thresholds
### after the first comma --> MARGIN: a vector giving the subscripts which the function
### will be applied over. E.g., for a matrix 1 indicates rows, 2 indicates columns,
### c(1, 2) indicates rows and columns. Where X has named dimnames, it can be a character vector
### selecting dimension names.
### calculate OR and R2:
out1<-do.call(rbind,lapply(m1,function(z)summary(z)$coefficients[2,]))
out1<-data.frame(out1)
out1$OR<-exp(out1$Estimate)
out1$ci_l<-exp(out1$Estimate-(1.96 * out1$Std..Error))
out1$ci_u<-exp(out1$Estimate+(1.96 * out1$Std..Error))
write.table(out1, file="estimates.txt", quote=F, sep=" ", dec=".", na="NA", row.names=T, col.names=T)
### calculate nagelkerkeR2:
library(sizeMat)
full.nagel.r2<-do.call(rbind,lapply(m1,function(z)nagelkerkeR2(z)))
full.nagel.r2<-data.frame(full.nagel.r2)
n1<-glm(as.factor(pheno) ~ PC1 + PC2 + PC3 + PC4 +PC5 +PC6 +PC7+ PC8 + PC9 + C1+C2+C3+C4+C5+C6+C7+C8+C9, family='binomial',data=scores[[2]]) # null model
full.nagel.r2$prs.r2<-full.nagel.r2[,1] - nagelkerkeR2(n1)
write.table(full.nagel.r2, file="variance.txt", quote=F, sep=" ", dec=".", na="NA", row.names=T, col.names=T)
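I would also like to know whether it is possible to merge the results of the "estimates.txt" and "variance.txt" files, so that I get a single output file for each input file in the list named "scores".
Thanks in advance.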
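Simply generalize your process to take a data frame as input, or build a named list of data frames and iterate over the names to write a different .txt file for each one. The following shows how lapply can handle both the iteration and the data processing: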
library(sizeMat)
...
# Build a named list of data frames, one per input file
file.names <- dir(pattern ="*.my.files")
scores <- setNames(lapply(file.names, read.table, header=TRUE),
                   gsub(".my.files", "", file.names, fixed=TRUE))

# Function that receives the name of a data frame in `scores`
proc_scores <- function(df_name) {
  df <- scores[[df_name]]

  # Fit one model per score column; columns 2 to 9 hold the scores at the
  # different p thresholds, as in the question
  models <- lapply(df[2:9], function(threshold) {
    glm(pheno ~ threshold + PC1+PC2+PC3+PC4+PC5+PC6+PC7+PC8+PC9 + C1+C2+C3+C4+C5+C6+C7+C8+C9,
        family='binomial', data=df)
  })

  ### calculate OR and its 95% confidence interval:
  coeff_mat <- do.call(rbind, lapply(models, function(z) summary(z)$coefficients[2,]))
  coeff_df <- transform(data.frame(coeff_mat),
                        OR = exp(Estimate),
                        ci_l = exp(Estimate - (1.96 * Std..Error)),
                        ci_u = exp(Estimate + (1.96 * Std..Error)),
                        score = df_name
  )

  # Save the output under the data frame name
  write.table(coeff_df, file=paste0("estimates_", df_name, "_.txt"), quote=FALSE,
              sep=" ", dec=".", na="NA", row.names=TRUE, col.names=TRUE)

  ### calculate nagelkerkeR2 (null model has the covariates only)
  null.model <- glm(as.factor(pheno) ~ PC1+PC2+PC3+PC4+PC5+PC6+PC7+PC8+PC9 + C1+C2+C3+C4+C5+C6+C7+C8+C9,
                    family='binomial', data=df)
  nagel_mat <- do.call(rbind, lapply(models, function(z) nagelkerkeR2(z)))
  nagel_df <- data.frame(nagel_mat)
  nagel_df$prs.r2 <- nagel_df[,1] - nagelkerkeR2(null.model)
  nagel_df$score <- df_name

  # Save the output under the data frame name
  write.table(nagel_df, file=paste0("variance_", df_name,"_.txt"), quote=FALSE,
              sep=" ", dec=".", na="NA", row.names=TRUE, col.names=TRUE)

  # Return both data frames in a list
  return(list(coeff=coeff_df, nagel=nagel_df))
}

# Call the function iteratively, once per data frame name
results <- lapply(names(scores), proc_scores)
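The naming pattern used here can be tried on toy data before running it on the real files. The following is a minimal sketch under stated assumptions: the two input files are made up and written to a temporary directory, a simple per-column mean stands in for the glm step, and the file suffix is matched with an anchored regular expression. None of these toy names or values come from the original post.

```r
# Hypothetical toy inputs written to a temporary directory
tmp <- tempfile("scores_demo_")
dir.create(tmp)
set.seed(1)
for (nm in c("a", "b")) {
  toy <- data.frame(pheno = rbinom(40, 1, 0.5), s1 = rnorm(40), s2 = rnorm(40))
  write.table(toy, file.path(tmp, paste0(nm, ".my.files")),
              quote = FALSE, row.names = FALSE)
}

# Named list of data frames, keyed by the file name without its suffix
file.names <- dir(tmp, pattern = "\\.my\\.files$")
scores <- setNames(lapply(file.path(tmp, file.names), read.table, header = TRUE),
                   sub(".my.files", "", file.names, fixed = TRUE))

# One output file per input, named after the input
out.files <- vapply(names(scores), function(df_name) {
  df <- scores[[df_name]]
  # Stand-in analysis (assumption): mean of every column except pheno
  means <- data.frame(score = names(df)[-1], mean = colMeans(df[-1]))
  out <- file.path(tmp, paste0("summary_", df_name, ".txt"))
  write.table(means, file = out, quote = FALSE, row.names = FALSE)
  out
}, character(1))
```

Here `out.files` holds the paths of the files that were written, so it is easy to check that every input produced an output with the matching name.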
Because proc_scores both exports the data to .txt files and returns the same objects in a list, you can access them accordingly:
results[[1]]$coeff
results[[2]]$coeff
results[[3]]$coeff
...
results[[1]]$nagel
results[[2]]$nagel
results[[3]]$nagel
...
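Accessing results by position works, but with about 50 inputs it may be easier to look them up by input name. One possible variation, sketched here with hypothetical toy stand-ins for `scores` and `proc_scores`, is to let `sapply(..., simplify = FALSE)` name the returned list after the character vector it iterates over:

```r
# Toy stand-ins (hypothetical) for `scores` and `proc_scores`
scores <- list(cohortA = data.frame(x = 1:3), cohortB = data.frame(x = 4:6))
proc_scores <- function(df_name) {
  df <- scores[[df_name]]
  list(coeff = data.frame(total = sum(df$x), score = df_name))
}

# simplify = FALSE keeps a plain list; USE.NAMES (TRUE by default) names
# each element after the corresponding input name
results <- sapply(names(scores), proc_scores, simplify = FALSE)

# Look up one input's results by name instead of by position
results[["cohortB"]]$coeff
```

`setNames(lapply(names(scores), proc_scores), names(scores))` would give the same named list.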
To compile the results of all inputs into master files:
master_coeff <- do.call(rbind, lapply(results, "[[", "coeff"))
# SAVE OUTPUT
write.table(master_coeff, file="estimates.txt", quote=FALSE, sep=" ",
dec=".", na="NA", row.names=TRUE, col.names=TRUE)
master_nagel <- do.call(rbind, lapply(results, "[[", "nagel"))
# SAVE OUTPUT
write.table(master_nagel, file="variance.txt", quote=FALSE,
sep=" ", dec=".", na="NA", row.names=TRUE, col.names=TRUE)
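The question also asked for a single output file per input that combines the estimates and the variance results. A minimal sketch of one way to do this, assuming both tables have one row per score column in the same order: bind them column-wise and write one file per element of `results`. The toy `results` list, the `r2` column name, the `merge_one` helper, and the `results_<name>.txt` file name are all hypothetical, and the files go to a temporary directory only for demonstration.

```r
# Toy stand-in (hypothetical) for the `results` list returned by proc_scores
results <- list(
  list(coeff = data.frame(Estimate = c(0.2, 0.5), OR = exp(c(0.2, 0.5)),
                          score = "cohortA", row.names = c("s1", "s2")),
       nagel = data.frame(r2 = c(0.10, 0.12), prs.r2 = c(0.02, 0.04),
                          score = "cohortA", row.names = c("s1", "s2")))
)

merge_one <- function(res) {
  # Guard the assumption: both tables describe the same rows, in the same order
  stopifnot(identical(rownames(res$coeff), rownames(res$nagel)))
  # Drop the duplicated `score` column from the second table before binding
  merged <- cbind(res$coeff, res$nagel[setdiff(names(res$nagel), "score")])
  out <- file.path(tempdir(), paste0("results_", merged$score[1], ".txt"))
  write.table(merged, file = out, quote = FALSE, sep = " ",
              row.names = TRUE, col.names = TRUE)
  out
}

# One merged file per input
merged.files <- vapply(results, merge_one, character(1))
```

The same `cbind` step could instead be placed inside `proc_scores`, right before its `return`, so that only one file is written per input.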