
Read and process multiple files in R

I have more than 2,000 txt files (each with 5 columns), and each file is larger than 30 MB. I would like to read each file in, process it separately, write its output, and then move on to the next file. I can't read them all in at once and process them, because the files are too big. But something in my code is not working correctly.

files = list.files(path = "Path/to/my/Directory/", 
                   pattern = "*.txt", 
                   full.names = TRUE)


FUN = function(files) {

CSA_input_data<-fread(files)

#1
CSA_input_data[,'x21_CT'] = ((CSA_input_data[,'CASE_ALLELE_CT']/2) - CSA_input_data[,'A1_CASE_CT'])
#2
CSA_input_data[,'x21'] = CSA_input_data[,'x21_CT']/CSA_input_data[,'CASE_ALLELE_CT']

#x22
#1
CSA_input_data[,'x22_CT'] = ((CSA_input_data[,'CTRL_ALLELE_CT']/2) - CSA_input_data[,'A1_CTRL_CT'])
#2
CSA_input_data[,'x22'] = CSA_input_data[,'x22_CT']/CSA_input_data[,'CTRL_ALLELE_CT']



write.table(CSA_input_data, "Path/to/my/Directory/", sep="\t", quote=FALSE, row.names=FALSE, col.names=TRUE)

}

for (i in 1:length(files)) {
  FUN(files[i])
}

I get this error:

Error in file(file, ifelse(append, "a", "w")) : 
  cannot open the connection 

You are passing only a directory name to write.table, so R cannot open it as an output file. Give each output a full file path by changing the function to

files = list.files(path = "Path/to/my/Directory/", 
                   pattern = "\\.txt$",  # pattern is a regular expression, not a glob
                   full.names = TRUE)


FUN = function(files) {
  # Read a single input file (files is one path here)
  CSA_input_data <- data.table::fread(files)
  
  # x21: count first, then its ratio to CASE_ALLELE_CT
  CSA_input_data[,'x21_CT'] = ((CSA_input_data[,'CASE_ALLELE_CT']/2) - CSA_input_data[,'A1_CASE_CT'])
  CSA_input_data[,'x21'] = CSA_input_data[,'x21_CT']/CSA_input_data[,'CASE_ALLELE_CT']
  
  # x22: count first, then its ratio to CTRL_ALLELE_CT
  CSA_input_data[,'x22_CT'] = ((CSA_input_data[,'CTRL_ALLELE_CT']/2) - CSA_input_data[,'A1_CTRL_CT'])
  CSA_input_data[,'x22'] = CSA_input_data[,'x22_CT']/CSA_input_data[,'CTRL_ALLELE_CT']
  
  # Write to a full file path (directory + "result_" + input file name), not just the directory
  write.table(CSA_input_data, paste0("Path/to/my/Directory/result_", basename(files)), sep="\t", quote=FALSE, row.names=FALSE, col.names=TRUE)
  
}

and then call it on each file with lapply or a for loop.

lapply(files, FUN)
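
One thing to watch with the approach above: the result files land in the same directory as the inputs and also end in .txt, so a second run of list.files would pick them up as inputs. Below is a minimal sketch of one way around that, not a definitive implementation. It assumes the data.table package is installed; the helper name process_file and the "results" subdirectory are made up for illustration, and data.table's := and fwrite are swapped in for the column assignments and write.table used above.

```r
library(data.table)

# Hypothetical helper: read one file, add the derived columns, write the result.
process_file <- function(in_file, out_dir) {
  dt <- fread(in_file)

  # Add the columns by reference with := (same formulas as in the answer above).
  dt[, x21_CT := CASE_ALLELE_CT / 2 - A1_CASE_CT]
  dt[, x21    := x21_CT / CASE_ALLELE_CT]
  dt[, x22_CT := CTRL_ALLELE_CT / 2 - A1_CTRL_CT]
  dt[, x22    := x22_CT / CTRL_ALLELE_CT]

  # Build a full output file path, not just a directory.
  out_file <- file.path(out_dir, paste0("result_", basename(in_file)))
  fwrite(dt, out_file, sep = "\t", quote = FALSE)

  invisible(out_file)
}

in_dir  <- "Path/to/my/Directory"
out_dir <- file.path(in_dir, "results")  # assumed subdirectory name
dir.create(out_dir, showWarnings = FALSE)

# Only one file is held in memory at a time.
files <- list.files(in_dir, pattern = "\\.txt$", full.names = TRUE)
for (f in files) process_file(f, out_dir)
```

Because list.files is not recursive by default, files written to the subdirectory are not listed again on later runs.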


 