For-loop over list of txt.files with if conditions in R

I am struggling to create a for loop over all .txt files in a specific directory. The goal is to merge all the separately saved .txt files into one data frame and add an ID variable, which can always be found in the file name (e.g., ID = 10 for the file "10_1. Recording 01.10.2015 131514_CsvData.txt").

txt_files <- list.files("Data/study", pattern = ".txt")  

txt_files
 [1] "1_1. Recording 18.09.2015 091037_CsvData.txt" "10_1. Recording 01.10.2015 131514_CsvData.txt"
 [3] "100_1. Recording 02.10.2015 091630_CsvData.txt" "104_1. Recording 22.09.2015 142604_CsvData.txt"
 [5] "107_1. Recording 18.09.2015 104300_CsvData.txt" "110_1. Recording 29.09.2015 081558_CsvData.txt"
 [7] "112_1. Recording 21.09.2015 082908_CsvData.txt" "114_1. Recording 29.09.2015 101159_CsvData.txt"
 [9] "115_1. Recording 23.09.2015 141204_CsvData.txt" "116_1. Recording 30.09.2015 110624_CsvData.txt"
[11] "117_1. Recording 01.10.2015 141227_CsvData.txt" "120_1. Recording 17.09.2015 153516_CsvData.txt"

Read in and merge the .txt files:

for (file in txt_files) {
  # if the merged dataframe "final_df" doesn't already exist, create it
  if (!exists("final_df")) {
    final_df <- read.table(paste("Data/study/", file, sep = ""), header = TRUE, fill = TRUE)
    temp_ID <- substring(file, 0, str_locate_all(pattern = '_1.', file)[[1]][1] - 1)
    final_df$ID <- temp_ID
    final_df <- as.data.frame(final_df)
  }
  # if the merged dataframe does already exist, append to it
  else {
    temp_dataset <- read.table(paste("Data/study/", file, sep = ""), header = TRUE, fill = TRUE)
    # extract ID column from filename
    temp_ID <- substring(file, 0, str_locate_all(pattern = '_1.', file)[[1]][1] - 1)
    temp_dataset$ID <- temp_ID
    final_df <- rbind(final_df, temp_dataset)
  }
  return(as.data.frame(final_df))
}

Avoid using rbind in a loop, which leads to excessive copying in memory. Instead, build a list of data frames and bind them together once with do.call outside of any loop. For this approach, lapply is a useful alternative to for when building such a list, because you avoid the bookkeeping of initializing an empty list and iteratively updating its elements.
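As a minimal sketch of the list-then-bind pattern on its own, the example below uses three small in-memory data frames in place of the files. The names `pieces`, `tagged`, and `combined` and the toy values are assumptions made for illustration; they do not come from the question's data.

```r
# Toy data: three small data frames standing in for the parsed txt files
# (hypothetical values, not taken from the question's data).
pieces <- list(
  data.frame(x = 1:2, y = c("a", "b")),
  data.frame(x = 3:4, y = c("c", "d")),
  data.frame(x = 5:6, y = c("e", "f"))
)

# lapply returns a list, so no empty list has to be initialised and
# filled element by element.
tagged <- lapply(seq_along(pieces), function(i) {
  piece <- pieces[[i]]
  piece$ID <- i          # tag each piece with an identifier
  piece
})

# A single rbind call over the whole list instead of one call per iteration.
combined <- do.call(rbind, tagged)
```

Here rbind runs once on all pieces, so the growing result is not copied again on every iteration.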

Also consider paste0, which needs no separator argument, and gsub to remove all characters from the underscore to the end of the string in order to extract the ID.

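setwd("Data/study")
# "\\.txt$" matches a literal ".txt" at the end of the file name
txt_files <- list.files(pattern = "\\.txt$")

# read each file and add the ID taken from the part of the name before the first underscore
df_list <- lapply(txt_files, function(file)
                  transform(read.table(file, header=TRUE, fill=TRUE),
                            ID = gsub("_.*", "", file))
           )

# bind all data frames together once, outside of any loop
final_df <- do.call(rbind, df_list)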
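If changing the working directory is undesirable, one possible variant asks list.files for full paths and takes the ID from basename. The sketch below is self-contained: it first writes two made-up sample files into a temporary directory. The directory name `study_sketch`, the helper `read_with_id`, and the column names `time` and `value` are assumptions for illustration, not part of the original question or answer.

```r
# Create two small sample files in a temporary directory so the sketch is
# self-contained; the file contents are made up for illustration.
study_dir <- file.path(tempdir(), "study_sketch")
dir.create(study_dir, showWarnings = FALSE)
writeLines(c("time value", "1 0.5", "2 0.7"),
           file.path(study_dir, "1_1. Recording 18.09.2015 091037_CsvData.txt"))
writeLines(c("time value", "1 0.9"),
           file.path(study_dir, "10_1. Recording 01.10.2015 131514_CsvData.txt"))

# full.names = TRUE returns paths that can be read directly,
# so neither setwd() nor paste0() is needed.
txt_paths <- list.files(study_dir, pattern = "\\.txt$", full.names = TRUE)

# Hypothetical helper: read one file and attach its ID.
read_with_id <- function(path) {
  df <- read.table(path, header = TRUE, fill = TRUE)
  # ID = everything before the first underscore of the file name
  df$ID <- sub("_.*", "", basename(path))
  df
}

final_df <- do.call(rbind, lapply(txt_paths, read_with_id))
```

The ID column is a character vector in this sketch; wrap the sub call in as.integer if a numeric ID is preferred.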
