Adding a new column with filenames for the list of files in a for loop

I have time series data. I stored the data in .txt files under daily subfolders inside monthly folders.

setwd(".../2018/Jan")
parent.folder <- ".../2018/Jan"
# List the sub-folders under the parent folder ([-1] drops the parent itself)
sub.folders <- list.dirs(parent.folder, recursive = TRUE)[-1]
r.scripts <- file.path(sub.folders)
A_2018 <- list()
for (j in seq_along(r.scripts)) {
  A_2018[[j]] <- dir(r.scripts[j], "\\.txt$")
}

Of these .txt files, I removed some that I don't want to use for further analysis, using the following code.

trim_to_two <- function(x) {
  runs = rle(gsub("^L1_\\d{4}_\\d{4}_","",x))
  return(cumsum(runs$lengths)[which(runs$lengths > 2)] * -1)
}

A_2018_new <- list()
for (j in seq_along(A_2018)) {
  A_2018_new[[j]] <- A_2018[[j]][trim_to_two(A_2018[[j]])]
  }
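
To see what trim_to_two() actually removes, here is a small sketch on made-up file names (the names below are hypothetical, chosen only to match the L1_dddd_dddd_ pattern). It suggests that the helper drops the last file of every run longer than two, and that it returns an empty index when no run qualifies, in which case the subsetting keeps nothing at all:

```r
trim_to_two <- function(x) {
  runs <- rle(gsub("^L1_\\d{4}_\\d{4}_", "", x))
  cumsum(runs$lengths)[which(runs$lengths > 2)] * -1
}

# Hypothetical names: three files ending in "1", then two ending in "2"
files <- c("L1_2018_0900_1.txt", "L1_2018_0910_1.txt", "L1_2018_0920_1.txt",
           "L1_2018_0930_2.txt", "L1_2018_0940_2.txt")
drop_idx <- trim_to_two(files)
drop_idx          # negative position of the last file in the run of three
files[drop_idx]   # that file is dropped, the other four are kept

# No run longer than two: the index is empty, and x[numeric(0)] selects nothing
short <- c("L1_2018_0900_1.txt", "L1_2018_0910_2.txt")
short[trim_to_two(short)]

# One possible guard (an assumption about the intended behaviour):
# keep everything when there is nothing to drop
keep_files <- function(x) {
  idx <- trim_to_two(x)
  if (length(idx) == 0) x else x[idx]
}
keep_files(short)
```

If a sub-folder ending up with no files is not what is intended, a guard like the hypothetical keep_files() above is one way to avoid it.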

Then I want to row-bind all the .txt files in a for loop. Before that, I would like to remove some lines from each .txt file and add a new column containing the file name. The following is my code.

for (i in 1:length(A_2018_new)) {
  
  for (j in 1:length(A_2018_new[[i]])){
       
    filename <- paste(str_sub(A_2018_new[[i]][j], 1, 14))
        
    assign(filename, read_tsv(complete_file_name, skip = 14, col_names = FALSE), 
           )
    
    Y <- r.scripts %>% str_sub(46, 49)
    MD <- r.scripts %>% str_sub(58, 61)
    HM <- filename %>% str_sub(9, 12)
    Turn <- filename %>% str_sub(14, 14)
    time_minute <- paste(Y, MD, HM, sep="-")
    
    Map(cbind, filename, SampleID = names(filename))
    }
} 

But I didn't get my desired output. I tried to adapt code from other examples. Could anyone help explain what my code is missing?

Your code seems overly complex for what it is doing. However, your problem is not 100% clear (e.g., what pattern in the file names determines what to import and what to skip?). Here are some pointers that would greatly simplify the code and likely avoid the issue you are having.

Use lapply() from base R or map() from the purrr package to iterate instead of a for loop. The benefit is that the data frames end up in a single list, so you don't need to assign each one to its own object in the environment. Since you tagged the tidyverse, we'll use the purrr functions.

library(tidyverse)

You could, for instance, retrieve the .txt file paths using something like

txt_files <- list.files(path = 'data/folder/', pattern = "\\.txt$", full.names = TRUE) # Then remove the files you don't want, with whatever logic applies

and then use map() with read_tsv() from readr, like so:

mydata <- map(txt_files, read_tsv)
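
The question skips 14 header lines and reads without column names, so those arguments would need to be passed on to read_tsv(). A minimal sketch, assuming two throwaway files written to a temporary directory (the directory name, file names and contents are made up for illustration):

```r
library(purrr)
library(readr)

# Hypothetical demo files: 14 header lines followed by two tab-separated rows
demo_dir <- file.path(tempdir(), "read_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (f in c("L1_2018_0900_1.txt", "L1_2018_0910_2.txt")) {
  writeLines(c(paste("header line", 1:14), "1\t2.5", "2\t3.5"),
             file.path(demo_dir, f))
}

txt_files <- list.files(demo_dir, pattern = "\\.txt$", full.names = TRUE)

# Name the vector by base file name so the resulting list carries the names,
# and pass the extra read_tsv() arguments through an anonymous function
mydata <- set_names(txt_files, basename(txt_files)) |>
  map(\(f) read_tsv(f, skip = 14, col_names = FALSE, show_col_types = FALSE))

names(mydata)
```

Naming the list is optional, but it keeps each data frame paired with the file it came from.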

Then, for your manipulation, you can again use lapply() or map() to apply it to each data frame. The easiest way is to create a custom function and then apply it to each data frame:

my_func <- function(df, filename) {
  df |>
    filter(...) |> # Whatever logic applies here
    mutate(filename = filename)
}
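
As one concrete, hypothetical instance of such a function: suppose the unwanted lines are those where the first column is missing, and the label to keep is the first 14 characters of the file name, as in the question. The filter condition, the column name X1 and the function name add_file_id() are all assumptions made for illustration.

```r
library(dplyr)
library(stringr)

# Hypothetical cleaning rule: drop rows with a missing X1, then tag every
# remaining row with the first 14 characters of the file's base name
add_file_id <- function(df, path) {
  df |>
    filter(!is.na(X1)) |>
    mutate(filename = str_sub(basename(path), 1, 14))
}

# Made-up data frame standing in for one imported file
demo <- tibble(X1 = c(1, NA, 3), X2 = c("a", "b", "c"))
out <- add_file_id(demo, "some/dir/L1_2018_0900_1.txt")
out
```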

and then use map2() to apply this function, iterating over the data and the file names in parallel, and list_rbind() to bind the resulting data frames by row.

mydata_output <- map2(mydata, txt_files, my_func) |>
  list_rbind()
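
If the file-name column is the only per-file change, a named list combined with the names_to argument of list_rbind() may be enough on its own. A self-contained sketch on made-up files in a temporary directory (all paths and contents here are hypothetical; list_rbind() requires purrr 1.0.0 or later):

```r
library(purrr)
library(readr)

# Hypothetical demo files in a temporary directory
demo_dir <- file.path(tempdir(), "bind_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("1\t2.5", "2\t3.5"), file.path(demo_dir, "L1_2018_0900_1.txt"))
writeLines("3\t4.5", file.path(demo_dir, "L1_2018_0910_2.txt"))

txt_files <- list.files(demo_dir, pattern = "\\.txt$", full.names = TRUE)

# The list names become the values of the new "filename" column
combined <- set_names(txt_files, basename(txt_files)) |>
  map(\(f) read_tsv(f, col_names = FALSE, show_col_types = FALSE)) |>
  list_rbind(names_to = "filename")

combined
```

Recent readr versions (2.0 and later) also accept a vector of paths together with an id argument, for example read_tsv(txt_files, id = "filename", col_names = FALSE), which records the full path rather than the base name.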
