Add a new column containing the file name to a list of files in a for loop

Question

I have time series data. I stored the data in txt files under daily subfolders within monthly folders.

setwd(".../2018/Jan")
parent.folder <-".../2018/Jan"  
sub.folders <- list.dirs(parent.folder, recursive=TRUE)[-1] #To read the sub-folders under parent folder
r.scripts <- file.path(sub.folders)
A_2018 <- list()
for (j in seq_along(r.scripts)) {
  A_2018[[j]] <- dir(r.scripts[j],"\\.txt$")}

Of these .txt files, I removed some that I don't want to use for further analysis, using the following code.

trim_to_two <- function(x) {
  runs = rle(gsub("^L1_\\d{4}_\\d{4}_","",x))
  return(cumsum(runs$lengths)[which(runs$lengths > 2)] * -1)
}

A_2018_new <- list()
for (j in seq_along(A_2018)) {
  A_2018_new[[j]] <- A_2018[[j]][trim_to_two(A_2018[[j]])]
  }

Then I want to row-bind all the .txt files using a for loop. Before that, I would like to remove some lines from each txt file and add a new column with the file name. The following is my code.

for (i in 1:length(A_2018_new)) {
  
  for (j in 1:length(A_2018_new[[i]])){
       
    filename <- paste(str_sub(A_2018_new[[i]][j], 1, 14))
        
    assign(filename, read_tsv(complete_file_name, skip = 14, col_names = FALSE), 
           )
    
    Y <- r.scripts %>% str_sub(46, 49)
    MD <- r.scripts %>% str_sub(58, 61)
    HM <- filename %>% str_sub(9, 12)
    Turn <- filename %>% str_sub(14, 14)
    time_minute <- paste(Y, MD, HM, sep="-")
    
    Map(cbind, filename, SampleID = names(filename))
    }
}

But I didn't get my desired output. I tried to write the code using other examples. Could anyone help explain what my code is missing?

Answer 1

Your code seems overly complex for what it is doing. Your problem is, however, not 100% clear (e.g. what is the pattern in your file names that determines what to import and what not?). Here are some pointers that would greatly simplify the code, and likely avoid the issue you are having.

Use lapply(), or map() from the purrr package, to iterate instead of a for loop. The benefit is that it places the different data frames in a list, so you don't need to assign multiple data frames to their own objects in the environment. Since you tagged the tidyverse, we'll use the purrr functions.

library(tidyverse)

You could, for instance, retrieve the txt file paths using something like

txt_files <- list.files(path = 'data/folder/', pattern = "\\.txt$", full.names = TRUE) # Drop the files you don't want, using whatever logic applies
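Since the question stores files in daily subfolders under a monthly folder, list.files() can also descend into the subfolders with recursive = TRUE, which replaces the list.dirs() plus dir() loop. The following is only a sketch: the folder layout and the file names (such as L1_0101_0930_1.txt) are made up for illustration and built in a temporary directory.

```r
# Hypothetical layout: <month>/<day>/<file>, created in a temp directory
month_dir <- file.path(tempdir(), "Jan_demo")
day_dirs <- file.path(month_dir, c("0101", "0102"))
for (d in day_dirs) dir.create(d, recursive = TRUE, showWarnings = FALSE)

# Two txt files (assumed naming scheme) and one file that should be ignored
file.create(file.path(day_dirs[1], c("L1_0101_0930_1.txt", "notes.csv")))
file.create(file.path(day_dirs[2], "L1_0102_0930_1.txt"))

# One call collects the full paths of all txt files in every daily subfolder
txt_files <- list.files(month_dir, pattern = "\\.txt$",
                        recursive = TRUE, full.names = TRUE)
basename(txt_files)
```

Because full.names = TRUE returns complete paths, the result can be passed straight to a reading function, and basename() recovers the bare file name when it is needed for labelling.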

and then use map() with read_tsv() from readr like so:

mydata <- map(txt_files, read_tsv)
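The question's read_tsv() call skips 14 lines and reads without column names. A minimal sketch of passing those arguments through map() could look like this; the demo file and its contents are invented, and read_one is a hypothetical helper name.

```r
library(readr)
library(purrr)

# Invented demo file: 14 lines to skip, then two tab-separated rows
demo_file <- tempfile(fileext = ".txt")
writeLines(c(paste("header line", 1:14), "1\t2.5", "2\t3.5"), demo_file)

# Hypothetical helper wrapping read_tsv() with the question's arguments
read_one <- function(path) {
  read_tsv(path, skip = 14, col_names = FALSE, show_col_types = FALSE)
}

# map() returns a list with one data frame per file path
mydata <- map(demo_file, read_one)
```

With col_names = FALSE, readr names the columns X1, X2, and so on, so you may want to rename them afterwards.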

Then, for your manipulation, you can again use lapply() or map() to apply it to each data frame. The easiest way is to create a custom function and then apply it to each data frame:

my_func <- function(df, filename) {
  df |>
    filter(...) |> # Whatever logic applies here
    mutate(filename = filename)
}

and then use map2() to apply this function, iterating through the data and the file names in parallel, and then list_rbind() to bind the data frames by row.

mydata_output <- map2(mydata, txt_files, my_func) |>
  list_rbind()
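Putting the steps together, here is a self-contained sketch that runs on made-up files. The file names, the 14 skipped lines, and the character positions used for HM and Turn are taken from the question's code or assumed for illustration; add_id is a hypothetical helper name, and list_rbind() needs purrr 1.0.0 or later.

```r
library(readr)
library(dplyr)
library(purrr)
library(stringr)

# Invented demo files following the assumed naming scheme L1_MMDD_HHMM_T.txt
demo_dir <- file.path(tempdir(), "bind_demo")
dir.create(demo_dir, showWarnings = FALSE)
demo_names <- c("L1_0101_0930_1.txt", "L1_0101_0945_2.txt")
for (nm in demo_names) {
  writeLines(c(paste("header", 1:14), "1\t2.5", "2\t3.5"),
             file.path(demo_dir, nm))
}

txt_files <- list.files(demo_dir, pattern = "\\.txt$", full.names = TRUE)

# Read every file into a list of data frames
mydata <- map(txt_files, \(f) read_tsv(f, skip = 14, col_names = FALSE,
                                       show_col_types = FALSE))

# Hypothetical helper: label each data frame with parts of its file name
add_id <- function(df, path) {
  sample_id <- str_sub(basename(path), 1, 14)
  df |>
    mutate(SampleID = sample_id,
           HM = str_sub(sample_id, 9, 12),    # positions as in the question
           Turn = str_sub(sample_id, 14, 14))
}

# Iterate over data and paths in parallel, then stack the rows
combined <- map2(mydata, txt_files, add_id) |>
  list_rbind()
```

Each row of combined then carries the identifier of the file it came from, so no per-file objects need to be created with assign().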
