How to add the filename as a column while reading and appending multiple CSVs in R?

I am trying to read thousands of CSV files, append them, and save the result as one RDS file.

Issue: I am trying to add the filename as a column in each CSV so that I know which data came from which CSV file (all files have the same columns), but I have not been able to do so.

Example CSVs to work with:

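# Setting up some example csv files to work with
# (the "Input data/sub folder/" directory must already exist)
library(dplyr)
library(purrr)
library(readr)

mtcars_slim <- select(mtcars, 1:3)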

write_csv(slice(mtcars_slim, 1:4), "Input data/sub folder/AA_1.csv")
write_csv(slice(mtcars_slim, 5:10), "Input data/sub folder/AAA_2.csv")
write_csv(slice(mtcars_slim, 11:1), "Input data/sub folder/BBB_3.csv")

Code I have tried:

# this code worked but it doesn't have filename within the dataset

list.files(path = "Input data/sub folder/",
              pattern="*.csv", 
              full.names = T)  %>% 
    map_df(~read_csv(.)) %>% 
  
  saveRDS("output_compiled_data.rds")

So I tried to modify the code above to add the filename as a column for each CSV file, as in the chunk below, but it didn't work.

file_names <- list.files(path = "Input data/sub folder/",
              pattern="*.csv", 
              full.names = T)  %>% 
    map_df(file_names, ~read_csv(.) %>% 
                    mutate(symbol = file_names)) %>% 
  
  saveRDS("output_compiled_data.rds")


data_tbl <- read_rds("output_compiled_data.rds")
data_tbl
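The modified pipeline most likely fails for two reasons. First, `file_names` is referenced inside the very pipeline that is still defining it, so the object does not exist yet when `map_df()` runs. Second, `mutate(symbol = file_names)` would refer to the whole vector of paths rather than the file currently being read. A minimal sketch of a repaired version follows; the temporary directory and the two sample files are assumptions made only to keep the example self-contained, and `.x` stands for the path being processed:

```r
library(dplyr)
library(purrr)
library(readr)

# Assumption: a throwaway directory stands in for "Input data/sub folder/"
in_dir <- tempfile("csv_demo_")
dir.create(in_dir)

mtcars_slim <- select(mtcars, 1:3)
write_csv(slice(mtcars_slim, 1:4), file.path(in_dir, "AA_1.csv"))
write_csv(slice(mtcars_slim, 5:10), file.path(in_dir, "AAA_2.csv"))

# List the files first, so the vector exists before it is mapped over
file_names <- list.files(in_dir, pattern = "\\.csv$", full.names = TRUE)

# .x is the path of the file currently being read;
# basename() drops the directory part of that path
data_tbl <- map_df(
  file_names,
  ~ read_csv(.x, show_col_types = FALSE) %>%
    mutate(symbol = basename(.x))
)

# Save the combined data and read it back
out_file <- file.path(in_dir, "output_compiled_data.rds")
saveRDS(data_tbl, out_file)
read_rds(out_file)
```

Here the `symbol` column holds one filename per row, because `mutate()` runs separately on each file's data before the pieces are bound together.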

One option would be to use a named vector of filenames. You can then add a column with the filename via the .id argument of map_df:

library(dplyr)
library(purrr)
library(readr)

mtcars_slim <- select(mtcars, 1:3)

write_csv(slice(mtcars_slim, 1:4), "AA_1.csv")
write_csv(slice(mtcars_slim, 5:10), "AAA_2.csv")
write_csv(slice(mtcars_slim, 11:1), "BBB_3.csv")

fn <- list.files(
  path = ".",
  pattern = "\\.csv$",
  full.names = TRUE
)
names(fn) <- basename(fn)

map_df(fn, ~ read_csv(., show_col_types = FALSE), .id = "file")
#> # A tibble: 21 × 4
#>    file        mpg   cyl  disp
#>    <chr>     <dbl> <dbl> <dbl>
#>  1 AA_1.csv   21       6  160 
#>  2 AA_1.csv   21       6  160 
#>  3 AA_1.csv   22.8     4  108 
#>  4 AA_1.csv   21.4     6  258 
#>  5 AAA_2.csv  18.7     8  360 
#>  6 AAA_2.csv  18.1     6  225 
#>  7 AAA_2.csv  14.3     8  360 
#>  8 AAA_2.csv  24.4     4  147.
#>  9 AAA_2.csv  22.8     4  141.
#> 10 AAA_2.csv  19.2     6  168.
#> # … with 11 more rows
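If the installed readr is version 2.0.0 or later (an assumption about your setup), `read_csv()` can also take the whole vector of paths at once and record the source file in a column named by its `id` argument, so no `map_df()` call is needed. A sketch under that assumption, again using a temporary directory and two sample files purely for illustration:

```r
library(dplyr)
library(readr)

# Assumption: a throwaway directory holding two sample files
in_dir <- tempfile("csv_id_demo_")
dir.create(in_dir)

mtcars_slim <- select(mtcars, 1:3)
write_csv(slice(mtcars_slim, 1:4), file.path(in_dir, "AA_1.csv"))
write_csv(slice(mtcars_slim, 5:10), file.path(in_dir, "AAA_2.csv"))

fn <- list.files(in_dir, pattern = "\\.csv$", full.names = TRUE)

# id = "file" adds a column holding the path each row was read from;
# basename() then reduces that path to the bare filename
combined <- read_csv(fn, id = "file", show_col_types = FALSE) %>%
  mutate(file = basename(file))

# Save the combined data as a single rds file
out_file <- file.path(in_dir, "output_compiled_data.rds")
saveRDS(combined, out_file)
```

This relies on all files sharing the same columns, which is the case in the question.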
