How to add the filename as a column while reading and appending multiple CSV files in R?
I am trying to read thousands of CSV files, append them, and save the result as a single RDS file.
Issue: I am trying to add the filename as a column in each CSV so that I know which data came from which file (all files have the same columns), but I have not been able to do so.
Example CSV files to work with:
# Setting up some example csv files to work with
library(dplyr)
library(purrr)
library(readr)
mtcars_slim <- select(mtcars, 1:3)
write_csv(slice(mtcars_slim, 1:4), "Input data/sub folder/AA_1.csv")
write_csv(slice(mtcars_slim, 5:10), "Input data/sub folder/AAA_2.csv")
write_csv(slice(mtcars_slim, 11:1), "Input data/sub folder/BBB_3.csv")
Code I have tried:
# this code worked but it doesn't have filename within the dataset
list.files(path = "Input data/sub folder/",
pattern="*.csv",
full.names = T) %>%
map_df(~read_csv(.)) %>%
saveRDS("output_compiled_data.rds")
So I tried modifying the code above to add the filename as a column for each CSV file, as in the chunk below, but it didn't work.
file_names <- list.files(path = "Input data/sub folder/",
pattern="*.csv",
full.names = T) %>%
map_df(file_names, ~read_csv(.) %>%
mutate(symbol = file_names)) %>%
saveRDS("output_compiled_data.rds")
data_tbl <- read_rds("output_compiled_data.rds")
data_tbl
One option would be to use a named vector of filenames. You can then add a column with the filename via the .id argument of map_df:
library(dplyr)
library(purrr)
library(readr)
mtcars_slim <- select(mtcars, 1:3)
write_csv(slice(mtcars_slim, 1:4), "AA_1.csv")
write_csv(slice(mtcars_slim, 5:10), "AAA_2.csv")
write_csv(slice(mtcars_slim, 11:1), "BBB_3.csv")
fn <- list.files(
  path = ".",
  pattern = "\\.csv$",
  full.names = TRUE
)
names(fn) <- basename(fn)
map_df(fn, ~ read_csv(., show_col_types = FALSE), .id = "file")
#> # A tibble: 21 × 4
#>    file        mpg   cyl  disp
#>    <chr>     <dbl> <dbl> <dbl>
#>  1 AA_1.csv   21       6  160
#>  2 AA_1.csv   21       6  160
#>  3 AA_1.csv   22.8     4  108
#>  4 AA_1.csv   21.4     6  258
#>  5 AAA_2.csv  18.7     8  360
#>  6 AAA_2.csv  18.1     6  225
#>  7 AAA_2.csv  14.3     8  360
#>  8 AAA_2.csv  24.4     4  147.
#>  9 AAA_2.csv  22.8     4  141.
#> 10 AAA_2.csv  19.2     6  168.
#> # … with 11 more rows
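As for why the attempt in the question fails: the file list is piped into map_df as its first argument while file_names is also passed explicitly, and file_names does not exist yet at that point. Even if it did, mutate(symbol = file_names) would refer to the whole vector of paths rather than the file currently being read. Below is a minimal sketch of two fixes that stay close to the original code. It assumes readr 2.0.0 or later for the id argument of read_csv. The temporary directory csv_dir and the names compiled, compiled_id and out_file are placeholders for illustration, not part of the original post.

```r
library(dplyr)
library(purrr)
library(readr)

# Hypothetical scratch directory so the sketch does not touch real data
csv_dir <- tempfile("csv_demo_")
dir.create(csv_dir)

# Two small example files, built the same way as in the question
mtcars_slim <- select(mtcars, 1:3)
write_csv(slice(mtcars_slim, 1:4), file.path(csv_dir, "AA_1.csv"))
write_csv(slice(mtcars_slim, 5:10), file.path(csv_dir, "AAA_2.csv"))

# Define the vector of paths first, then map over it
files <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)

# Variant 1: add the column per file inside the mapped function.
# .x is the path of the file currently being read.
compiled <- files %>%
  map_dfr(~ read_csv(.x, show_col_types = FALSE) %>%
            mutate(symbol = basename(.x)))

# Variant 2: let read_csv read all files at once and record the source
# path in an id column (assumes readr >= 2.0.0), then strip the directory.
compiled_id <- read_csv(files, id = "symbol", show_col_types = FALSE) %>%
  mutate(symbol = basename(symbol))

# Save and reload, as in the question
out_file <- file.path(csv_dir, "output_compiled_data.rds")
saveRDS(compiled, out_file)
data_tbl <- read_rds(out_file)
```

Variant 1 puts symbol in the last column and variant 2 puts it first. Otherwise both should hold the same rows. Which one is faster for thousands of files probably depends on the readr version and file sizes, so it may be worth timing both on a subset.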