
How to combine multiple .csv files, and add a column with each dataset's name, in R?

I'm trying to combine multiple CSV files in R so that I can do some predictive modeling. Each file has the same columns in the same order, but some of the column names differ between files. So far, my code combines the files just fine and strips away the headers. What I need it to do now, however, is add two more columns for the date associated with each CSV. The file name of each CSV contains the date.

The file names are formatted as follows: 'January 2017', 'February 2017', 'March 2017', etcetera.

So I want the two columns to be the month and year.
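Given that naming scheme, the month and year can be pulled out of each file name with base R alone. This is a minimal sketch: the file names in it are hypothetical examples of the format above, and it assumes exactly one space between the month and the year.

```r
# Hypothetical file names following the 'Month Year.csv' pattern described above
file_names <- c("January 2017.csv", "February 2017.csv", "March 2017.csv")

# Drop the extension, then split on the single space between month and year
stems <- tools::file_path_sans_ext(file_names)
parts <- strsplit(stems, " ", fixed = TRUE)

month <- vapply(parts, `[`, character(1), 1)
year  <- as.integer(vapply(parts, `[`, character(1), 2))

# Optional month number, using base R's English month.name constant
month_num <- match(month, month.name)
```

The same two values can then be attached to every row read from the corresponding file.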

Below is the code I've used so far. It combines all the CSVs into one, but doesn't create the two additional columns I need.

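dat <- 'C:/Users/ . . . /Historical Data'
setwd(dat)  # setwd() returns the previous directory, so its result is not assigned
file_names <- dir(dat, pattern = "\\.csv$")  # only pick up .csv files

# Skip each file's header row, since column names differ between files
dataset <- do.call(rbind, lapply(file_names, read.csv, skip = 1, header = FALSE))

# Tag each file's rows with its name (e.g. "January 2017"), not yet split into month and year
dataset <- do.call(rbind, lapply(file_names, function(x)
  cbind(read.csv(x, skip = 1, header = FALSE), name = strsplit(x, '\\.')[[1]][1])))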

head(dataset)

Can anyone point me in the right direction for how to best code these two columns into this?

Your code was pretty good to begin with.

The following code reads each file in file_names, adds a column recording the file it came from, and collects the results in a list. It then binds all the elements together into one table. This pattern is handy for batch-reading files while keeping each file's name in a separate column.

Try doing this:

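library(data.table)

file_list <- lapply(file_names, function(x) {
  ret <- fread(x)   # fread() is data.table's CSV reader
  ret$origin <- x   # record the source file, e.g. "January 2017.csv"
  return(ret)
})
# Column names differ between files, so bind by position rather than by name
df <- rbindlist(file_list, use.names = FALSE)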
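For comparison, the same read, tag, and bind pattern can be sketched in base R without data.table or readr, splitting the file name into Month and Year as each file is read. The temporary folder, the sample files, and the helper read_one() are all hypothetical, created only so the sketch runs on its own; it assumes every file name has the form 'Month Year.csv'.

```r
# Hypothetical sample data: two small CSVs in a temporary folder, named like
# the files in the question. Their header rows deliberately differ.
dir_path <- file.path(tempdir(), "historical_demo")
dir.create(dir_path, showWarnings = FALSE)
write.csv(data.frame(id = 1:2, sales = c(5, 7)),
          file.path(dir_path, "January 2017.csv"), row.names = FALSE)
write.csv(data.frame(ID = 3:4, Sales = c(6, 8)),
          file.path(dir_path, "February 2017.csv"), row.names = FALSE)

# Hypothetical helper: read one file and tag its rows with Month and Year
read_one <- function(path) {
  # Skip the header row so differing column names don't matter (columns become V1, V2, ...)
  df <- read.csv(path, skip = 1, header = FALSE)
  # ".../February 2017.csv" -> "February 2017" -> c("February", "2017")
  parts <- strsplit(tools::file_path_sans_ext(basename(path)), " ", fixed = TRUE)[[1]]
  df$Month <- parts[1]
  df$Year  <- as.integer(parts[2])
  df
}

file_paths <- list.files(dir_path, pattern = "\\.csv$", full.names = TRUE)
dataset <- do.call(rbind, lapply(file_paths, read_one))
```

Using full.names = TRUE avoids depending on the working directory, and basename() keeps the folder path out of the Month and Year values.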

Here is a library(tidyverse) way of accomplishing what you need. You can still set your working directory to wherever it needs to be, and instead of using dir() you can use list.files():

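library(tidyverse)

# File names only, so this assumes the working directory is set to that folder
dat_files <- list.files(".../Historical Data", pattern = "\\.csv$")

dataset <- map_df(dat_files, ~ read_csv(.x, col_names = FALSE, skip = 1) %>%  # skip the differing headers
                    mutate(month_year = str_remove(.x, "\\.csv$")) %>%       # "January 2017.csv" -> "January 2017"
                    separate(month_year, into = c("Month", "Year"), sep = " ")
)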

This code reads all your files into one data frame and uses each file name, minus the .csv extension, to create a new column. It then separates that column into Month and Year columns by splitting on the space (" ").
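One caveat with separate(): by default it leaves both new columns as character. For modeling, a real Date column that sorts chronologically is often more useful. A minimal base R sketch, assuming the Month values are full English month names as in the file names above; the small data frame and its value column are hypothetical stand-ins for the combined data:

```r
# Hypothetical combined data with the Month and Year columns produced by separate()
dataset <- data.frame(
  value = c(10, 12, 9),
  Month = c("January", "February", "March"),
  Year  = c("2017", "2017", "2017"),
  stringsAsFactors = FALSE
)

# month.name is base R's English month-name constant, so this does not depend
# on the session locale (unlike parsing with a "%B" format string)
month_num <- match(dataset$Month, month.name)

# First day of each month as a Date
dataset$Date <- as.Date(sprintf("%s-%02d-01", dataset$Year, month_num))
```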

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please credit the site URL or the original address. For any questions, contact: yoyou2525@163.com.

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM