
Import a folder with multiple .csv files and manipulate all data frames at once in R

I have a folder with 100 different .csv files. Not all files contain the same number of variables (they have different structures), so I am trying to import them all at once (creating a separate data frame for each csv), then standardize the data frames by adding a new column or converting a date column from character to date, and finally export them all at once again. Here is my attempt; it works and reads every csv as a separate data frame:

setwd("C:/Users/...")
files <- list.files(pattern="*.csv")
for(file in files)
{
  perpos <- which(strsplit(file, "")[[1]]==".")
  assign(
    gsub(" ","",substr(file, 1, perpos-1)), 
    read.csv(paste(path,file,sep="")))
} 

However, when I add mutate to the assign call, for instance to add a new column, the script runs but does not add any column. What am I missing here? My aim is to add or manipulate some variables and export the files again, preferably within the tidyverse.

for(file in files)
{
  perpos <- which(strsplit(file, "")[[1]]==".")
  assign(
    gsub(" ","",substr(file, 1, perpos-1)), 
    read_csv(paste(path,file,sep="")),
    mutate(. , Heading = "Data"))
} 

Example

df1 <- structure(list(datadate = structure(c(17927, 17927, 17927, 17927, 
17927, 17927), class = "Date"), parent = c("grup", "grup", 
"grup", "grup", "grup", "grup"), ads = c("P9", 
"PS8", "PS7", "PS6", "PS5", "PS5"), chl = c("PSS9", 
"PSS8", "PSS7", "PSS6", "PSS5", "PSS5"), 
    average_monthly = c(196586.49, 289829.43, 
    1363529.14, 380446.43, 147296.09, 948669.38), current_month = c(987118.82, 
    1682872.03, 4356755.73, 2225040.29, 922506.21, 5756525.08
    ), current_month_minus_1 = c(585573.1, 
    635763.37, 6551477.37, 818531.11, 255862.51, 1832829.99), 
    current_month_minus_2 = c(0, 0, 0, 0, 0, 0)),
    class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-6L))

df2<-
  structure(
    list(
      network = c("STAR", "NPD", "GMD"),
      datadate = structure(c(18259, 18259, 18259)),
      brand = c("grup", "GFK", "MDG"),
      average_weekly = c(140389.14,
                                           10281188.25, 172017.39),
      last_week_avg = c(89303.07,
                                         6918460.99, 110594.64),
      last_week_1_minus_avg = c(141765.83,
                                                 10248501.1, 222484.9),
      last_week_2_minus_avg = c(138043.53,
                                                 9846538.57, 164185.21)

    ),
    class = c("tbl_df", "tbl", "data.frame"),
    row.names = c(NA, -3L)
  )

Base R solution that reads the files into a list. The changes required to merge them depend on your data:

# Store a scalar of the path containing the csvs: 

example_dir <- "C:/Users/Example_Dir"

# Create a vector of the csv paths (the dot is escaped and the pattern anchored, so only names ending in .csv match): 

files <- list.files(example_dir, pattern = "\\.csv$", full.names = TRUE)

# Create an empty list the same length as the number of files: 

X <- vector("list", length(files))

# Iterate through the files and store them in a list:

X[] <- lapply(seq_along(files), function(i){

    # stringsAsFactors is an argument of read.csv, which is where strings are parsed
    read.csv(files[i], stringsAsFactors = FALSE)

  }
)
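Once the files sit in a list, the standardizing and exporting steps from the question can be applied to every data frame with the same lapply pattern. The following is only a sketch under assumptions: the demo folder, the two tiny csv files, the out sub-folder, and the ISO date format are all made up for illustration and are not taken from the question's data.

```r
# Self-contained demo: write two small csvs with different structures
# into a temporary folder (a stand-in for the real directory).
example_dir <- file.path(tempdir(), "csv_demo")
dir.create(example_dir, showWarnings = FALSE)
write.csv(data.frame(datadate = c("2019-01-31", "2019-02-28"), ads = c("P9", "PS8")),
          file.path(example_dir, "file one.csv"), row.names = FALSE)
write.csv(data.frame(network = "STAR", datadate = "2019-12-29", brand = "grup"),
          file.path(example_dir, "file2.csv"), row.names = FALSE)

files <- list.files(example_dir, pattern = "\\.csv$", full.names = TRUE)

# Read every file into one list, named after the file
# (extension and spaces dropped, as in the question's gsub/substr code).
X <- lapply(files, read.csv, stringsAsFactors = FALSE)
names(X) <- gsub(" ", "", tools::file_path_sans_ext(basename(files)))

# Apply the same standardization to every data frame.
X <- lapply(X, function(df) {
  df$Heading <- "Data"                  # new constant column
  df$datadate <- as.Date(df$datadate)   # character -> Date (assumes yyyy-mm-dd)
  df
})

# Export each data frame again, one file per list element.
out_dir <- file.path(example_dir, "out")
dir.create(out_dir, showWarnings = FALSE)
for (nm in names(X)) {
  write.csv(X[[nm]], file.path(out_dir, paste0(nm, ".csv")), row.names = FALSE)
}
```

A single data frame is still easy to reach by name, for example X[["file2"]], so nothing is lost compared with separate variables.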

Aside from the design of your code, it seems that you are using mutate the wrong way.

In your code, the mutate call is the third argument of the assign function, which is pos: the position (the environment) in which the variable is created. It is therefore not part of the value being assigned.
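A small base R illustration of that argument order may help. The environment e and the variable name df_a are hypothetical and only serve the demo; the point is that whatever is passed third says where to assign, not what to assign.

```r
# A scratch environment to assign into (hypothetical, for the demo only).
e <- new.env()

# Third positional argument = pos. An environment is accepted there,
# so the data frame is created inside `e`.
assign("df_a", data.frame(x = 1:3), e)

# Check where the variable ended up.
in_e <- exists("df_a", envir = e, inherits = FALSE)
value <- get("df_a", envir = e)
```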

What you'd really want to write is this:

for(file in files)
{
  perpos <- which(strsplit(file, "")[[1]]==".")
  assign(
    gsub(" ","",substr(file, 1, perpos-1)), 
    read_csv(paste(path,file,sep="")) %>% 
      mutate(Heading = "Data"))
} 

If you are not familiar with the pipe operator (%>%), I suggest that you read some tutorials, such as the dplyr vignette, which has a paragraph about it.

This code means: read the data frame from the csv, mutate it to add the Heading column, and assign the result to a variable whose name is produced by the gsub call.
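For readers new to the pipe, here is a minimal sketch of what it does: the left-hand side becomes the first argument of the call on the right. The tibble df below is made up for the demo.

```r
library(dplyr)

# A tiny made-up data frame.
df <- tibble(x = 1:3)

# Piped form: `df` is passed as the first argument of mutate().
piped <- df %>% mutate(Heading = "Data")

# Equivalent nested form without the pipe.
nested <- mutate(df, Heading = "Data")

# Both calls build the same data frame.
same <- identical(piped, nested)
```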

But, as in hello_friend's answer, I urge you to rethink your design to work with lists rather than a bunch of variables. For this, the tidyverse way is to use the purrr package.
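A possible purrr version is sketched below. It is not the only way to do it, and everything outside the question is an assumption: the temporary demo folder, the two small csv files, the out sub-folder, and the choice to read datadate as character before converting it.

```r
library(readr)
library(dplyr)
library(purrr)

# Demo folder with two csvs of different structure (made up for illustration).
example_dir <- file.path(tempdir(), "purrr_demo")
dir.create(example_dir, showWarnings = FALSE)
write_csv(tibble(datadate = c("2019-01-31", "2019-02-28"), ads = c("P9", "PS8")),
          file.path(example_dir, "file one.csv"))
write_csv(tibble(network = "STAR", datadate = "2019-12-29", brand = "grup"),
          file.path(example_dir, "file2.csv"))

files <- list.files(example_dir, pattern = "\\.csv$", full.names = TRUE)

# Named list of data frames: read each file, then apply the same mutate to all.
dfs <- files %>%
  set_names(gsub(" ", "", tools::file_path_sans_ext(basename(files)))) %>%
  map(~ read_csv(.x, col_types = cols(datadate = col_character()))) %>%
  map(~ mutate(.x, Heading = "Data", datadate = as.Date(datadate)))

# Export every element; iwalk() passes the data frame as .x and its name as .y.
out_dir <- file.path(example_dir, "out")
dir.create(out_dir, showWarnings = FALSE)
iwalk(dfs, ~ write_csv(.x, file.path(out_dir, paste0(.y, ".csv"))))
```

Because the list is named, dfs$file2 gives the same convenience as a separate variable, while any further change is still a single map call over all files.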


 