I have a folder with 100 different .csv files. Not all files contain the same variables (they have different structures), so I am trying to import them all at once (creating a separate data frame for each CSV), standardize the data frames, for example by adding a new column or converting a date column from character to Date, and then export them all at once again at the end. Here is my attempt; it works and reads every CSV into a separate data frame:
path <- "C:/Users/.../"  # folder containing the csv files
setwd(path)
files <- list.files(pattern = "\\.csv$")
for (file in files) {
  perpos <- which(strsplit(file, "")[[1]] == ".")
  assign(
    gsub(" ", "", substr(file, 1, perpos - 1)),
    read.csv(paste(path, file, sep = "")))
}
However, when I add mutate to the assign call (for instance, to add a new column), the script runs but no column is added. What am I missing? My aim is to add or manipulate some variables and export the files again, preferably within the tidyverse:
for (file in files) {
  perpos <- which(strsplit(file, "")[[1]] == ".")
  assign(
    gsub(" ", "", substr(file, 1, perpos - 1)),
    read_csv(paste(path, file, sep = "")),
    mutate(., Heading = "Data"))
}
df1 <- structure(list(datadate = structure(c(17927, 17927, 17927, 17927,
17927, 17927), class = "Date"), parent = c("grup", "grup",
"grup", "grup", "grup", "grup"), ads = c("P9",
"PS8", "PS7", "PS6", "PS5", "PS5"), chl = c("PSS9",
"PSS8", "PSS7", "PSS6", "PSS5", "PSS5"),
average_monthly = c(196586.49, 289829.43,
1363529.14, 380446.43, 147296.09, 948669.38), current_month = c(987118.82,
1682872.03, 4356755.73, 2225040.29, 922506.21, 5756525.08
), current_month_minus_1 = c(585573.1,
635763.37, 6551477.37, 818531.11, 255862.51, 1832829.99),
current_month_minus_2 = c(0, 0, 0, 0, 0,
0)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-6L))
df2<-
structure(
list(
network = c("STAR", "NPD", "GMD"),
datadate = structure(c(18259, 18259, 18259), class = "Date"),
brand = c("grup", "GFK", "MDG"),
average_weekly = c(140389.14,
10281188.25, 172017.39),
last_week_avg = c(89303.07,
6918460.99, 110594.64),
last_week_1_minus_avg = c(141765.83,
10248501.1, 222484.9),
last_week_2_minus_avg = c(138043.53,
9846538.57, 164185.21)
),
class = c("tbl_df", "tbl", "data.frame"),
row.names = c(NA, -3L)
)
A base R solution that reads the files into a list (the changes required to merge them depend on your data):
# Store a scalar of the path containing the csvs:
example_dir <- "C:/Users/Example_Dir"
# Create a vector of the full csv paths:
files <- list.files(example_dir, pattern = "\\.csv$", full.names = TRUE)
# Read each file into one element of a list (keeping text as character):
X <- lapply(files, read.csv, stringsAsFactors = FALSE)
# Name the list elements after the files (without the .csv extension):
names(X) <- tools::file_path_sans_ext(basename(files))
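Building on that list, here is a minimal base R sketch of the remaining steps from the question (add a column, convert a date column, export everything again). It assumes every file has a datadate column stored as ISO "YYYY-MM-DD" text; the demo folder, the two file names, and the out subfolder are hypothetical stand-ins for your own directory.

```r
# Hypothetical demo folder in a temporary directory (stands in for example_dir)
example_dir <- file.path(tempdir(), "csv_demo")
dir.create(example_dir, showWarnings = FALSE)

# Two small CSVs with different columns, both holding a character date column
write.csv(data.frame(datadate = "2019-01-31", ads = c("P9", "PS8")),
          file.path(example_dir, "file one.csv"), row.names = FALSE)
write.csv(data.frame(network = "STAR", datadate = "2019-12-29"),
          file.path(example_dir, "file_two.csv"), row.names = FALSE)

files <- list.files(example_dir, pattern = "\\.csv$", full.names = TRUE)

# Read every file into a named list (names = file names without extension/spaces)
X <- lapply(files, read.csv, stringsAsFactors = FALSE)
names(X) <- gsub(" ", "", tools::file_path_sans_ext(basename(files)))

# Standardize each data frame: add a column and convert datadate to Date
X <- lapply(X, function(df) {
  df$Heading <- "Data"
  df$datadate <- as.Date(df$datadate)
  df
})

# Write each data frame to an "out" subfolder under its list name
out_dir <- file.path(example_dir, "out")
dir.create(out_dir, showWarnings = FALSE)
for (nm in names(X)) {
  write.csv(X[[nm]], file.path(out_dir, paste0(nm, ".csv")), row.names = FALSE)
}
```

Keeping the data frames in one list means each standardization step is written once and applied to all files, whatever columns each file happens to have.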
Aside from the design of your code, it seems that you are using mutate the wrong way.
In your code, the mutate call is the third argument of the assign function, and that position belongs to pos (which determines the environment your variable is assigned in), so mutate is never applied to the data frame you read.
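What you really want to write is this:
library(readr)
library(dplyr)
for (file in files) {
  perpos <- which(strsplit(file, "")[[1]] == ".")
  assign(
    gsub(" ", "", substr(file, 1, perpos - 1)),
    read_csv(paste(path, file, sep = "")) %>%
      mutate(Heading = "Data"))
}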
If you are not familiar with the pipe operator (%>%), I suggest reading a tutorial such as the dplyr vignette, which has a section about it.
This code means: read the csv, add the Heading column with mutate, and assign the resulting data frame to a variable named after the result of the gsub call.
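To make the pipe's role concrete without extra packages, here is a small base R sketch. It assumes R 4.1 or later for the native |> pipe, uses transform() as a stand-in for mutate(), and writes a hypothetical demo file called my data.csv to a temporary folder. It shows that the pipe only changes how the call is written, and that assign() should receive the already-modified data frame as its value.

```r
# Hypothetical demo file (its name contains a space) in a temporary folder
path <- paste0(tempdir(), "/")
file <- "my data.csv"
write.csv(data.frame(ads = c("P9", "PS8")), paste0(path, file),
          row.names = FALSE)

# Same naming rule as the loop: text before the first ".", spaces removed
perpos <- which(strsplit(file, "")[[1]] == ".")
var_name <- gsub(" ", "", substr(file, 1, perpos[1] - 1))

# x |> f(y) is rewritten to f(x, y), so these two build the same data frame
# (transform() stands in for dplyr::mutate(); the native pipe needs R >= 4.1)
nested <- transform(read.csv(paste0(path, file)), Heading = "Data")
piped  <- read.csv(paste0(path, file)) |> transform(Heading = "Data")
stopifnot(identical(nested, piped))

# assign() receives the finished data frame as its 2nd argument (value);
# nothing is passed in the 3rd position (pos)
assign(var_name, piped)
```

After running it, a data frame named mydata exists with both the ads and Heading columns.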
But, as in hello_friend's answer, I urge you to rethink your design to work with a list of data frames rather than a bunch of separate variables. For this, the tidyverse way is to use the purrr package.
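A minimal purrr sketch of that list-based design follows. It assumes the readr, dplyr, and purrr packages are installed and that every file has a datadate column holding dates (either parsed as Date by read_csv or stored as ISO "YYYY-MM-DD" text). The demo folder, file names, and out subfolder are hypothetical stand-ins for your own directory.

```r
library(readr)
library(dplyr)
library(purrr)

# Hypothetical demo folder standing in for the real directory of csv files
in_dir  <- file.path(tempdir(), "csv_purrr_demo")
out_dir <- file.path(in_dir, "out")
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
write_csv(tibble(datadate = "2019-01-31", ads = c("P9", "PS8")),
          file.path(in_dir, "file one.csv"))
write_csv(tibble(network = "STAR", datadate = "2019-12-29"),
          file.path(in_dir, "file_two.csv"))

files <- list.files(in_dir, pattern = "\\.csv$", full.names = TRUE)

# One named list element per file instead of one global variable per file
dfs <- files %>%
  set_names(gsub(" ", "", tools::file_path_sans_ext(basename(files)))) %>%
  map(read_csv) %>%
  # Standardize every data frame: add a column and make sure datadate is a Date
  map(function(df) mutate(df, Heading = "Data", datadate = as.Date(datadate)))

# Export each data frame under its list name (.y holds the name in iwalk)
iwalk(dfs, ~ write_csv(.x, file.path(out_dir, paste0(.y, ".csv"))))
```

Because the files never become separate global variables, adding another standardization step means editing one function instead of every assign call.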