
Editing multiple .csv files before reading into R

I have 44 .csv files in my working directory that I will eventually read into R and bind into one large file. Before I do that, I'd like to make some changes to each of the files. I want to:

  1. Change some column names in some of the files
  2. Select only the first 10 columns of each file

I've found some information on gsub for problem 1) but not enough to get me where I want to be. As for 2), it seems that this should be quite simple, but I can't find any solution online.

Many thanks!

This may work to get you the output you are looking for.

# Set path to folder
  folder.path <- getwd()

# Get list of csv files in folder (pattern is a regular expression, not a glob)
  filenames <- list.files(folder.path, pattern = "\\.csv$", full.names = TRUE)

# Read all csv files in the folder and create a list of dataframes
  ldf <- lapply(filenames, read.csv)

# Select the first 10 columns in each dataframe in the list
  ldf <- lapply(ldf, subset, select = 1:10)

# Create a vector for the new column names
  new.col.names <- c("col1","col2","col3","col4","col5","col6","col7","col8","col9","col10")

# Assign the new column names to each dataframe in the list
  ldf <- lapply(ldf, setNames, new.col.names)

# Combine each dataframe in the list into a single dataframe
  df.final <- do.call("rbind", ldf)
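The code above renames all ten columns in every file. If only some columns in some files need new names, a lookup of old to new names can be applied to each data frame instead. This is a minimal sketch: the column names, the `rename.map` vector and the `rename.cols` helper are hypothetical, and the demo uses two small in-memory data frames in place of the real files.

```r
# Hypothetical mapping: names are the old column names, values the new ones
rename.map <- c(Temp = "temperature", temp_c = "temperature", ID = "id")

# Rename only the columns whose current name appears in the mapping;
# all other columns are left untouched
rename.cols <- function(df, map) {
  hit <- names(df) %in% names(map)
  names(df)[hit] <- unname(map[names(df)[hit]])
  df
}

# Demo data standing in for the list produced by lapply(filenames, read.csv)
ldf.demo <- list(
  data.frame(ID = 1:2, Temp = c(20.5, 21.0)),
  data.frame(id = 3:4, temp_c = c(19.0, 18.5))
)

# Harmonise the names, then bind as before
ldf.demo <- lapply(ldf.demo, rename.cols, map = rename.map)
df.demo <- do.call("rbind", ldf.demo)
```

Because rbind on data frames matches columns by name, the names need to agree across files before binding, which is what the mapping is meant to ensure.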

readLines is your friend. Try importing each file as a separate character vector, e.g. my_csv <- readLines("path/to/your/csv"), then perform the modifications and finally save the output as follows:

# Write the lines back out unchanged in form (capture.output would add [1] indices and quotes)
cat(my_csv, file = "my_new.csv", sep = "\n", append = FALSE)
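With the readLines approach, the column names sit in the first line of the character vector, so a rename is a text substitution on that line only. A small sketch, assuming a hypothetical header and a temporary file in place of a real csv:

```r
# Create a small demo csv in a temporary location (stand-in for a real file)
tmp <- tempfile(fileext = ".csv")
writeLines(c("ID,Temp,site", "1,20.5,A", "2,21.0,B"), tmp)

# Read the file as plain text; element 1 is the header line
my_csv <- readLines(tmp)

# Replace the old column name in the header only (fixed = TRUE: no regex)
my_csv[1] <- sub("Temp", "temperature", my_csv[1], fixed = TRUE)

# Write the modified lines back to the file
writeLines(my_csv, tmp)
```

Note that this edits text, not columns, so a pattern that also occurs inside another column name would be changed as well.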

BUT

I would strongly recommend using the data.table package, and in particular the fread() function, which imports csv files quickly (as data.table objects). You can then perform both the selection of 10 columns and the name alteration on those objects. Of course, via fwrite() you can write them back to csv at any time.
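As a rough sketch of that workflow for a single file, assuming the data.table package is installed: the demo file, its default V1..V12 column names and the new names "id" and "value" are made up for illustration.

```r
library(data.table)

# Build a 12-column demo csv in a temporary location (columns V1..V12)
tmp.in  <- tempfile(fileext = ".csv")
tmp.out <- tempfile(fileext = ".csv")
fwrite(as.data.table(matrix(1:24, nrow = 2)), tmp.in)

# Import only the first 10 columns
dt <- fread(tmp.in, select = 1:10)

# Rename selected columns by reference; the others keep their names
setnames(dt, old = c("V1", "V2"), new = c("id", "value"))

# Write the edited table back to csv
fwrite(dt, tmp.out)
```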

FINALLY

A combination of lapply and data.table can do miracles. Use the following only if the columns of every csv have the same position and name; select = 1:10 keeps only the first 10 columns, as you mentioned above:

rbindlist(lapply(list.files("path/to/the/folder/with/csvs", pattern = "\\.csv$", full.names = TRUE), fread, select = 1:10), use.names = TRUE, fill = FALSE)

This will solve most of your data import issues.
