I'm working on a research project where I need to process data from a pair of tactile gloves. The exported files start with 4 rows of date and time information that I don't need for the analysis, and they contain many columns that I also don't need. Long story short, I need to delete the first 4 rows and keep only columns [1,2,33,53,76,95,114,133,164,184,207,226,245]. I wrote a pretty simple R script to do it for me, but how can I apply this set of operations to all .csv files in the same directory? Manually typing each file name every time is pretty painful. Thank you in advance!
# read uncleaned, raw, data
uncleaned_data<-read.csv("C:/Users/jiang/Desktop/Ready_Clean/Hongjiao_Medium_High1.csv", header = FALSE)
# remove the date and time headers
data_without_head<-uncleaned_data[-c(1,2,3,4),]
# extract the useful columns
cleaned_data<-data_without_head[,c(1,2,33,53,76,95,114,133,164,184,207,226,245)]
# write the new cleaned data into a new file name (adding "_cleaned" in the end)
write.table(cleaned_data,"C:/Users/jiang/Desktop/Ready_Clean/Hongjiao_Medium_High1_Cleaned.csv",row.names=FALSE,col.names=FALSE,sep=",")
You can list all the files in the directory and then keep only the ones ending with .csv.
I assumed that your directory path is "C:/Users/jiang/Desktop/Ready_Clean/".
Unfortunately I can't test the code on my PC, but let me know if you have any questions.
library(tidyverse)
library(stringr)
# get all the .csv files present in the directory, then build the new names by inserting '_cleaned' before .csv
paths <- list.files(path = "C:/Users/jiang/Desktop/Ready_Clean/") %>%
  str_subset(pattern = '\\.csv$') # keep only the files ending in .csv (the dot is escaped so it matches literally)
paths <- str_c("C:/Users/jiang/Desktop/Ready_Clean/", paths)
paths_cleaned <- str_replace(paths, '\\.csv$', '_cleaned.csv')
get_csv <- function(path, path_clean){
# read uncleaned, raw, data
uncleaned_data <- read.csv(path, header = FALSE)
# remove the date and time headers
data_without_head <- uncleaned_data[-c(1,2,3,4),]
# extract the useful columns
cleaned_data <- data_without_head[, c(1,2,33,53,76,95,114,133,164,184,207,226,245)]
# write the new cleaned data into a new file name (adding "_cleaned" in the end)
write.table(cleaned_data,
path_clean,
row.names = FALSE,
col.names = FALSE,
sep = ",")
}
# walk2 would also be an option because we only care about side effects here.
map2(paths, paths_cleaned, ~get_csv(.x, .y))
A Base R solution looks like this. First, we use list.files() to extract files ending with .csv, then use the file list to drive lapply() to read the data, subset it, and write with write.table().
theFiles <- list.files(path="C:/Users/jiang/Desktop/Ready_Clean/",
pattern="\\.csv$",full.names=TRUE)
dataList <- lapply(theFiles,function(x){
y <- read.csv(x,skip = 4,header=FALSE)[c(1,2,33,53,76,95,114,133,164,184,207,226,245)]
write.table(y,file=paste0(x,".cleaned"),row.names=FALSE,col.names=FALSE,sep=",")
})
Note that we use the skip = argument to skip the first four rows when reading each file, then immediately subset the object created by read.csv() via the [ form of the extract operator.
In the write.table() operation we use paste0() to append .cleaned to each original file name to distinguish the cleaned files from the originals.
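To see what skip = and the [ operator do in isolation, here is a minimal sketch on a throwaway file. The file contents, the temporary path, and the column positions 1 and 3 are made up for illustration; they are not the glove data.

```r
# Hypothetical toy file: 4 unwanted header lines followed by 3 data rows
tmp <- tempfile(fileext = ".csv")
writeLines(c("Date,2021-01-01",
             "Time,12:00:00",
             "Device,left",
             "Device,right",
             "1,a,10,x",
             "2,b,20,y",
             "3,c,30,z"), tmp)

# skip = 4 drops the first four lines before parsing;
# [c(1, 3)] then keeps only columns 1 and 3 of the resulting data frame
y <- read.csv(tmp, skip = 4, header = FALSE)[c(1, 3)]
print(y)

unlink(tmp)  # remove the temporary file
```

A side effect of skipping at read time is that the text rows never enter the data frame, so numeric columns are parsed as numbers rather than as character data.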
Since the original question does not include a minimal reproducible example, we'll use the data from my Pokémon Stats GitHub repository to illustrate the solution.
The dimensionality of the Pokémon stats data is much different from the data described in the original question, so we'll skip the first four rows of each file, and retain only columns 1, 2, 4, and 6.
download.file("https://raw.githubusercontent.com/lgreski/pokemonData/master/PokemonData.zip",
"pokemonData.zip",mode="wb")
unzip("pokemonData.zip",exdir="./pokemonData")
theFiles <- list.files("./pokemonData",pattern="\\.csv$",full.names=TRUE)
dataList <- lapply(theFiles,function(x){
y <- read.csv(x,skip = 4,header=FALSE)[c(1,2,4,6)]
write.table(y,file=paste0(x,".cleaned"),row.names=FALSE,col.names=FALSE,sep=",")
})
A screenshot of one of the original files can be used to verify the output. I have highlighted columns 1, 2, 4, and 6, starting with the fifth line of the file (the header row and the first three Pokémon are skipped).
...and the output for the first few rows of ./pokemonData/gen01.csv.cleaned is:
4,"Charmander","Fire",309
5,"Charmeleon","Fire",405
6,"Charizard","Fire",534
7,"Squirtle","Water",314
8,"Wartortle","Water",405
9,"Blastoise","Water",530
The file gen01.csv contains the first generation Pokémon. The first three Pokémon in this file are Bulbasaur, Ivysaur, and Venusaur. We can see from the output that these Pokémon and the header row in the original file were skipped, so the first observation is Pokémon 4, Charmander. We also see that the Total stat, the sixth column, matches the input file for the rows that have been written to the output file.
Because we appended .cleaned at the end of each file, we can use the same technique to list the .cleaned files as we did to list the .csv files and read them with read.csv(). This allows us to keep the original files distinct from the cleaned files.
# now read the cleaned files
theFiles <- list.files("./pokemonData",pattern="\\.cleaned$",full.names=TRUE)
dataList <- lapply(theFiles,read.csv,header=FALSE)
head(dataList[[1]])
At this point the dataList object is a list() that contains 8 data frames, one for each generation of Pokémon.
We use head() to print the first few rows of the first data frame in the list, which matches the results above:
> head(dataList[[1]])
V1 V2 V3 V4
1 4 Charmander Fire 309
2 5 Charmeleon Fire 405
3 6 Charizard Fire 534
4 7 Squirtle Water 314
5 8 Wartortle Water 405
6 9 Blastoise Water 530
Per the request made in the comments to my answer, here is a solution that creates a /cleaned subdirectory within the directory where the files were originally stored, and writes the files to that directory.
First, we create objects for the input and output directories. Then we create a new subdirectory for the output files if it does not already exist.
# solution that creates a ./cleaned subdirectory
inputDirectory <- "./pokemonData"
outputDirectory <- paste0(inputDirectory,"/cleaned")
if(!dir.exists(outputDirectory)) dir.create(outputDirectory)
By checking whether the directory exists before attempting to create it, we avoid the "already exists" warning that dir.create() would otherwise produce on the second and subsequent runs of this script.
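The guard can be tried out safely on a scratch directory. This is only a sketch: the outDir path under tempdir() is a hypothetical stand-in for the real output directory. It also shows showWarnings = FALSE, an argument of dir.create() that offers an alternative to the explicit check.

```r
outDir <- file.path(tempdir(), "cleaned_demo")  # hypothetical path

# Guarded creation: safe to run any number of times
if (!dir.exists(outDir)) dir.create(outDir)
if (!dir.exists(outDir)) dir.create(outDir)  # second run does nothing

# Alternative: let dir.create() try every time and silence the
# "already exists" warning it would otherwise emit
created <- dir.create(outDir, showWarnings = FALSE)
print(created)             # FALSE, because the directory already existed
print(dir.exists(outDir))
```

Either form leaves existing files in the directory untouched, so cleaned files from an earlier run are simply overwritten when the script writes them again.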
Next, we list the files in the input directory. Because we're going to use the inputDirectory and outputDirectory objects later in the script to manually build the full path names for each input and output file, we set the full.names= argument of list.files() to FALSE.
theFiles <- list.files(inputDirectory,pattern="\\.csv$",full.names=FALSE)
Next, we use lapply() to read the files, subset the right rows and columns, and write the cleaned files to the output directory.
dataList <- lapply(theFiles,function(x){
y <- read.csv(paste0(inputDirectory,"/",x),skip = 4,header=FALSE)[c(1,2,4,6)]
write.table(y,file=paste0(outputDirectory,"/",x),row.names=FALSE,col.names=FALSE,sep=",")
})
# verify that files were written to cleaned directory
list.files(outputDirectory,full.names=TRUE)
...and the output:
> list.files(outputDirectory,full.names=TRUE)
[1] "./pokemonData/cleaned/gen01.csv" "./pokemonData/cleaned/gen02.csv"
[3] "./pokemonData/cleaned/gen03.csv" "./pokemonData/cleaned/gen04.csv"
[5] "./pokemonData/cleaned/gen05.csv" "./pokemonData/cleaned/gen06.csv"
[7] "./pokemonData/cleaned/gen07.csv" "./pokemonData/cleaned/gen08.csv"
>
Since commenters are asserting that the dots in the file names in paste0() aren't being rendered correctly, the following screenshot of the subdirectory demonstrates that the code does indeed work as I intended.
Hi, I wrote some code to answer your question.
The code is below:
setwd("C:/Users/jiang/Desktop/Ready_Clean")
list_of_file_names <- list.files(pattern = "\\.csv$") # match the files ending in .csv
for(i in list_of_file_names){
# read uncleaned, raw, data
print(i)
uncleaned_data<-read.csv( i , header = FALSE)
# remove the date and time headers
data_without_head<-uncleaned_data[-c(1,2,3,4),]
# extract the useful columns
cleaned_data<-data_without_head[,c(1,2,33,53,76,95,114,133,164,184,207,226,245)]
# write the new cleaned data into a new file name (adding "_cleaned" in the end)
write.table(cleaned_data,paste0(sub("\\.csv$","",i),"_Cleaned.csv"),row.names=FALSE,col.names=FALSE,sep=",")
}
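One thing to watch with the _Cleaned.csv naming from the question: the cleaned files also end in .csv, so a second run over the same directory would pick them up and clean them again. Below is a minimal sketch of one way to skip them. The directory and file names are made up, and the empty files exist only so that list.files() has something to find.

```r
# Hypothetical directory with two raw files and one already-cleaned file
demoDir <- file.path(tempdir(), "rerun_demo")
dir.create(demoDir, showWarnings = FALSE)
file.create(file.path(demoDir, c("a.csv", "b.csv", "a_Cleaned.csv")))

all_csv <- list.files(demoDir, pattern = "\\.csv$")

# Drop anything that already carries the _Cleaned suffix
raw_csv <- all_csv[!grepl("_Cleaned\\.csv$", all_csv)]

# Build each output name by inserting _Cleaned before the extension
out_csv <- sub("\\.csv$", "_Cleaned.csv", raw_csv)

print(raw_csv)
print(out_csv)
```

Writing the cleaned files to a separate subdirectory, as in the answer above, sidesteps the problem without any filtering.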