I have a list of CSV files (A1.csv, A2.csv, ..., D10.csv) in a folder, each containing two columns and several rows. Basically, I want to extract the value in the last row, 2nd column from all the CSV files (see the picture to understand better)
and create a data frame that contains the file name in the 1st column and the extracted values (C) in the second column.
Right now, I can do it by writing out another set of CSV files and concatenating them later into one data frame.
Is it possible to store each data frame produced from the CSV files in a list and then concatenate them (what rbind does in R)? I tried the code below in R and it works, but I want to learn a more efficient way in R or Python (Python is preferable, as I am trying to learn Python).
# read through csv files and select the last row, 2nd column
f <- list.files(path = getwd(), pattern = "\\.csv$")
for (g in f) {
  aa <- read.csv(g)
  m <- tail(aa, 1)   # last row
  q <- m[, 2]        # 2nd column of that row
  yy <- data.frame(ID = g, Final = q)
  write.csv(yy, file = paste("Filename/", g), row.names = FALSE)
}
### concatenate into one file
readFile <- list.files(path = getwd(), pattern = "\\.csv$")
Alldata <- lapply(readFile, function(filename) {
  read.csv(filename)
})
FinalFile <- do.call(rbind, Alldata)
write.csv(FinalFile, file = "FinalFile.csv", row.names = FALSE)
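For reference, the list-then-rbind pattern asked about maps directly to pandas: collect one small DataFrame per file in a list, then stack them with pd.concat, pandas' analogue of R's rbind. A minimal sketch, assuming pandas is installed (the function name collect_last_values is mine):

```python
import pandas as pd

def collect_last_values(paths):
    """Read each CSV, keep the last row's 2nd column, and stack the results."""
    frames = [
        pd.DataFrame({"ID": [p], "Final": [pd.read_csv(p).iloc[-1, 1]]})
        for p in paths
    ]
    # pd.concat on a list of DataFrames is the equivalent of R's rbind
    return pd.concat(frames, ignore_index=True)
```

Called as collect_last_values(['A1.csv', 'A2.csv']), this returns a two-column data frame of file names and extracted values, with no intermediate files written to disk.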
Here is an option in R.
Step 1: Prepare a vector of file names. If there are many files in the folder, the list.files function is useful; here I just created the vector manually. I also assume that all the files are stored in the working directory; otherwise you will need to construct the file paths.
file_vec <- c("A1.csv", "A2.csv", "A3.csv")
Step 2: Read all the CSV files based on file_vec. The key is to use lapply to apply read.csv to every element of file_vec.
dt_list <- lapply(file_vec, read.csv, stringsAsFactors = FALSE)
Step 3: Prepare a vector of file names without the .csv extension.
name_vec <- sub("\\.csv$", "", file_vec)
Step 4: Create the data frame. x[nrow(x), 2] is a way to access the last value of the second column.
dt_final <- data.frame(File = name_vec,
Value = sapply(dt_list, function(x) x[nrow(x), 2]),
stringsAsFactors = FALSE)
dt_final is the final output.
Here's another option using the tidyverse in R:
library(tidyverse)
# In my example, I'm using a folder with 4 Chicago Crime Datasets
setwd("INSERT/PATH/HERE")
files <- list.files()
tibble(files) %>%
  mutate(file_contents = map(files, ~ read_csv(file.path(.), n_max = 10))) %>%
  unnest(file_contents) %>%
  group_by(files) %>%
  slice(n()) %>%
  select(1:2)
Which returns:
# A tibble: 4 x 2
# Groups:   files [4]
  files                              X1
  <chr>                           <int>
1 Chicago_Crimes_2001_to_2004.csv  4904
2 Chicago_Crimes_2005_to_2007.csv    10
3 Chicago_Crimes_2008_to_2011.csv  5867
4 Chicago_Crimes_2012_to_2017.csv  1891
Note that the n_max = 10 argument isn't needed; I only included it because the files I was working with are pretty large. For anyone interested, the dataset can be found here.
Also, you may want to avoid setting the working directory with setwd(). If so, you can use the additional argument full.names = TRUE in list.files():
path <- "INSERT/PATH/HERE"
files <- list.files(path, full.names = TRUE)
I'd recommend this approach, as scripts containing setwd() aren't flexible: paths change from user to user.
Python Solution
>>> import pandas as pd
>>> files = ['A1.csv', 'A2.csv', ... , 'D10.csv']
>>> df_final = pd.DataFrame({'File': files,
...                          'Value': [pd.read_csv(f).iloc[-1, 1] for f in files]})
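If you don't want to type all forty file names by hand, the standard glob module can discover them, mirroring R's list.files. A sketch under the same assumptions (the column names File/Value and the function name summarize_folder are my choices):

```python
import glob
import os

import pandas as pd

def summarize_folder(pattern="*.csv"):
    """Build a File/Value frame from every CSV matching the glob pattern."""
    files = sorted(glob.glob(pattern))  # analogue of R's list.files
    return pd.DataFrame({
        # file name without directory or .csv extension
        "File": [os.path.splitext(os.path.basename(f))[0] for f in files],
        # last row, 2nd column of each file
        "Value": [pd.read_csv(f).iloc[-1, 1] for f in files],
    })
```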
This is an easy case for bash and friends. This one-liner
for i in A*.csv B*.csv C*.csv D*.csv; do awk -F , 'END{ print $NF }' "$i"; done
extracts the bottom-right field, no matter how many rows or columns, from any number of files that follow the pattern you have given. If all the files are in one folder, they are the only .csv files in that folder, and you want to save the outcome in a new file, this will do the job:
for i in *.csv; do awk -F , 'END{ print $NF }' "$i"; done > extract.txt
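Note the awk one-liner prints only the values, not which file each came from. If you need both and don't want pandas, the same bottom-right extraction can be sketched with Python's standard csv module (the function name bottom_right is mine):

```python
import csv

def bottom_right(path):
    """Return the last field of the last non-empty row of a CSV file."""
    last = None
    with open(path, newline="") as fh:
        for row in csv.reader(fh):
            if row:          # remember only non-empty rows
                last = row
    return last[-1] if last else None
```

Unlike pandas, csv.reader yields strings, so the value comes back as text rather than a number.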