How to create one dataframe from multiple CSV files in a folder

I have a set of CSV files (A1.csv, A2.csv, ..., D10.csv) in a folder. Each file contains two columns and several rows. Basically, I want to extract the value in the last row of the second column from every CSV file and create a data frame that contains the file name in the first column and the extracted value (C) in the second column.

Right now, I can do this by writing an intermediate CSV file for each input file and concatenating those files into one data frame afterwards.

Is it possible to store the data frame produced from each CSV file in a list and then concatenate them (what rbind does in R)? I tried the code below in R, and it works, but I want to learn a more efficient way in R or Python. (Python is preferable, as I am trying to learn it.)

# Read each CSV file and keep the value in the last row, 2nd column
f <- list.files(path = getwd(), pattern = "\\.csv$")
for (g in f) {
  aa <- read.csv(g)
  m <- tail(aa, 1)
  q <- m[, 2]
  yy <- data.frame(ID = g, Final = q)
  # file.path() avoids the stray space that paste("Filename/", g) inserts;
  # the Filename folder is assumed to exist already
  write.csv(yy, file = file.path("Filename", g), row.names = FALSE)
}

# Concatenate the intermediate files into one file
readFile <- list.files(path = "Filename", pattern = "\\.csv$", full.names = TRUE)
Alldata <- lapply(readFile, read.csv)
FinalFile <- do.call(rbind, Alldata)
write.csv(FinalFile, file = "FinalFile.csv", row.names = FALSE)
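The list-then-concatenate idea asked about above can be sketched in Python with pandas, skipping the intermediate files entirely. This is only a minimal sketch: the function name last_values, the column names ID and Final (borrowed from the R code), and the sample file contents are assumptions made for illustration.

```python
import tempfile
from pathlib import Path

import pandas as pd


def last_values(folder):
    """Return one row per CSV in `folder`: file name and last value of column 2."""
    frames = []
    for csv_path in sorted(Path(folder).glob("*.csv")):
        data = pd.read_csv(csv_path)
        # iloc[-1, 1] is the last row, second column (positions are 0-based)
        frames.append(
            pd.DataFrame({"ID": [csv_path.name], "Final": [data.iloc[-1, 1]]})
        )
    # pd.concat stacks the one-row frames, much like do.call(rbind, ...) in R
    return pd.concat(frames, ignore_index=True)


# Demo on throwaway files (contents are made up for illustration)
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "A1.csv").write_text("x,y\n1,10\n2,20\n")
    Path(tmp, "A2.csv").write_text("x,y\n1,30\n2,40\n3,50\n")
    result = last_values(tmp)
    print(result)
```

pd.concat raises an error when given an empty list, so a folder without CSV files would need separate handling.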

Here is an option in R.

Step 1: Prepare a vector of file names. If there are many files in the folder, the list.files function can build this vector for you. Here, I just created it manually. I also assume that all the files are stored in the working directory; otherwise, you will need to construct the full file paths.

file_vec <- c("A1.csv", "A2.csv", "A3.csv")

Step 2: Read all the CSV files listed in file_vec. The key is to use the lapply function to apply read.csv to every element of file_vec.

dt_list <- lapply(file_vec, read.csv, stringsAsFactors = FALSE)

Step 3: Prepare a vector of file names without the .csv extension. The pattern is escaped and anchored so that only a literal trailing .csv is removed.

name_vec <- sub("\\.csv$", "", file_vec)

Step 4: Create the data frame. x[nrow(x), 2] accesses the last value of the second column.

dt_final <- data.frame(File = name_vec,
                       Value = sapply(dt_list, function(x) x[nrow(x), 2]),
                       stringsAsFactors = FALSE)

dt_final is the final output.

Here's another option using the tidyverse in R:

library(tidyverse)

# In my example, I'm using a folder with 4 Chicago Crime Datasets
setwd("INSERT/PATH/HERE")

files <- list.files()

# The column is named filename so that it matches the output shown below
tibble(filename = files) %>%
  # Read each file into a list column
  mutate(file_contents = map(filename, ~ read_csv(.x, n_max = 10))) %>%
  unnest(file_contents) %>%
  group_by(filename) %>%
  slice(n()) %>%   # keep the last row of each file
  select(1:2)

Which returns:

# A tibble: 4 x 2
# Groups:   filename [4]
                         filename    X1
                            <chr> <int>
1 Chicago_Crimes_2001_to_2004.csv  4904
2 Chicago_Crimes_2005_to_2007.csv    10
3 Chicago_Crimes_2008_to_2011.csv  5867
4 Chicago_Crimes_2012_to_2017.csv  1891

Note that the n_max = 10 argument isn't needed. I only included it because the files I was working with are pretty large.

For anyone interested, the dataset can be found here.

Also, you may want to avoid setting the working directory with setwd(). If so, you can pass the additional argument full.names = TRUE to list.files():

path <- "INSERT/PATH/HERE"
files <- list.files(path, full.names = TRUE)

I'd recommend this approach, because scripts containing a setwd() call aren't flexible: paths change from user to user.
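Since the question prefers Python, here is a rough equivalent of list.files(path, full.names = TRUE) using pathlib. It is a sketch under assumptions: the helper name list_csv_files and the sample files are hypothetical, not part of the original answer.

```python
import tempfile
from pathlib import Path


def list_csv_files(folder):
    """Return the paths of the CSV files in `folder`, sorted by name."""
    # Path.glob yields paths that already include `folder`, so nothing
    # depends on the current working directory
    return sorted(Path(folder).glob("*.csv"))


# Demo on throwaway files (made up for illustration)
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "A1.csv").write_text("x,y\n1,2\n")
    Path(tmp, "notes.txt").write_text("not a csv\n")
    paths = list_csv_files(tmp)
    names = [p.name for p in paths]  # file names with the extension
    stems = [p.stem for p in paths]  # file names without ".csv"
    print(names, stems)
```

The .stem attribute gives the file name without its extension, so no separate string substitution is needed to build the name column.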

Python Solution

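>>> import pandas as pd
>>> files = ['A1.csv', 'A2.csv', ... , 'D10.csv']
>>> # .iat[-1, 1] picks the last row of the second column (0-based positions)
>>> rows = [(fname, pd.read_csv(fname).iat[-1, 1]) for fname in files]
>>> df_final = pd.DataFrame(rows, columns=['File', 'Value'])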
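A dictionary that maps file names to values can also be turned into a two-column data frame by going through a Series. The sketch below is one way to do that, and it builds the file list with glob instead of typing it out. The sample files and the column names File and Value are assumptions made for illustration.

```python
import tempfile
from pathlib import Path

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    # Made-up sample files standing in for the real CSV files
    Path(tmp, "A1.csv").write_text("t,C\n0,1.5\n1,2.5\n")
    Path(tmp, "B1.csv").write_text("t,C\n0,7.0\n1,8.0\n2,9.0\n")

    files = sorted(Path(tmp).glob("*.csv"))
    # Map each file name to the last value of its second column
    last = {f.name: pd.read_csv(f).iat[-1, 1] for f in files}

# The dict keys become the index; reset_index moves them into a column
df_final = pd.Series(last).rename_axis("File").reset_index(name="Value")
print(df_final)
```

Passing a dict of scalar values straight to pd.DataFrame raises a ValueError, which is why the mapping goes through pd.Series first.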

This is an easy case for bash and friends. This one-liner

for i in A*.csv B*.csv C*.csv D*.csv; do awk -F , 'END{ print $NF }' "$i"; done

extracts the bottom-right field, no matter how many rows or columns there are, from any number of files that follow the naming pattern you have given. If all the files were in one folder, and they were the only .csv files in that folder, and you wanted to save the result in a new file, this would do the job:

for i in *.csv; do awk -F , 'END{ print $NF }' "$i"; done > extract.txt
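For comparison, the same bottom-right extraction can be sketched in Python with only the standard library, streaming each file instead of loading it whole. The function name bottom_right_field and the sample files are hypothetical. Unlike the pandas approaches, the values come back as strings.

```python
import csv
import tempfile
from collections import deque
from pathlib import Path


def bottom_right_field(csv_path):
    """Return the last field of the last row of a CSV file, or None if empty."""
    with open(csv_path, newline="") as handle:
        # deque(maxlen=1) keeps only the final row while streaming the file
        last_row = deque(csv.reader(handle), maxlen=1)
    return last_row[0][-1] if last_row and last_row[0] else None


# Demo on throwaway files (contents are made up for illustration)
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "A1.csv").write_text("x,y\n1,10\n2,20\n")
    Path(tmp, "A2.csv").write_text("x,y,z\n1,2,3\n")
    values = [bottom_right_field(p) for p in sorted(Path(tmp).glob("*.csv"))]
    # One value per line, like the redirect to extract.txt in the shell version
    Path(tmp, "extract.txt").write_text("\n".join(values) + "\n")
    written = Path(tmp, "extract.txt").read_text()

print(values)
```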
