
Merge multiple CSV files that share 2 columns into one unique data frame

I have multiple CSV files (around 200) in a folder that I want to merge into a single dataframe. Each file has 3 columns, of which 2 are common to all the files (Country and Year); the third column is different in each file.

For example, one file has the following columns:

Country  Year    X 
----------------------
Mexico   2015    10
Spain    2014    6

And another file can look like this:

Country  Year    A
--------------------
Mexico   2015    90
Spain    2014    67
USA      2020    8

I can read these files and merge them with the following code:

import pandas as pd

x = pd.read_csv("x.csv")
a = pd.read_csv("a.csv")
# column names are case-sensitive, so the keys must match the file headers exactly
df = pd.merge(a, x, how="left", on=["Country", "Year"])

This produces the output that I want:

Country  Year    A    X
-------------------------
Mexico   2015    90   10
Spain    2014    67   6
USA      2020    8

However, my problem is repeating this process for every file, since there are more than 200. Can I use a loop (or some other method) to read the files and merge them into a single dataframe?

Thank you very much, I hope I was clear enough.

Use glob like this:

import glob
print(glob.glob("/home/folder/*.csv"))

This gives you all your files in a list: ['/home/folder/file1.csv', '/home/folder/file2.csv', .... ]

Now you can iterate over this list: read element 0 with pd.read_csv() as your base, then for each element from 1 to the end, read it with pd.read_csv() and pd.merge() it into the base. That should sort you out!
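The loop described here can be sketched as a fold over the file list. This is only a sketch under some assumptions: the helper name merge_csv_folder is made up for illustration, every file is assumed to contain the Country and Year columns, and how="outer" is used so that no Country/Year pair is dropped (the question's own example used how="left", which keeps only the rows of the first file).

```python
import glob
import os
from functools import reduce

import pandas as pd


def merge_csv_folder(folder, keys=("Country", "Year"), how="outer"):
    """Merge every CSV in `folder` on the shared key columns (hypothetical helper)."""
    # sorted() makes the merge order, and so the column order, reproducible
    paths = sorted(glob.glob(os.path.join(folder, "*.csv")))
    if not paths:
        raise FileNotFoundError(f"no CSV files found in {folder!r}")
    frames = [pd.read_csv(p) for p in paths]
    # Fold pairwise: ((frames[0] merge frames[1]) merge frames[2]) ...
    return reduce(
        lambda left, right: pd.merge(left, right, on=list(keys), how=how),
        frames,
    )
```

Calling merge_csv_folder("/home/folder") would then return one dataframe with Country, Year and one extra column per file. If two files happen to share the name of their third column, pd.merge appends _x/_y suffixes, so distinct column names are assumed as well.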

Try this:

import os
import pandas as pd

# update this to the path that contains your .csv files
path = '.'

# get the files in path that end with .csv
dir_list = [file for file in os.listdir(path) if file.endswith('.csv')]

# collect one dataframe per file
df_list = []
for file in dir_list:
    try:
        # index each dataframe by the shared columns so pd.concat can align them;
        # os.listdir returns bare file names, so join them with path before reading
        df_list.append(pd.read_csv(os.path.join(path, file)).set_index(['Country', 'Year']))
    except (pd.errors.ParserError, pd.errors.EmptyDataError, KeyError) as err:
        # skip files that cannot be parsed or that lack the key columns, but report them
        print(f'skipping {file}: {err}')

# axis=1 places each file's column side by side, aligned on (Country, Year)
concatted = pd.concat(df_list, axis=1).reset_index()
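To end up with one column per file rather than rows stacked on top of each other, the concatenation has to run along the column axis. A minimal sketch, using two in-memory frames that stand in for the files from the question and assuming each (Country, Year) pair appears at most once per file:

```python
import pandas as pd

# stand-ins for x.csv and a.csv from the question
x = pd.DataFrame({"Country": ["Mexico", "Spain"], "Year": [2015, 2014], "X": [10, 6]})
a = pd.DataFrame(
    {"Country": ["Mexico", "Spain", "USA"], "Year": [2015, 2014, 2020], "A": [90, 67, 8]}
)

# index by the shared columns, then concatenate column-wise;
# rows are aligned on the index and missing pairs become NaN
frames = [df.set_index(["Country", "Year"]) for df in (a, x)]
wide = pd.concat(frames, axis=1).reset_index()
```

Here wide has the columns Country, Year, A and X, with X left empty (NaN) for USA 2020. With the default axis=0 the frames would instead be stacked vertically, repeating each Country/Year pair once per file.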

