I have multiple CSV files (like 200) in a folder that I want to merge them into one unique dataframe. For example, each file has 3 columns, of which 2 are common in all the files ( Country
and Year
), the third column is different in each file.
For example, one file has the following columns:
Country Year X
----------------------
Mexico 2015 10
Spain 2014 6
And other file can be like this:
Country Year A
--------------------
Mexico 2015 90
Spain 2014 67
USA 2020 8
I can read this files and merge them with the following code:
x = pd.read_csv("x.csv")
a = pd.read_csv("a.csv")
df = pd.merge(a, x, how="left", left_on=["country", "year"],
right_on=["country", "year"], indicator=False)
And this result in the output that I want, like this:
Country Year A X
-------------------------
Mexico 2015 90 10
Spain 2014 67 6
USA 2020 8
However, my problem is to do the previously process with each file, there are more than 200, I want to know if I can use a loop (or other method) in order to read the files and merge them into a unique dataframe.
Thank you very much, I hope I was clear enough.
Use glob like this:
import glob
print(glob.glob("/home/folder/*.csv"))
This gives all your files in a list : ['/home/folder/file1.csv', '/home/folder/file2.csv', .... ]
Now, you can just iterate over this list : from 1->end, keeping 0 as your base
, and do pd.read_csv()
and pd.merge()
- it should be sorted!
Try this:
import os
import pandas as pd
# update this to path that contains your .csv's
path = '.'
# get files that end with csv in path
dir_list = [file for file in os.listdir(path) if file.endswith('.csv')]
# initiate empty list
df_list = []
# simple for loop with Try, Except that passes on iterations that throw errors when trying to 'read_csv' your files
for file in dir_list:
try:
# append to df_list and set your indices to match across your df's for later pd.concat to work
df_list.append(pd.read_csv(file).set_index(['Country', 'Year']))
except: # change this depending on whatever Errors pd.read_csv() throws
pass
concatted = pd.concat(df_list)
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.