
Merging dataframes with multiple columns in common

I have multiple CSV files with the college admission grades for my country, one file per year.

Every CSV file has 7 columns:

"Institution Code", "Course Code", "Institution", "Course", "NrVacanciesYear", "NrPlacedYear", "LastGradeYear".

I'm trying to combine 21 files (covering 1997 to 2018) into a single DataFrame, with each year's columns placed side by side.
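One way to get one row per (institution, course) pair with each year's figures side by side is to merge the yearly frames on both code columns and suffix every other column with its year. This is a minimal sketch, assuming each yearly file has already been read into a DataFrame and using the English column names from the list above plus a hypothetical value column `LastGrade` (the real files may use Portuguese headers such as 'Código Curso'):

```python
from functools import reduce

import pandas as pd

KEYS = ["Institution Code", "Course Code"]


def wide_by_year(frames_by_year):
    """Outer-merge yearly frames on the (institution, course) key.

    frames_by_year maps a year (int) to a DataFrame holding the KEYS columns
    plus that year's value columns (names here are assumptions).
    """
    renamed = []
    for year, df in sorted(frames_by_year.items()):
        value_cols = [c for c in df.columns if c not in KEYS]
        # Tag every non-key column with its year so the merges never collide.
        renamed.append(df.rename(columns={c: f"{c}_{year}" for c in value_cols}))
    # Outer join keeps courses that exist only in some years (NaN elsewhere).
    return reduce(lambda left, right: left.merge(right, on=KEYS, how="outer"), renamed)


# Made-up sample data; "LastGrade" is a hypothetical column name.
df17 = pd.DataFrame({"Institution Code": [1, 2], "Course Code": [10, 10],
                     "LastGrade": [150.0, 160.0]})
df18 = pd.DataFrame({"Institution Code": [1, 2], "Course Code": [10, 10],
                     "LastGrade": [155.0, 158.0]})

wide = wide_by_year({2017: df17, 2018: df18})
print(wide)
```

Text columns such as "Institution" and "Course" would also get a year suffix with this approach; dropping them from all but one year before merging is one option if the names don't change.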

I'm trying to use

dffinal_conc = pd.concat([df18, df17], ignore_index=True)
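For reference, `pd.concat` with its default `axis=0` stacks the frames vertically (the 2018 rows followed by the 2017 rows) and does no matching on the code columns at all. A small illustration with made-up values; the `Year` column added here is a hypothetical extra column, not one of the original file's columns, used to keep track of which file each row came from:

```python
import pandas as pd

# Made-up sample frames standing in for two yearly files.
df18 = pd.DataFrame({"Institution Code": [1, 2], "Course Code": [10, 10],
                     "LastGrade": [155.0, 158.0]})
df17 = pd.DataFrame({"Institution Code": [1, 2], "Course Code": [10, 10],
                     "LastGrade": [150.0, 160.0]})

# axis=0 (the default) appends rows; ignore_index renumbers them 0..n-1.
stacked = pd.concat([df18.assign(Year=2018), df17.assign(Year=2017)],
                    ignore_index=True)
print(stacked)
```

Each (institution, course) pair now appears once per year, in separate rows, which is a "long" layout rather than the side-by-side one the question asks for.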

but I'm having trouble matching rows on "Institution Code" and "Course Code" (NOTE: the same course offered by different institutions has the same course code!)
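Because the course code is only unique within an institution, the join key has to be the pair of columns. A quick sketch with made-up data comparing a merge on the course code alone with a merge on both codes; `validate="one_to_one"` makes pandas raise an error if either side has repeated key pairs:

```python
import pandas as pd

# Made-up data: course 10 is offered by two institutions.
df17 = pd.DataFrame({"Institution Code": [1, 2], "Course Code": [10, 10],
                     "LastGrade": [150.0, 160.0]})
df18 = pd.DataFrame({"Institution Code": [1, 2], "Course Code": [10, 10],
                     "LastGrade": [155.0, 158.0]})

# Course code alone: each 2017 row matches both 2018 rows (2 x 2 = 4 rows).
by_course = df17.merge(df18, on="Course Code", suffixes=("_2017", "_2018"))

# Both codes: exactly one row per (institution, course) pair.
by_pair = df17.merge(df18, on=["Institution Code", "Course Code"],
                     suffixes=("_2017", "_2018"), validate="one_to_one")

print(len(by_course), len(by_pair))  # 4 2
```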

When I try to use this

dffinal_conc = pd.concat([df18, df17], axis = 1) 

it seems to line the rows up by course and institution code, but I'm not sure I'm doing it properly. When I try to use
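`pd.concat(..., axis=1)` lines frames up by their row index, not by the code columns, so it only pairs the right courses if both files happen to list them in the same order. Setting the two code columns as the index first makes the alignment explicit, and `keys=` labels each year's block of columns. A sketch with made-up data where the 2017 file lists the institutions in a different order (it assumes each key pair is unique within a file):

```python
import pandas as pd

KEYS = ["Institution Code", "Course Code"]

df18 = pd.DataFrame({"Institution Code": [1, 2], "Course Code": [10, 10],
                     "LastGrade": [155.0, 158.0]})
# Same courses as above, but in a different row order.
df17 = pd.DataFrame({"Institution Code": [2, 1], "Course Code": [10, 10],
                     "LastGrade": [160.0, 150.0]})

# Rows are aligned on the (institution, course) index;
# columns become (year, column name) pairs.
side_by_side = pd.concat([df18.set_index(KEYS), df17.set_index(KEYS)],
                         axis=1, keys=[2018, 2017])
print(side_by_side)
```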

dffinal_conc['Código Curso'].value_counts() 

to verify that each ID appears only once, I get an error:

AttributeError: 'DataFrame' object has no attribute 'value_counts'

(NOTE: I used dffinal_conc2.loc[:,~dffinal_conc2.columns.duplicated()] to remove the duplicate columns)
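The `AttributeError` most likely comes from duplicate column labels: after `concat(..., axis=1)` both files contribute a column with the same name, and selecting a label that occurs more than once returns a DataFrame instead of a Series (`DataFrame.value_counts` only exists from pandas 1.1 onward, hence the error on older versions). Dropping duplicated columns with `columns.duplicated()` keeps only the first copy, so the other year's values under that label are discarded. To check that each (institution, course) pair appears only once, `DataFrame.duplicated(subset=...)` can be applied to the key columns directly. A sketch with made-up data:

```python
import pandas as pd

KEYS = ["Institution Code", "Course Code"]

a = pd.DataFrame({"Course Code": [10, 20]})
b = pd.DataFrame({"Course Code": [10, 30]})

both = pd.concat([a, b], axis=1)
# Two columns share the label, so selecting it yields a DataFrame, not a Series.
print(type(both["Course Code"]))

# Made-up data with one repeated (institution, course) pair.
df = pd.DataFrame({"Institution Code": [1, 2, 2], "Course Code": [10, 10, 10]})
# keep=False flags every occurrence of a repeated key pair, not just the later ones.
dupes = df[df.duplicated(subset=KEYS, keep=False)]
print(dupes)
```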

Thank you for your help!

Put all your CSV files into one folder and try this:

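import glob
import os

import pandas as pd

path_to_csv = '/folder/of/csvs/'

csv_pattern = os.path.join(path_to_csv, '*.csv')
file_list = sorted(glob.glob(csv_pattern))

# Read each CSV; the header row is inferred by default (header=True is not
# a valid value for read_csv). DataFrame.append was removed in pandas 2.0,
# so collect the frames in a list and concatenate once at the end.
frames = []
for file in file_list:
    df = pd.read_csv(file)
    frames.append(df)

temp = pd.concat(frames, ignore_index=True, sort=True)

temp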
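The loop above only stacks the files' rows. If the goal is one row per (institution, course) with the years side by side, the stacked result can be reshaped afterwards. This is a sketch under stated assumptions: each filename contains its four-digit year (hypothetically `grades_2018.csv`), the value column is named `LastGrade` in every file (also hypothetical), and pandas is 1.1 or later (needed for a list `index` in `pivot`):

```python
import glob
import os
import re
import tempfile

import pandas as pd

KEYS = ["Institution Code", "Course Code"]


def load_long(folder):
    """Stack every CSV in folder, tagging rows with the year in the filename."""
    frames = []
    for path in sorted(glob.glob(os.path.join(folder, "*.csv"))):
        # Hypothetical naming convention: the first 4-digit number is the year.
        match = re.search(r"(\d{4})", os.path.basename(path))
        if match is None:
            continue  # skip files without a year in their name
        frames.append(pd.read_csv(path).assign(Year=int(match.group(1))))
    return pd.concat(frames, ignore_index=True)


def to_wide(long_df, values=("LastGrade",)):
    """One row per (institution, course); one column per (value, year)."""
    # pivot raises ValueError if a key pair appears twice in the same year.
    return long_df.pivot(index=KEYS, columns="Year", values=list(values))


# Demo with two made-up files written to a temporary folder.
with tempfile.TemporaryDirectory() as folder:
    pd.DataFrame({"Institution Code": [1, 2], "Course Code": [10, 10],
                  "LastGrade": [150.0, 160.0]}).to_csv(
        os.path.join(folder, "grades_2017.csv"), index=False)
    pd.DataFrame({"Institution Code": [1, 2], "Course Code": [10, 10],
                  "LastGrade": [155.0, 158.0]}).to_csv(
        os.path.join(folder, "grades_2018.csv"), index=False)
    wide = to_wide(load_long(folder))

print(wide)
```

Going through a long layout first means every file only needs the same column names, and the `pivot` step doubles as a uniqueness check on the (institution, course, year) key.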
