Merging dataframes with multiple columns in common

I have multiple CSV files containing the college admission grades in my country, one file per year.

Every CSV file is composed of 7 columns:

"Institution Code", "Course Code", "Institution", "Course", "NrVacanciesYear", "NrPlacedYear", "LastGradeYear".

I'm trying to combine 21 files (from the year 1997 to 2018) into a single one, with each year's columns placed side by side.

I'm trying to use

dffinal_conc = pd.concat([df18, df17], ignore_index=True)

but I'm having problems grouping by "Institution Code" and "Course Code" (NOTE: the same course at different institutions has the same ID!)

When I try to use this

dffinal_conc = pd.concat([df18, df17], axis = 1) 

it kinda groups by Course and Institution code, but I don't know if I'm doing it properly. When I try to use

dffinal_conc['Código Curso'].value_counts() 

to verify that there is only 1 value per ID, I get an error:

AttributeError: 'DataFrame' object has no attribute 'value_counts'

(NOTE: I used dffinal_conc2.loc[:,~dffinal_conc2.columns.duplicated()] to remove duplicates)

Thank you for your help!

Put all your CSV files into one folder and try this:

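import glob
import os

import pandas as pd

path_to_csv = '/folder/of/csvs/'

# Collect every CSV file in the folder
csv_pattern = os.path.join(path_to_csv, '*.csv')
file_list = glob.glob(csv_pattern)

# Read each file (header=0: the first row holds the column names);
# read_csv rejects header=True
frames = [pd.read_csv(file, header=0) for file in file_list]

# Stack all files row-wise; DataFrame.append was removed in pandas 2.0
temp = pd.concat(frames, sort=True)

temp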
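
Note that the snippet above stacks the files row-wise. If the goal is one row per ("Institution Code", "Course Code") pair with each year's columns side by side, a key-based merge may be closer to what the question asks for. The following is only a minimal sketch under assumptions: the helper name merge_years, the tiny made-up frames, and the shortened column name "LastGrade" are illustrative and not taken from the real files. It suffixes every non-key column with the year, so columns that already carry the year in their name, or name columns such as "Institution" and "Course", may need different handling.

```python
from functools import reduce

import pandas as pd

# Assumed key columns, as named in the question
KEYS = ["Institution Code", "Course Code"]


def merge_years(frames_by_year):
    """Outer-merge per-year frames on KEYS (hypothetical helper).

    Every non-key column gets a _<year> suffix so that columns from
    different years do not collide.
    """
    renamed = []
    for year, df in frames_by_year.items():
        mapping = {c: f"{c}_{year}" for c in df.columns if c not in KEYS}
        renamed.append(df.rename(columns=mapping))
    # Merge the frames pairwise; how="outer" keeps courses missing in some years
    return reduce(
        lambda left, right: pd.merge(left, right, on=KEYS, how="outer"),
        renamed,
    )


# Made-up data standing in for df17 and df18
df17 = pd.DataFrame({
    "Institution Code": [100, 200],
    "Course Code": [9001, 9001],
    "LastGrade": [120.5, 131.0],
})
df18 = pd.DataFrame({
    "Institution Code": [100, 300],
    "Course Code": [9001, 9001],
    "LastGrade": [125.0, 140.2],
})

wide = merge_years({2017: df17, 2018: df18})
```

Because the key columns appear only once after the merge, wide["Course Code"] is a Series and value_counts() works on it. The AttributeError in the question is consistent with selecting a column label that occurs twice after pd.concat(..., axis=1), which returns a DataFrame instead of a Series. To check that each key pair occurs only once, wide.groupby(KEYS).size().max() should be 1, provided each pair appears at most once per file.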
