Merging dataframes with multiple columns in common
I have multiple csv files corresponding to the access grades to college in my country, split by year.
Every csv file is composed of 7 columns:
"Institution Code", "Course Code", "Institution", "Course", "NrVacanciesYear", "NrPlacedYear", "LastGradeYear".
I'm trying to concatenate 21 files (from the year 1997 to 2018) into a single one, where the columns are concatenated by year.
I'm trying to use
dffinal_conc = pd.concat([df18, df17], ignore_index=True)
but I'm having trouble grouping by "Institution Code" and "Course Code" (NOTE: the same courses from different institutions have the same ID!)
When I try to use this
dffinal_conc = pd.concat([df18, df17], axis = 1)
it sort of groups by Course and Institution code, but I don't know if I'm doing it properly. When I try to use
dffinal_conc['Código Curso'].value_counts()
to verify that there is only 1 value per ID, I get an error:
AttributeError: 'DataFrame' object has no attribute 'value_counts'
(NOTE: I used
dffinal_conc2.loc[:,~dffinal_conc2.columns.duplicated()]
to remove the duplicated columns)
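For context, here is a minimal sketch of why that AttributeError appears (the sample data is invented; only the "Código Curso" column name comes from the question): after `pd.concat(..., axis=1)` the key column exists twice, so selecting it by label returns a DataFrame rather than a Series, and older pandas versions (before 1.1) have no `DataFrame.value_counts`.

```python
import pandas as pd

# Invented sample frames standing in for two yearly csv files
df17 = pd.DataFrame({"Código Curso": [9119, 9251], "LastGrade17": [14.2, 15.8]})
df18 = pd.DataFrame({"Código Curso": [9119, 9251], "LastGrade18": [13.9, 16.1]})
conc = pd.concat([df18, df17], axis=1)

# The label is duplicated, so this selection is a DataFrame, not a Series
print(type(conc["Código Curso"]))

# Dropping the duplicated columns makes the selection a Series again,
# so .value_counts() works and shows one row per course code
dedup = conc.loc[:, ~conc.columns.duplicated()]
print(dedup["Código Curso"].value_counts())
```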
Thank you for your help!
Put all your csv files into one folder and try this:
import os
import pandas as pd
import glob
path_to_csv = '/folder/of/csvs/'
csv_pattern = os.path.join(path_to_csv, '*.csv')
file_list = glob.glob(csv_pattern)

# read each file (the header row is inferred by default) and stack them row-wise
frames = [pd.read_csv(file) for file in file_list]
temp = pd.concat(frames, ignore_index=True, sort=True)
temp
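That stacks the files row-wise. If the goal is instead one row per course, with each year's columns side by side as the question describes, a merge on the two key columns avoids the duplicated-column problem entirely. A minimal sketch, assuming the column names from the question and invented sample data:

```python
from functools import reduce

import pandas as pd

keys = ["Institution Code", "Course Code"]

# Invented sample frames; in practice, build this list with one
# pd.read_csv(...) per yearly file
df17 = pd.DataFrame({"Institution Code": [100, 100],
                     "Course Code": [9119, 9251],
                     "LastGrade17": [14.2, 15.8]})
df18 = pd.DataFrame({"Institution Code": [100, 100],
                     "Course Code": [9119, 9251],
                     "LastGrade18": [13.9, 16.1]})
frames = [df17, df18]

# Outer-merge every year on the key pair, so each course keeps one row
# even when it is missing from some years
wide = reduce(lambda left, right: pd.merge(left, right, on=keys, how="outer"),
              frames)
print(wide)
```

Because the keys are merged rather than concatenated, each ("Institution Code", "Course Code") pair appears exactly once, and `wide["Course Code"].value_counts()` returns a Series as expected.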