Merging data frames with multiple common columns

Question

I have multiple CSV files containing the college admission grades in my country, split by year.

Every CSV file has 7 columns:

"Institution Code", "Course Code", "Institution", "Course", "NrVacanciesYear", "NrPlacedYear", "LastGradeYear".

I'm trying to concatenate 21 files (from 1997 to 2018) into a single one, with the per-year columns placed side by side.

I'm trying to use

dffinal_conc = pd.concat([df18, df17], ignore_index=True)

but I'm having problems grouping by "Institution Code" and "Course Code" (NOTE: the same course at different institutions has the same ID!)

When I try to use this

dffinal_conc = pd.concat([df18, df17], axis = 1)

it kind of groups by Course and Institution code, but I don't know if I'm doing it properly. When I try to use

dffinal_conc['Código Curso'].value_counts()

to verify that there is only 1 value per ID, I get an error:

AttributeError: 'DataFrame' object has no attribute 'value_counts'

(NOTE: I used dffinal_conc2.loc[:,~dffinal_conc2.columns.duplicated()] to remove duplicates)

Thank you for your help!
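
A likely cause of this error, sketched with made-up data: after pd.concat(..., axis=1) both frames contribute a column with the same label, so selecting that label returns a DataFrame rather than a Series, and DataFrame.value_counts only exists in pandas 1.1 and later. The frames, column names and values below are hypothetical stand-ins for the real files.

```python
import pandas as pd

# Hypothetical miniature versions of two yearly files
df18 = pd.DataFrame({"Código Curso": [9119, 9219], "LastGrade2018": [150.2, 132.0]})
df17 = pd.DataFrame({"Código Curso": [9119, 9219], "LastGrade2017": [148.5, 130.1]})

side_by_side = pd.concat([df18, df17], axis=1)

# The label now occurs twice, so this selection is a two-column DataFrame
selected = side_by_side["Código Curso"]
selection_type = type(selected).__name__

# Dropping the repeated labels first gives back a Series
deduped = side_by_side.loc[:, ~side_by_side.columns.duplicated()]
counts = deduped["Código Curso"].value_counts()
```

Note that axis=1 aligns rows by index position, not by course or institution, so the values only line up if every file lists the same rows in the same order.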

Answer 1

Put all your CSV files into one folder and try this:

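import glob
import os

import pandas as pd

# Folder that holds the yearly CSV files
path_to_csv = '/folder/of/csvs/'

csv_pattern = os.path.join(path_to_csv, '*.csv')
file_list = glob.glob(csv_pattern)

# header=0 means the first row holds the column names
# (header=True is rejected by read_csv)
frames = [pd.read_csv(file, header=0) for file in file_list]

# DataFrame.append was removed in pandas 2.0, so stack the rows with concat
temp = pd.concat(frames, sort=True)

temp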
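
The loop above stacks the files row-wise. If the goal is one row per institution and course with the yearly columns side by side, a merge on both key columns may be closer to what the question asks. The following is only a sketch under assumptions: the frames are tiny hypothetical examples, the year-specific column names (LastGrade2017, LastGrade2018) are invented to follow the pattern in the question, merge_years is an illustrative helper, and the descriptive "Institution" and "Course" columns are left out for brevity.

```python
from functools import reduce

import pandas as pd

# Both columns together identify a course, since course IDs repeat across institutions
KEYS = ["Institution Code", "Course Code"]

# Hypothetical yearly frames; the real ones would come from pd.read_csv
df17 = pd.DataFrame({
    "Institution Code": [100, 100, 200],
    "Course Code": [9119, 9219, 9119],
    "LastGrade2017": [148.5, 130.1, 141.0],
})
df18 = pd.DataFrame({
    "Institution Code": [100, 200, 200],
    "Course Code": [9119, 9119, 9500],
    "LastGrade2018": [150.2, 143.7, 120.0],
})


def merge_years(frames):
    """Outer-merge yearly frames on the (institution, course) key pair."""
    return reduce(
        lambda left, right: pd.merge(left, right, on=KEYS, how="outer"),
        frames,
    )


wide = merge_years([df17, df18])

# Each (institution, course) pair should occur exactly once
has_duplicate_keys = wide.duplicated(subset=KEYS).any()
```

An outer merge keeps courses that exist in only some years and fills the missing years with NaN. Using how="inner" instead would keep only the courses present in every file.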

Solution 1
0 2020-04-18 13:42:52
