I have multiple CSV files with the college admission grades in my country, one file per year.
Every csv file is composed of 7 columns:
"Institution Code", "Course Code", "Institution", "Course", "NrVacanciesYear", "NrPlacedYear", "LastGradeYear".
I'm trying to concatenate 21 files (from the year 1997 to 2018) into a single one, where it concatenates the columns by year.
I'm trying to use
dffinal_conc = pd.concat([df18, df17], ignore_index=True)
but I'm having problems grouping by "Institution Code" and "Course Code" (NOTE: the same course at different institutions has the same ID!)
When I try to use this
dffinal_conc = pd.concat([df18, df17], axis = 1)
it kinda groups by Course and Institution code, but I don't know if I'm doing it properly. When I try to use
dffinal_conc['Código Curso'].value_counts()
to verify that there is only 1 value by ID, I get an error:
AttributeError: 'DataFrame' object has no attribute 'value_counts'
(NOTE: I used dffinal_conc2.loc[:,~dffinal_conc2.columns.duplicated()]
to remove duplicates)
Thank you for your help!
Put all your CSV files into one folder and try this:
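import glob
import os
import pandas as pd
path_to_csv = '/folder/of/csvs/'
csv_pattern = os.path.join(path_to_csv, '*.csv')
file_list = glob.glob(csv_pattern)
# Read every file; header=0 (the default) takes column names from the first row.
# read_csv rejects header=True.
frames = [pd.read_csv(file, header=0) for file in file_list]
# DataFrame.append was removed in pandas 2.0, so stack the frames with pd.concat.
temp = pd.concat(frames, ignore_index=True, sort=True)
temp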
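The snippet above stacks the files on top of each other. If the goal is instead one row per institution and course, with each year's columns side by side, a rough sketch could look like the following. It assumes every file has at most one row per ("Institution Code", "Course Code") pair. The helper name combine_years, the toy column "LastGrade" and the "_<year>" suffix are made up for illustration and are not from the question. Indexing on both keys also avoids the duplicated column names that pd.concat(..., axis=1) produces, which is what makes dffinal_conc['Código Curso'] return a DataFrame (and so fail on value_counts) rather than a Series.

```python
import pandas as pd

# Both keys are needed: a course code alone repeats across institutions.
KEYS = ["Institution Code", "Course Code"]


def combine_years(frames_by_year):
    """Hypothetical helper: frames_by_year maps a year to that year's DataFrame."""
    parts = []
    for year, df in sorted(frames_by_year.items()):
        # Move the keys into the index so rows align on (institution, course).
        part = df.set_index(KEYS)
        # Suffix the remaining columns with the year so names stay unique.
        parts.append(part.add_suffix(f"_{year}"))
    # axis=1 does an outer join on the index; pairs missing in a year get NaN.
    wide = pd.concat(parts, axis=1)
    return wide.rename_axis(KEYS).reset_index()


# Toy data standing in for two of the yearly files.
df17 = pd.DataFrame({"Institution Code": [1, 2],
                     "Course Code": [10, 10],
                     "LastGrade": [12.0, 13.0]})
df18 = pd.DataFrame({"Institution Code": [1, 3],
                     "Course Code": [10, 10],
                     "LastGrade": [14.0, 15.0]})

wide = combine_years({2017: df17, 2018: df18})

# Check that each (institution, course) pair appears only once.
has_duplicates = wide.duplicated(KEYS).any()
```

With real files, the dictionary could be built by reading each CSV and using its year as the key. Courses that did not exist in a given year simply end up with NaN in that year's columns.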