Combining multiple CSV files with case sensitive column names with Python

I have multiple CSV files in a folder, all with the same columns. However, in some of the files the column names are in lower case, while in the rest they are in upper case.

I'm using the code below to concatenate them into one file:

import glob
import os
import pandas as pd

path = r'folder'
file = glob.glob(os.path.join(path, 'Add', "*.csv"))
data = pd.concat((pd.read_csv(f, sep=',', encoding='latin-1') for f in file), ignore_index=True, sort=True)
data['Period'] = '202007'  # the Period column is required as a string

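Each individual file has 40 columns, but when I combine all the files with pd.concat I get 81 columns (40 in upper case + 40 in lower case + 1 created column). This happens because pd.concat aligns columns by exact name, so names that differ only in case are treated as different columns.

I need the final output to have 41 columns: the 40 original columns in either upper or lower case, plus the 1 created column.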

Thanks to Sid for the concat help.

Update (data types): I have different data types (int, float, object) in my data.

Try converting all of your column names to lowercase before concatenating your dataframes:

df.columns = df.columns.str.lower()
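
A minimal sketch of how the lowercasing step could be applied to each file before pd.concat. The helper name read_lowercase, the temporary folder, and the sample columns id and amount are assumptions made for illustration only; the real files have 40 columns and live in the asker's own folder.

```python
import glob
import os
import tempfile

import pandas as pd


def read_lowercase(csv_path):
    """Read one CSV file and lowercase its column names (hypothetical helper)."""
    df = pd.read_csv(csv_path, sep=',', encoding='latin-1')
    df.columns = df.columns.str.lower()
    return df


# Build two small sample files whose headers differ only in case.
with tempfile.TemporaryDirectory() as path:
    with open(os.path.join(path, 'upper.csv'), 'w', encoding='latin-1') as fh:
        fh.write('ID,AMOUNT\n1,10.5\n2,20.0\n')
    with open(os.path.join(path, 'lower.csv'), 'w', encoding='latin-1') as fh:
        fh.write('id,amount\n3,30.25\n')

    files = sorted(glob.glob(os.path.join(path, '*.csv')))
    # Column names are normalized per file, so concat sees one set of names.
    data = pd.concat((read_lowercase(f) for f in files),
                     ignore_index=True, sort=True)

data['period'] = '202007'  # added as a string, as in the question
print(data.shape)
```

With the names normalized first, the two sample files contribute the same two columns, so the result has those columns plus the added period column rather than a doubled set. Using .str.upper() instead would work equally well if upper-case names are preferred.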

You should also unify your data types so that the same column has the same dtype in every file. For this, have a look at astype.
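
As a rough illustration of astype, the sketch below casts two small frames to one shared set of dtypes before concatenating them. The column names and the dtypes mapping are assumptions; in practice the mapping would list the real columns and the types they should have.

```python
import pandas as pd

# Two frames standing in for files where the same columns were read differently.
a = pd.DataFrame({'id': [1, 2], 'amount': ['10.5', '20.0']})  # amount read as text
b = pd.DataFrame({'id': [3.0], 'amount': [30.25]})            # id read as float

# Hypothetical target dtypes, one entry per column.
dtypes = {'id': 'int64', 'amount': 'float64'}

# Cast every frame to the same dtypes, then concatenate.
unified = pd.concat([a.astype(dtypes), b.astype(dtypes)], ignore_index=True)
unified['period'] = '202007'  # kept as a string
print(unified.dtypes)
```

Note that astype raises an error if a value cannot be converted (for example, non-numeric text in a column cast to float64), so columns with dirty values may need cleaning first.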
