
How to rename columns while reading multiple files using pandas

I have two data frames (from two Excel files) with the columns below.

File 1 columns:

person_ID   Test_CODE   REGISTRATION_DATE   subject_CD   subject_DESCRIPTION    subject_TYPE

File 2 columns:

person_ID   Test_CODE   REGISTRATION_DATE   subject_Code subject_DESCRIPTION    subject_Indicator

However, the columns subject_CD and subject_Code mean the same thing. Similarly, subject_TYPE and subject_Indicator mean the same thing. So I would like to rename them when I read the Excel files.

I tried the code below, but it doesn't work:

dfs = []       
for f in files:
    df = pd.read_excel(f, sep=",",low_memory=False)
    print(df.columns)
    df1 = df[df.columns.intersection(['person_ID','Test_CODE','REGISTRATION_DATE','subject_CD','subject_DESCRIPTION','subject_TYPE'])].rename(columns={'subject_TYPE':'subject_Indicator','subject_CD':'subject_Code'})
    dfs.append(df1)

Since I would like to append/merge both files, I expect the column names in my final data frame to be as shown below:

person_ID   Test_CODE   REGISTRATION_DATE   subject_Code subject_DESCRIPTION subject_Indicator

Can anyone help me with this?

If you want to retain the columns of the first file that is read, you can do something like this, which stores the column names from the first iteration and assigns them to the rest of the files:

dfs = []       
for e,f in enumerate(files):
    df = pd.read_excel(f)
    print(df.columns)
    if e == 0:
        col = df.columns
    df.columns=col
    dfs.append(df)


Index(['person_ID', 'Test_CODE', 'REGISTRATION_DATE', 'subject_CD',
       'subject_DESCRIPTION', 'subject_TYPE'],
      dtype='object')
Index(['person_ID', 'Test_CODE', 'REGISTRATION_DATE', 'subject_Code',
       'subject_DESCRIPTION', 'subject_Indicator'],
      dtype='object')

[df.columns for df in dfs] #pd.concat(dfs)

[Index(['person_ID', 'Test_CODE', 'REGISTRATION_DATE', 'subject_CD',
        'subject_DESCRIPTION', 'subject_TYPE'],
       dtype='object'),
 Index(['person_ID', 'Test_CODE', 'REGISTRATION_DATE', 'subject_CD',
        'subject_DESCRIPTION', 'subject_TYPE'],
       dtype='object')]
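
Note that this approach relabels columns by position, so it relies on every file having the same number of columns in the same order. A minimal sketch of the same idea with a length check, using small in-memory frames in place of the Excel files (the helper name `align_to_first` and the sample values are made up for illustration):

```python
import pandas as pd

def align_to_first(frames):
    """Give every frame the column labels of the first one, by position."""
    reference = frames[0].columns
    aligned = []
    for df in frames:
        # Positional relabeling only makes sense if the counts match.
        if len(df.columns) != len(reference):
            raise ValueError("column count differs from the first frame")
        out = df.copy()          # leave the caller's frame untouched
        out.columns = reference
        aligned.append(out)
    return aligned

# Hypothetical stand-ins for the two Excel files (fewer columns for brevity).
file1 = pd.DataFrame({"person_ID": [1], "subject_CD": ["M1"], "subject_TYPE": ["core"]})
file2 = pd.DataFrame({"person_ID": [2], "subject_Code": ["P2"], "subject_Indicator": ["elective"]})

aligned = align_to_first([file1, file2])
print([list(df.columns) for df in aligned])
```

If the column order can differ between files, renaming by name is the safer choice, since positional assignment would silently mislabel the data.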

Rename the two columns in a particular df:

df.rename({"subject_CD": "subject_Code", "subject_TYPE": "subject_Indicator"}, axis='columns', inplace=True)
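
Because `rename` by default ignores mapping keys that are not present in a frame, the same mapping can be applied to every file inside the reading loop. A sketch, assuming in-memory frames stand in for the results of `pd.read_excel` (the name `COLUMN_MAP` and the sample values are illustrative, not from the question):

```python
import pandas as pd

# Old name -> target name; keys absent from a frame are simply skipped.
COLUMN_MAP = {"subject_CD": "subject_Code", "subject_TYPE": "subject_Indicator"}

# Hypothetical stand-ins for the frames read from File 1 and File 2.
frames = [
    pd.DataFrame({"person_ID": [1], "subject_CD": ["M1"], "subject_TYPE": ["core"]}),
    pd.DataFrame({"person_ID": [2], "subject_Code": ["P2"], "subject_Indicator": ["elective"]}),
]

# rename returns a new frame, so no inplace=True is needed here.
renamed = [df.rename(columns=COLUMN_MAP) for df in frames]
for df in renamed:
    print(list(df.columns))
```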

You can also concatenate df1 and df2 on the same columns:


frames = [df1, df2]
result = pd.concat(frames)
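
By default `pd.concat` keeps each frame's original row index, so row labels repeat in the result; passing `ignore_index=True` renumbers the rows instead. A short sketch with made-up values, assuming both frames already share the target column names:

```python
import pandas as pd

# Hypothetical frames that already use the common column names.
df1 = pd.DataFrame({"person_ID": [1, 2], "subject_Code": ["M1", "M2"]})
df2 = pd.DataFrame({"person_ID": [3], "subject_Code": ["P2"]})

# Stack the rows and renumber the index from 0.
result = pd.concat([df1, df2], ignore_index=True)
print(result)
```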
