How to rename columns while reading multiple files using pandas
I have two data frames (read from two Excel files) with the columns below.
File 1 columns:
person_ID Test_CODE REGISTRATION_DATE subject_CD subject_DESCRIPTION subject_TYPE
File 2 columns:
person_ID Test_CODE REGISTRATION_DATE subject_Code subject_DESCRIPTION subject_Indicator
However, the columns subject_CD and subject_Code mean the same thing. Similarly, subject_TYPE and subject_Indicator mean the same thing. So I would like to rename them when I read the Excel files.
I tried the code below, but it doesn't work:
dfs = []
for f in files:
    df = pd.read_excel(f, sep=",",low_memory=False)
    print(df.columns)
    df1 = df[df.columns.intersection(['person_ID','Test_CODE','REGISTRATION_DATE','subject_CD','subject_DESCRIPTION','subject_TYPE'])].rename(columns={'subject_TYPE':'subject_Indicator','subject_CD':'subject_Code'})
    dfs.append(df1)
Since I would like to append/merge both files, I expect the column names in my final data frame to be as shown below:
person_ID Test_CODE REGISTRATION_DATE subject_Code subject_DESCRIPTION subject_Indicator
Can you help me with this?
If you want to retain the columns of the first file that is read, you can do something like this, which stores the column names from the first iteration and assigns them to the rest of the files. Note that the names are assigned by position, so this assumes every file has the same columns in the same order:
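import pandas as pd

# `files` is the list of Excel file paths from the question
dfs = []
for e, f in enumerate(files):
    df = pd.read_excel(f)
    print(df.columns)
    if e == 0:
        # remember the column names of the first file
        col = df.columns
    # overwrite this file's column names with those of the first file (by position)
    df.columns = col
    dfs.append(df)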
Index(['person_ID', 'Test_CODE', 'REGISTRATION_DATE', 'subject_CD',
       'subject_DESCRIPTION', 'subject_TYPE'],
      dtype='object')
Index(['person_ID', 'Test_CODE', 'REGISTRATION_DATE', 'subject_Code',
       'subject_DESCRIPTION', 'subject_Indicator'],
      dtype='object')
[df.columns for df in dfs] #pd.concat(dfs)
[Index(['person_ID', 'Test_CODE', 'REGISTRATION_DATE', 'subject_CD',
        'subject_DESCRIPTION', 'subject_TYPE'],
       dtype='object'),
 Index(['person_ID', 'Test_CODE', 'REGISTRATION_DATE', 'subject_CD',
        'subject_DESCRIPTION', 'subject_TYPE'],
       dtype='object')]
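If you need the exact names from the question (subject_Code and subject_Indicator) rather than whatever the first file happens to use, another option is to rename by name instead of by position. The attempt in the question loses data because the list passed to intersection only contains the file 1 names, so subject_Code and subject_Indicator are dropped from file 2 before rename runs. Below is a minimal sketch of the rename-by-name approach. RENAME_MAP and harmonize are names made up here for illustration, and the two small in-memory frames (with invented sample values) stand in for what pd.read_excel(f) would return for each file.

```python
import pandas as pd

# Hypothetical mapping from each file-specific name to the desired common name.
RENAME_MAP = {"subject_CD": "subject_Code", "subject_TYPE": "subject_Indicator"}


def harmonize(df: pd.DataFrame) -> pd.DataFrame:
    """Rename known variant column names; columns not in the map are left alone."""
    return df.rename(columns=RENAME_MAP)


# Stand-ins for pd.read_excel(f); the values are invented sample data.
file1 = pd.DataFrame({
    "person_ID": [1], "Test_CODE": ["A"], "REGISTRATION_DATE": ["2020-01-01"],
    "subject_CD": ["M1"], "subject_DESCRIPTION": ["Math"], "subject_TYPE": ["core"],
})
file2 = pd.DataFrame({
    "person_ID": [2], "Test_CODE": ["B"], "REGISTRATION_DATE": ["2020-02-01"],
    "subject_Code": ["P1"], "subject_DESCRIPTION": ["Physics"],
    "subject_Indicator": ["elective"],
})

# Rename each frame first, then stack them; matching names line up in concat.
combined = pd.concat([harmonize(df) for df in (file1, file2)], ignore_index=True)
print(list(combined.columns))
```

With real files, the list comprehension would become something like harmonize(pd.read_excel(f)) for f in files. Because rename ignores keys that are not present by default, the same mapping can be applied to every file without checking which variant it uses.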