Merge multiple Excel files with varied rows into one Excel file in pandas

I have 4 Excel files that I have to merge into one Excel file. The demography file contains ID, Initials, Age, and Sex. The laboratory file contains ID, Initials, Test name, Test date, and Test value. The medical history file contains ID, Initials, Medical condition, Start date, and Stop date. The medication file contains ID, Initials, Drug name, Dose, Frequency, Start date, and Stop date.

There are 50 patients. The demography file has one row per patient, 50 rows in total. The other files cover the same 50 patients but have between 100 and 400 rows each, because each patient can have multiple lab tests or multiple drugs.
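Because the demography file has exactly one row per patient, it can be attached to any of the longer tables with an ordinary merge on ID without creating duplicates. Below is a minimal sketch, assuming small in-memory frames with made-up values (the Age and Sex values are invented) stand in for the real sheets. Passing `validate="many_to_one"` makes pandas raise a `MergeError` if the demography table ever contains a repeated ID.

```python
import pandas as pd

# Hypothetical stand-ins for two of the sheets; all values are illustrative only.
demography = pd.DataFrame({
    "ID": [1, 2],
    "Initials": ["AB", "SS"],
    "Age": [34, 27],
    "Sex": ["M", "F"],
})
drugs = pd.DataFrame({
    "ID": [1, 1, 1, 2],
    "Drug Name": ["AMPICLOX", "CIPROFLOXACIN", "COARTEM", "GENTAMICIN"],
})

# Each drug row matches exactly one demography row, so the row count is preserved.
# validate="many_to_one" raises pandas.errors.MergeError if demography repeats an ID.
drugs_with_demo = drugs.merge(demography, on="ID", how="left", validate="many_to_one")
print(len(drugs_with_demo))  # 4
```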

When I merge in pandas, I get duplicated rows or values assigned to the wrong patients. The challenge is to merge in such a way that, when a patient has been given more medications than they have lab tests, the lab-test columns in the extra rows are left blank instead of repeating earlier values.
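The duplicates come from how a merge on ID alone behaves when both sides repeat the key: every drug row of a patient is paired with every lab row of that patient (a many-to-many join), so a patient with 3 drugs and 2 lab tests yields 3 × 2 = 6 rows. A small illustration, using hypothetical data:

```python
import pandas as pd

# Hypothetical data: patient 1 has 3 drug rows and 2 lab rows.
drugs = pd.DataFrame({"ID": [1, 1, 1], "Drug Name": ["AMPICLOX", "CIPROFLOXACIN", "COARTEM"]})
lab = pd.DataFrame({"ID": [1, 1], "Name": ["Microscopy", "Rapid Diagnostic Test"]})

# Merging on ID alone pairs every drug row with every lab row for the same ID.
merged = pd.merge(drugs, lab, on="ID", how="left")
print(len(merged))  # 6 = 3 drugs x 2 lab tests
print(merged)
```

In the output each drug appears twice and each lab test three times, which is the duplication described above.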

This is a shortened representation:

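import pandas as pd

# Current pandas uses sheet_name; the old sheetname keyword was deprecated in 0.21 and later removed
lab = pd.read_excel('data/data.xlsx', sheet_name='lab')
drugs = pd.read_excel('data/data.xlsx', sheet_name='drugs')
merged_data = pd.merge(drugs, lab, on='ID', how='left')
# Writing .xls required the xlwt engine, which pandas 2.0 dropped; write .xlsx instead
merged_data.to_excel('merged_data.xlsx')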

You get this result (screenshot: pandas merge result).

I would prefer this result (screenshot: preferred output).

Consider using cumcount() on a groupby() to number each patient's rows, then merge on that counter field together with ID:

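# Number each patient's rows 0, 1, 2, ... in their existing order
drugs['GrpCount'] = drugs.groupby('ID').cumcount()
lab['GrpCount'] = lab.groupby('ID').cumcount()

# Pair the k-th drug with the k-th lab test of the same patient; drug rows without a partner get NaN lab columns
merged_data = pd.merge(drugs, lab, on=['ID', 'GrpCount'], how='left').drop(columns='GrpCount')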

#     ID Initials_x                      Drug Name          Frequency          Route   Start Date     End Date Initials_y                    Name Result Date    Result
# 0    1         AB                       AMPICLOX                NaN           Oral  21-Jun-2016  21-Jun-2016         AB  Rapid Diagnostic Test    30-May-16  Abnormal
# 1    1         AB                  CIPROFLOXACIN              Daily           Oral  30-May-2016  03-Jun-2016         AB              Microscopy   30-May-16    Normal
# 2    1         AB        Ibuprofen Tablet 400 mg    Two Times a Day           Oral  06-Oct-2016  10-Oct-2016        NaN                     NaN         NaN       NaN
# 3    1         AB                        COARTEM                NaN           Oral  17-Jun-2016  17-Jun-2016        NaN                     NaN         NaN       NaN
# 4    1         AB          INJECTABLE ARTESUNATE          12 Hourly    Intravenous  01-Jun-2016  02-Jun-2016        NaN                     NaN         NaN       NaN
# 5    1         AB                  COTRIMOXAZOLE              Daily           Oral  30-May-2016  12-Jun-2016        NaN                     NaN         NaN       NaN
# 6    1         AB                  METRONIDAZOLE    Two Times a Day           Oral  30-May-2016  03-Jun-2016        NaN                     NaN         NaN       NaN
# 7    2         SS                     GENTAMICIN              Daily    Intravenous  04-Jun-2016  04-Jun-2016         SS              Microscopy    6-Jun-16  Abnormal
# 8    2         SS                  METRONIDAZOLE           8 Hourly    Intravenous  04-Jun-2016  06-Jun-2016         SS    Complete Blood Count    6-Oct-16  Recorded
# 9    2         SS  Oral Rehydration Salts Powder                PRN           Oral  06-Jun-2016  06-Jun-2016        NaN                     NaN         NaN       NaN
# 10   2         SS                           ZINC           8 Hourly           Oral  06-Jun-2016  06-Jun-2016        NaN                     NaN         NaN       NaN
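Note that `how='left'` keeps only as many rows per patient as the drugs table has, so if a patient has more lab tests than drugs, the extra lab rows are dropped. The sketch below applies the same cumcount() idea with `how='outer'`, chains it over several long tables, and finally attaches the one-row-per-patient demography table. The helper name `align_by_patient` and the sample frames are hypothetical, and for brevity the long tables omit Initials, which in the real sheets would otherwise be repeated as Initials_x/Initials_y columns. When the result is written with `to_excel`, missing values come out as empty cells because `na_rep` defaults to `''`.

```python
from functools import reduce

import pandas as pd


def align_by_patient(tables, key="ID"):
    """Hypothetical helper: place the k-th row of each table for a patient side by side."""
    numbered = []
    for df in tables:
        df = df.copy()
        # 0, 1, 2, ... within each patient, in the table's existing row order
        df["GrpCount"] = df.groupby(key).cumcount()
        numbered.append(df)
    # An outer join keeps surplus rows from whichever table is longer for that patient.
    merged = reduce(
        lambda left, right: pd.merge(left, right, on=[key, "GrpCount"], how="outer"),
        numbered,
    )
    return (
        merged.sort_values([key, "GrpCount"])
        .drop(columns="GrpCount")
        .reset_index(drop=True)
    )


# Illustrative sample data only.
demography = pd.DataFrame({"ID": [1, 2], "Initials": ["AB", "SS"]})
drugs = pd.DataFrame({
    "ID": [1, 1, 1, 2],
    "Drug Name": ["AMPICLOX", "CIPROFLOXACIN", "COARTEM", "GENTAMICIN"],
})
lab = pd.DataFrame({
    "ID": [1, 2, 2],
    "Name": ["Microscopy", "Microscopy", "Complete Blood Count"],
})

long_part = align_by_patient([drugs, lab])
# Demography has one row per patient, so it attaches without creating duplicates.
result = demography.merge(long_part, on="ID", how="left", validate="one_to_many")
print(result)
```

Here patient 2 has one drug but two lab tests, so the second lab test appears on a row whose Drug Name is empty instead of being lost. The pairing is purely positional: the first drug of a patient sits beside the first lab test of that patient, which matches the preferred layout but implies no clinical link between the two.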
