Merge multiple Excel files with varied rows into one Excel file in pandas
I have 4 Excel files that I have to merge into one Excel file.
Demography file containing ID, Initials, Age, and Sex.
Laboratory file containing ID, Initials, Test name, Test date, and Test Value.
Medical History containing ID, Initials, Medical condition, Start and Stop Dates.
Medication given containing ID, Initials, Drug name, dose, frequency, start and stop dates.
There are 50 patients. The demography file contains exactly 50 rows, one per patient. The other files cover the same 50 patients but have between 100 and 400 rows each, because each patient has multiple lab tests or multiple drugs.
When I merge in pandas, I get duplicated rows or entries assigned to the wrong patients. The challenge is to merge in such a way that, where a patient has more medications than lab tests, the surplus lab test cells are left blank instead of being filled with duplicates.
This is a shortened representation:
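import pandas as pd
# Both sheets are in one workbook; current pandas spells the argument sheet_name (sheetname was removed)
lab = pd.read_excel('data/data.xlsx', sheet_name='lab')
drugs = pd.read_excel('data/data.xlsx', sheet_name='drugs')
# Merging on ID alone pairs every drug row with every lab row of the same patient
merged_data = pd.merge(drugs, lab, on='ID', how='left')
# Write .xlsx: pandas 2.0 and later no longer write the legacy .xls format
merged_data.to_excel('merged_data.xlsx')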
You get this result: [image: Pandas merge result]
I would prefer this result: [image: Preferred output]
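The duplicates follow from how a merge on ID alone behaves: every drug row of a patient is paired with every lab row of that patient, so a patient with m drugs and n lab tests produces m x n rows. A minimal sketch with made-up data (the values and the variable name crossed are illustrative assumptions, not taken from the real files):

```python
import pandas as pd

# Hypothetical miniature data: patient 1 has 3 drugs and 2 lab tests.
drugs = pd.DataFrame({
    'ID': [1, 1, 1],
    'Drug Name': ['AMPICLOX', 'CIPROFLOXACIN', 'COARTEM'],
})
lab = pd.DataFrame({
    'ID': [1, 1],
    'Name': ['Rapid Diagnostic Test', 'Microscopy'],
})

# Merging on ID alone pairs each drug row with each lab row of the patient.
crossed = pd.merge(drugs, lab, on='ID', how='left')
print(len(crossed))  # 3 drugs x 2 lab tests = 6 rows
```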
Consider using cumcount() on a groupby() and then joining on both that field and ID. cumcount() numbers each patient's rows 0, 1, 2, ... within its own file, so the n-th drug row is matched only with the n-th lab row of the same patient:
drugs['GrpCount'] = (drugs.groupby(['ID'])).cumcount()
lab['GrpCount'] = (lab.groupby(['ID'])).cumcount()
merged_data = pd.merge(drugs, lab, on=['ID', 'GrpCount'], how='left').drop(['GrpCount'], axis=1)
#     ID  Initials_x  Drug Name                      Frequency        Route        Start Date   End Date     Initials_y  Name                   Result Date  Result
# 0   1   AB          AMPICLOX                       NaN              Oral         21-Jun-2016  21-Jun-2016  AB          Rapid Diagnostic Test  30-May-16    Abnormal
# 1   1   AB          CIPROFLOXACIN                  Daily            Oral         30-May-2016  03-Jun-2016  AB          Microscopy             30-May-16    Normal
# 2   1   AB          Ibuprofen Tablet 400 mg        Two Times a Day  Oral         06-Oct-2016  10-Oct-2016  NaN         NaN                    NaN          NaN
# 3   1   AB          COARTEM                        NaN              Oral         17-Jun-2016  17-Jun-2016  NaN         NaN                    NaN          NaN
# 4   1   AB          INJECTABLE ARTESUNATE          12 Hourly        Intravenous  01-Jun-2016  02-Jun-2016  NaN         NaN                    NaN          NaN
# 5   1   AB          COTRIMOXAZOLE                  Daily            Oral         30-May-2016  12-Jun-2016  NaN         NaN                    NaN          NaN
# 6   1   AB          METRONIDAZOLE                  Two Times a Day  Oral         30-May-2016  03-Jun-2016  NaN         NaN                    NaN          NaN
# 7   2   SS          GENTAMICIN                     Daily            Intravenous  04-Jun-2016  04-Jun-2016  SS          Microscopy             6-Jun-16     Abnormal
# 8   2   SS          METRONIDAZOLE                  8 Hourly         Intravenous  04-Jun-2016  06-Jun-2016  SS          Complete Blood Count   6-Oct-16     Recorded
# 9   2   SS          Oral Rehydration Salts Powder  PRN              Oral         06-Jun-2016  06-Jun-2016  NaN         NaN                    NaN          NaN
# 10  2   SS          ZINC                           8 Hourly         Oral         06-Jun-2016  06-Jun-2016  NaN         NaN                    NaN          NaN
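Two caveats, with a sketch under stated assumptions. With how='left', a patient who has more lab tests than drugs would lose the surplus lab rows; how='outer' keeps them. Adding Initials to the join keys avoids the Initials_x / Initials_y pair, and fillna('') gives blank cells instead of NaN. The helper name merge_by_position and the sample data are hypothetical, and the sketch assumes Initials is consistent for a given ID across all files.

```python
from functools import reduce

import pandas as pd


def merge_by_position(frames, keys=('ID', 'Initials')):
    """Align the n-th row of each patient across frames (hypothetical helper)."""
    keys = list(keys)
    numbered = []
    for frame in frames:
        frame = frame.copy()
        # Number each patient's rows 0, 1, 2, ... within this frame.
        frame['GrpCount'] = frame.groupby(keys).cumcount()
        numbered.append(frame)
    join_keys = keys + ['GrpCount']
    # An outer join keeps surplus rows from either side of each merge.
    merged = reduce(
        lambda left, right: pd.merge(left, right, on=join_keys, how='outer'),
        numbered,
    )
    merged = merged.sort_values(join_keys).reset_index(drop=True)
    # Drop the helper column and show blanks instead of NaN.
    return merged.drop(columns='GrpCount').fillna('')


# Made-up data: patient 1 has more drugs than labs, patient 2 the reverse.
drugs = pd.DataFrame({
    'ID': [1, 1, 1, 2],
    'Initials': ['AB', 'AB', 'AB', 'SS'],
    'Drug Name': ['AMPICLOX', 'CIPROFLOXACIN', 'COARTEM', 'ZINC'],
})
lab = pd.DataFrame({
    'ID': [1, 2, 2],
    'Initials': ['AB', 'SS', 'SS'],
    'Name': ['Microscopy', 'Microscopy', 'Complete Blood Count'],
})

result = merge_by_position([drugs, lab])
print(result)
```

The same list could also take the medical history frame. A one-row-per-patient frame such as demography would then show its values only on each patient's first row.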