How to merge two Pandas DataFrames of different sizes based on a condition
I have a primary df that I want to merge into. Let's call it 'primary_df'.
RCID TypeID Data
777 D Hello
777 O Hey
778 O Hey
779 D Hello
primary_df contains an 'RCID' column that matches up with 'O_ID' in another dataframe that only has data of TypeID 'O'. Let's call that df 'o_type_df':
O_ID O_Data
777 Foo
778 Bar
o_type_df has fewer entries than primary_df. There are repeat values of 'RCID' in primary_df since the same RCID can have different TypeIDs associated with it.
How can I merge o_type_df into primary_df for all rows of TypeID 'O'?
The end result should be:
RCID TypeID Data O_ID O_Data
777 D Hello
777 O Hey 777 Foo
778 O Hey 778 Bar
779 D Hello
Code:
import pandas as pd

primary_df = pd.DataFrame(columns=['RCID', 'TypeID', 'Data'], data=[[777, 'D', 'Hello'], [777, 'O', 'Hey'], [778, 'O', 'Hey'], [779, 'D', 'Hello']])
o_type_df = pd.DataFrame(columns=['O_ID', 'O_Data'], data=[[777, 'Foo'], [778, 'Bar']])
Try adding an indicator column to o_type_df:
o_type_df['TypeID'] = 'O'
Then do a left merge on those columns:
merged = (
primary_df.merge(o_type_df,
left_on=['RCID', 'TypeID'],
right_on=['O_ID', 'TypeID'],
how='left')
)
merged:
RCID TypeID Data O_ID O_Data
0 777 D Hello NaN NaN
1 777 O Hey 777.0 Foo
2 778 O Hey 778.0 Bar
3 779 D Hello NaN NaN
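Two optional refinements, neither of which is part of the original answer. Passing validate='many_to_one' makes pandas raise a MergeError if an (O_ID, TypeID) pair repeats in o_type_df, which would otherwise silently duplicate rows of primary_df. Casting O_ID to the nullable 'Int64' dtype turns the 777.0-style floats (caused by the NaN in unmatched rows) back into integers, with <NA> for the unmatched rows. A sketch, assuming pandas 1.0 or newer for the nullable integer dtype:

```python
import pandas as pd

primary_df = pd.DataFrame(columns=['RCID', 'TypeID', 'Data'],
                          data=[[777, 'D', 'Hello'], [777, 'O', 'Hey'],
                                [778, 'O', 'Hey'], [779, 'D', 'Hello']])
o_type_df = pd.DataFrame(columns=['O_ID', 'O_Data'],
                         data=[[777, 'Foo'], [778, 'Bar']])

merged = primary_df.merge(
    o_type_df.assign(TypeID='O'),      # indicator column added on the fly
    left_on=['RCID', 'TypeID'],
    right_on=['O_ID', 'TypeID'],
    how='left',
    # Raise MergeError if the right-hand keys are not unique, instead of
    # duplicating rows of primary_df.
    validate='many_to_one',
)

# Unmatched rows get NaN, which forces O_ID to float64 (777.0, 778.0).
# The nullable Int64 dtype stores whole numbers and uses <NA> for gaps.
merged['O_ID'] = merged['O_ID'].astype('Int64')
print(merged)
```

The cast is purely cosmetic for display; leave O_ID as float64 if downstream code expects NaN rather than <NA>.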
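Alternatively, use assign to add the indicator column inline, which leaves o_type_df itself unmodified:
merged = (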
primary_df.merge(o_type_df.assign(TypeID='O'),
left_on=['RCID', 'TypeID'],
right_on=['O_ID', 'TypeID'],
how='left')
)
merged:
RCID TypeID Data O_ID O_Data
0 777 D Hello NaN NaN
1 777 O Hey 777.0 Foo
2 778 O Hey 778.0 Bar
3 779 D Hello NaN NaN
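A different route, not taken from the answer above and offered only as a sketch: merge on RCID alone, then blank out the joined columns on rows whose TypeID is not 'O'. This avoids the helper TypeID column, at the cost of briefly attaching o_type_df data to 'D' rows that share an RCID (here, RCID 777). As with the indicator-column approach, duplicate O_ID values in o_type_df would duplicate rows of primary_df.

```python
import pandas as pd

primary_df = pd.DataFrame(columns=['RCID', 'TypeID', 'Data'],
                          data=[[777, 'D', 'Hello'], [777, 'O', 'Hey'],
                                [778, 'O', 'Hey'], [779, 'D', 'Hello']])
o_type_df = pd.DataFrame(columns=['O_ID', 'O_Data'],
                         data=[[777, 'Foo'], [778, 'Bar']])

# Join on the ID alone: every RCID with a match picks up O_ID/O_Data,
# including the (777, 'D') row, which we do not want.
merged = primary_df.merge(o_type_df, left_on='RCID', right_on='O_ID', how='left')

# Keep the joined values only where TypeID is 'O';
# Series.where replaces everything else with NaN.
is_o = merged['TypeID'].eq('O')
for col in ['O_ID', 'O_Data']:
    merged[col] = merged[col].where(is_o)

print(merged)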