How to merge two Pandas DataFrames of different size based on condition

I have a primary df that I want to merge into. Let's call it 'primary_df'.

RCID    TypeID    Data
 777         D    Hello
 777         O    Hey
 778         O    Hey
 779         D    Hello

primary_df contains an 'RCID' column that matches up with 'O_ID' in another dataframe that only has data of TypeID 'O'. Let's call that df 'o_type_df'.

O_ID   O_Data
 777   Foo
 778   Bar

o_type_df has fewer entries than primary_df. There are repeated values of 'RCID' in primary_df, since the same RCID can have different TypeIDs associated with it.

How can I merge o_type_df into primary_df for all rows of TypeID 'O'?

The end result should be:

RCID    TypeID    Data     O_ID   O_Data
 777         D    Hello    
 777         O    Hey      777    Foo
 778         O    Hey      778    Bar
 779         D    Hello

Code:

import pandas as pd

primary_df = pd.DataFrame(columns=['RCID', 'TypeID', 'Data'], data=[[777, 'D', 'Hello'], [777, 'O', 'Hey'], [778, 'O', 'Hey'], [779, 'D', 'Hello']])
o_type_df = pd.DataFrame(columns=['O_ID', 'O_Data'], data=[[777, 'Foo'], [778, 'Bar']])

Try adding an indicator column to o_type_df, so that every row in it is marked as TypeID 'O':

o_type_df['TypeID'] = 'O'

Then do a left merge on both the ID and the TypeID columns:

merged = (
    primary_df.merge(o_type_df,
                     left_on=['RCID', 'TypeID'],
                     right_on=['O_ID', 'TypeID'],
                     how='left')
)

merged:

   RCID TypeID   Data   O_ID O_Data
0   777      D  Hello    NaN    NaN
1   777      O    Hey  777.0    Foo
2   778      O    Hey  778.0    Bar
3   779      D  Hello    NaN    NaN
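To see why the extra TypeID key matters, here is a minimal sketch that compares a merge on the ID alone with the two-key merge above. The names id_only and with_type are just illustrative, and the frames are rebuilt from the question so the snippet runs on its own.

```python
import pandas as pd

primary_df = pd.DataFrame(
    columns=['RCID', 'TypeID', 'Data'],
    data=[[777, 'D', 'Hello'], [777, 'O', 'Hey'], [778, 'O', 'Hey'], [779, 'D', 'Hello']],
)
o_type_df = pd.DataFrame(columns=['O_ID', 'O_Data'], data=[[777, 'Foo'], [778, 'Bar']])

# Merging on the ID alone matches every row with RCID 777,
# including the TypeID 'D' row that should stay empty.
id_only = primary_df.merge(o_type_df, left_on='RCID', right_on='O_ID', how='left')

# Adding TypeID as a second join key restricts matches to the 'O' rows.
o_type_df['TypeID'] = 'O'
with_type = primary_df.merge(
    o_type_df,
    left_on=['RCID', 'TypeID'],
    right_on=['O_ID', 'TypeID'],
    how='left',
)

# True where O_Data was filled in by the merge.
print(id_only['O_Data'].notna().tolist())
print(with_type['O_Data'].notna().tolist())
```

With the ID-only merge the 777/'D' row also picks up 'Foo', which is not the result asked for. With both keys, only the two 'O' rows are filled.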

Or with assign, which adds the column on a copy and leaves o_type_df unchanged:

merged = (
    primary_df.merge(o_type_df.assign(TypeID='O'),
                     left_on=['RCID', 'TypeID'],
                     right_on=['O_ID', 'TypeID'],
                     how='left')
)

merged:

   RCID TypeID   Data   O_ID O_Data
0   777      D  Hello    NaN    NaN
1   777      O    Hey  777.0    Foo
2   778      O    Hey  778.0    Bar
3   779      D  Hello    NaN    NaN
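Note that O_ID comes back as a float (777.0) because the unmatched rows hold NaN. If you would rather keep integers, one option (not part of the answer above, and assuming a pandas version that has the nullable 'Int64' dtype) is to cast the column after the merge. A rough sketch:

```python
import pandas as pd

primary_df = pd.DataFrame(
    columns=['RCID', 'TypeID', 'Data'],
    data=[[777, 'D', 'Hello'], [777, 'O', 'Hey'], [778, 'O', 'Hey'], [779, 'D', 'Hello']],
)
o_type_df = pd.DataFrame(columns=['O_ID', 'O_Data'], data=[[777, 'Foo'], [778, 'Bar']])

merged = primary_df.merge(
    o_type_df.assign(TypeID='O'),
    left_on=['RCID', 'TypeID'],
    right_on=['O_ID', 'TypeID'],
    how='left',
)

# Cast the float column to the nullable integer dtype;
# the missing entries become <NA> instead of NaN.
merged['O_ID'] = merged['O_ID'].astype('Int64')

print(merged.dtypes)
print(merged)
```

The matched rows then show 777 and 778 rather than 777.0 and 778.0, and the unmatched rows show <NA> in O_ID.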
