Merging on non-unique column - pandas python
I have been trying to merge two DataFrames (df and df_details) together in a similar fashion to an Excel "vlookup", but I am getting strange results. Below I show the structure of the two DataFrames, without populating real data for simplicity:
df_details:
Abstract_Title | Abstract_URL | Session_No_v2 | Session_URL | Session_ID
-------------------------------------------------------------------------
Abstract_Title1 Abstract_URL1 1 Session_URL1 12345
Abstract_Title2 Abstract_URL2 1 Session_URL1 12345
Abstract_Title3 Abstract_URL3 1 Session_URL1 12345
Abstract_Title4 Abstract_URL4 2 Session_URL2 22222
Abstract_Title5 Abstract_URL5 2 Session_URL2 22222
Abstract_Title6 Abstract_URL6 3 Session_URL3 98765
Abstract_Title7 Abstract_URL7 3 Session_URL3 98765
df:
Session_Title | Session_URL | Sponsors | Type | Session_ID
-------------------------------------------------------------------------------
Session_Title1 Session_URL1 x, y z Paper 12345
Session_Title2 Session_URL2 x, y Presentation 22222
Session_Title3 Session_URL3 a, b ,c Presentation 98765
Session_Title4 Session_URL4 c Talk 12121
Session_Title5 Session_URL5 a, x Paper 33333
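To make the problem reproducible, here is a minimal sketch that rebuilds the two sample tables as DataFrames. It uses only the placeholder values from the tables above (not real data). It also shows the property that matters for the merge: Session_ID repeats in df_details (one row per abstract), while in df it is expected to identify a single session.

```python
import pandas as pd

# Toy versions of the two tables above (placeholder values only).
df_details = pd.DataFrame({
    "Abstract_Title": [f"Abstract_Title{i}" for i in range(1, 8)],
    "Abstract_URL": [f"Abstract_URL{i}" for i in range(1, 8)],
    "Session_No_v2": [1, 1, 1, 2, 2, 3, 3],
    "Session_URL": ["Session_URL1"] * 3 + ["Session_URL2"] * 2 + ["Session_URL3"] * 2,
    "Session_ID": [12345] * 3 + [22222] * 2 + [98765] * 2,
})

df = pd.DataFrame({
    "Session_Title": [f"Session_Title{i}" for i in range(1, 6)],
    "Session_URL": [f"Session_URL{i}" for i in range(1, 6)],
    "Sponsors": ["x, y z", "x, y", "a, b ,c", "c", "a, x"],
    "Type": ["Paper", "Presentation", "Presentation", "Talk", "Paper"],
    "Session_ID": [12345, 22222, 98765, 12121, 33333],
})

# Session_ID repeats in df_details (one row per abstract) but is unique in this toy df.
print(df_details["Session_ID"].is_unique)  # False
print(df["Session_ID"].is_unique)          # True
```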
I want to merge along Session_ID, and I want the final DataFrame to look like:
I've tried the following script, which yields a DataFrame that duplicates certain rows (several times) and does strange things. For example, df_details has 7,046 rows and df has 1,856 rows, but when I run the following merge code, my final_df ends up with 21,148 rows:
final_df = pd.merge(df_details, df, how = 'outer', on = 'Session_ID')
Please help!
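One likely cause of the extra rows (an assumption, since the real data isn't shown) is that Session_ID is not unique in df either. pd.merge emits one output row for every pair of matching left and right rows, so a key that appears 3 times on the left and 2 times on the right produces 6 rows; how='outer' additionally keeps keys found on only one side. The sketch below uses a hypothetical helper, expected_outer_rows, to predict that count from the key frequencies and checks it against an actual outer merge on small made-up frames.

```python
import pandas as pd


def expected_outer_rows(left: pd.DataFrame, right: pd.DataFrame, key: str) -> int:
    """Predict the row count of pd.merge(left, right, how="outer", on=key).

    Hypothetical helper for illustration; assumes the key column has no NaN values.
    """
    lc = left[key].value_counts()
    rc = right[key].value_counts()
    both = lc.index.intersection(rc.index)
    # Every key present on both sides yields (left count) x (right count) rows.
    matched = int((lc.loc[both] * rc.loc[both]).sum())
    # Keys present on only one side are kept once per row by an outer join.
    one_sided = int(lc.drop(both).sum() + rc.drop(both).sum())
    return matched + one_sided


# Made-up data: session 1 is listed twice in the "lookup" frame.
details = pd.DataFrame({"Session_ID": [1, 1, 1, 2, 2], "Abstract": list("abcde")})
sessions = pd.DataFrame({"Session_ID": [1, 1, 2, 3], "Title": ["T1", "T1 (dup)", "T2", "T3"]})

# Session 1: 3 x 2 = 6 rows; session 2: 2 x 1 = 2 rows; session 3 (right only): 1 row.
merged = pd.merge(details, sessions, how="outer", on="Session_ID")
print(len(merged), expected_outer_rows(details, sessions, "Session_ID"))  # 9 9

# Listing duplicated keys on the lookup side is usually the first diagnostic.
print(sessions[sessions["Session_ID"].duplicated(keep=False)])
```

Running the same duplicated() check on the real df would confirm or rule out this explanation.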
To generate your final output table, I used the following code:
final_df = pd.merge(df_details, df[['Session_ID',
'Session_Title',
'Sponsors',
'Type']], left_on = ['Session_ID'], right_on = ['Session_ID'], how = 'outer')
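A small sketch of what this merge does, on made-up rows. Selecting only the needed columns from df keeps Session_URL, which exists in both frames, from being duplicated as Session_URL_x and Session_URL_y. With how='outer', sessions that have no abstracts still appear as rows with empty abstract fields; adding pandas' indicator=True option makes that visible through a _merge column. on='Session_ID' is used here as the equivalent of the answer's matching left_on/right_on pair.

```python
import pandas as pd

df_details = pd.DataFrame({
    "Abstract_Title": ["Abstract_Title1", "Abstract_Title2", "Abstract_Title4"],
    "Session_URL": ["Session_URL1", "Session_URL1", "Session_URL2"],
    "Session_ID": [12345, 12345, 22222],
})
df = pd.DataFrame({
    "Session_Title": ["Session_Title1", "Session_Title2", "Session_Title4"],
    "Session_URL": ["Session_URL1", "Session_URL2", "Session_URL4"],
    "Sponsors": ["x, y z", "x, y", "c"],
    "Type": ["Paper", "Presentation", "Talk"],
    "Session_ID": [12345, 22222, 12121],
})

# Same merge as above, plus indicator=True, which adds a "_merge" column saying
# whether each row matched ("both") or came from only one side.
final_df = pd.merge(
    df_details,
    df[["Session_ID", "Session_Title", "Sponsors", "Type"]],
    on="Session_ID",
    how="outer",
    indicator=True,
)
print(final_df["_merge"].value_counts())
# Session 12121 has no abstracts, so it appears once as a "right_only" row.
```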
Use how='left' instead of how='outer'.
final_df = pd.merge(df_details, df[['Session_ID','Session_Title','Sponsors','Type']], left_on = ['Session_ID'], right_on =['Session_ID'], how = 'left')
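A left join keeps exactly the rows of df_details only if Session_ID is unique in df. Otherwise each abstract is still repeated once per matching session row, whereas Excel's VLOOKUP returns only the first match. The sketch below, on made-up data with one deliberately duplicated session, uses pandas' validate='many_to_one' option to turn that situation into an error. It then deduplicates the lookup side before merging, which assumes the duplicate rows are interchangeable; if they differ, you need to decide which one to keep.

```python
import pandas as pd

df_details = pd.DataFrame({
    "Abstract_Title": ["Abstract_Title1", "Abstract_Title2", "Abstract_Title4"],
    "Session_ID": [12345, 12345, 22222],
})
df = pd.DataFrame({
    "Session_ID": [12345, 22222, 22222, 12121],  # 22222 duplicated on purpose
    "Session_Title": ["Session_Title1", "Session_Title2", "Session_Title2", "Session_Title4"],
    "Sponsors": ["x, y z", "x, y", "x, y", "c"],
    "Type": ["Paper", "Presentation", "Presentation", "Talk"],
})
cols = ["Session_ID", "Session_Title", "Sponsors", "Type"]

# validate="many_to_one" raises MergeError if Session_ID is not unique in df,
# instead of silently multiplying rows.
duplicates_detected = False
try:
    pd.merge(df_details, df[cols], on="Session_ID", how="left", validate="many_to_one")
except pd.errors.MergeError as exc:
    duplicates_detected = True
    print("duplicate keys in df:", exc)

# Keep the first row per Session_ID (assumes duplicates are interchangeable).
lookup = df[cols].drop_duplicates(subset="Session_ID")
final_df = pd.merge(df_details, lookup, on="Session_ID", how="left", validate="many_to_one")
print(len(final_df) == len(df_details))  # True: one row per abstract
```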