Joining two dataframes on unique ID, but using another value if id doesn't exist

I have two dataframes like this:

UID    mainColumn .... (other columns of data)
1      apple
2      orange
3      apple
4      orange
5      berry
....

UID2   mainColumn2
1      truck
3      car
4      boat
5      plane
...

I need to join the second dataframe onto the first based on UID. However, if df2 does not contain a given UID, then the mainColumn value from df1 is the one I'd like to use. In the above example, UID2 does not contain the value 2, so the final table would look something like this:

UID    mainColumn ....
1      truck
2      orange
3      car
4      boat
5      plane
...

Now I'm aware we can do something of the form:

df1=df1.merge(df2,left_on='UID', right_on='UID2')

But my issue is leaving the values with no match in df2 unreplaced while making sure those rows are still included. Thanks!

You can use combine_first() after renaming the columns of df2 to match df1 (e.g. UID2 to UID):

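# rename only the matching columns; assigning df1.columns directly fails when df1 has extra columns
df2 = df2.rename(columns={'UID2': 'UID', 'mainColumn2': 'mainColumn'})
final_df = df2.set_index('UID').combine_first(df1.set_index('UID')).reset_index()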

  UID mainColumn
0    1      truck
1    2     orange
2    3        car
3    4       boat
4    5      plane
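
For anyone who wants to try this end to end, here is a minimal self-contained sketch of the combine_first() approach. The sample frames mirror the question; the `qty` column is a hypothetical stand-in for the "other columns of data", added only to show that extra df1 columns are carried through.

```python
import pandas as pd

# Sample data mirroring the question. "qty" is a hypothetical extra column.
df1 = pd.DataFrame({
    "UID": [1, 2, 3, 4, 5],
    "mainColumn": ["apple", "orange", "apple", "orange", "berry"],
    "qty": [10, 20, 30, 40, 50],
})
df2 = pd.DataFrame({
    "UID2": [1, 3, 4, 5],
    "mainColumn2": ["truck", "car", "boat", "plane"],
})

# Rename only the columns that should line up with df1.
renamed = df2.rename(columns={"UID2": "UID", "mainColumn2": "mainColumn"})

# Values from df2 take priority; gaps (UID 2 here) are filled from df1.
final_df = (
    renamed.set_index("UID")
    .combine_first(df1.set_index("UID"))
    .reset_index()
)
print(final_df)
```

Note that combine_first() aligns on the index, so this assumes UID is unique in both frames.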

We can first use merge, then fillna the missing values, and finally drop the extra column:

final = df1.merge(df2, left_on='UID', right_on='UID2', how='left').drop('UID2', axis=1)

final['mainColumn'] = final['mainColumn2'].fillna(final['mainColumn'])

final.drop('mainColumn2', axis=1, inplace=True)

   UID mainColumn
0    1      truck
1    2     orange
2    3        car
3    4       boat
4    5      plane
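
The same steps as a self-contained sketch, in case it helps to run them on sample data. The frames mirror the question, and the `qty` column is a hypothetical placeholder for the other columns in df1. The key detail is how='left': the default inner join would drop UID 2 entirely.

```python
import pandas as pd

# Sample data mirroring the question. "qty" is a hypothetical extra column.
df1 = pd.DataFrame({
    "UID": [1, 2, 3, 4, 5],
    "mainColumn": ["apple", "orange", "apple", "orange", "berry"],
    "qty": [10, 20, 30, 40, 50],
})
df2 = pd.DataFrame({
    "UID2": [1, 3, 4, 5],
    "mainColumn2": ["truck", "car", "boat", "plane"],
})

# A left join keeps every row of df1; unmatched rows get NaN in df2's columns.
merged = df1.merge(df2, left_on="UID", right_on="UID2", how="left")

# Prefer df2's value and fall back to df1's where there was no match.
merged["mainColumn"] = merged["mainColumn2"].fillna(merged["mainColumn"])

# Drop the helper columns that came from df2.
final = merged.drop(columns=["UID2", "mainColumn2"])
print(final)
```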
