pandas function to fill missing values from other dataframe based on matching column?
I have two dataframes: in one, certain columns are filled in; in the other, different columns are filled in, but some that are present in the first are missing. Both share some common non-empty columns.
DF1:
FirstName  Uid  JoinDate  BirthDate
Bob        1    20160628  NaN
Charlie    3    20160627  NaN
DF2:
FirstName  Uid  JoinDate  BirthDate
Bob        1    NaN       19910524
Alice      2    NaN       19950403
Result:
FirstName  Uid  JoinDate  BirthDate
Bob        1    20160628  19910524
Alice      2    NaN       19950403
Charlie    3    20160627  NaN
Assuming that these rows do not share index positions in their respective dataframes, is there a way that I can fill the missing values in DF1 with values from DF2 where the rows match on a certain column (in this example Uid)?
Also, is there a way to create a new entry in DF1 from DF2 if there isn't a match on that column (e.g. Uid), without removing rows in DF1 that don't match any rows in DF2?
EDIT: I updated the dataframes to add non-matching rows in both dataframes that I need in the result df. I also updated my last question to reflect that.
UPDATE: you can do it by setting the proper indices and finally resetting the index of the joined DF:
In [14]: df1.set_index('FirstName').combine_first(df2.set_index('FirstName')).reset_index()
Out[14]:
   FirstName  Uid    JoinDate   BirthDate
0      Alice  2.0         NaN  19950403.0
1        Bob  1.0  20160628.0  19910524.0
2    Charlie  3.0  20160627.0         NaN
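The same idea can be keyed on Uid, the column the question asks about. The following is a minimal sketch rather than a definitive implementation: the frames are rebuilt by hand from the question's sample data, and the final column reordering is an added step (an assumption about the desired layout), since combine_first may not keep the original column order.

```python
import numpy as np
import pandas as pd

# Sample frames rebuilt from the question's tables.
df1 = pd.DataFrame({
    "FirstName": ["Bob", "Charlie"],
    "Uid": [1, 3],
    "JoinDate": [20160628, 20160627],
    "BirthDate": [np.nan, np.nan],
})
df2 = pd.DataFrame({
    "FirstName": ["Bob", "Alice"],
    "Uid": [1, 2],
    "JoinDate": [np.nan, np.nan],
    "BirthDate": [19910524, 19950403],
})

# Index both frames by the matching column so combine_first aligns on Uid
# instead of on row position. Values from df1 win; gaps are filled from df2,
# and Uids that exist only in df2 are added as new rows.
result = (
    df1.set_index("Uid")
       .combine_first(df2.set_index("Uid"))
       .sort_index()
       .reset_index()
)

# Assumed layout: put the columns back in the question's order.
result = result[["FirstName", "Uid", "JoinDate", "BirthDate"]]
print(result)
```

Rows that exist in only one frame (Charlie in DF1, Alice in DF2) are kept, because combine_first works on the union of the two indexes.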
Try this (it aligns rows on the existing index):
In [113]: df2.combine_first(df1)
Out[113]:
   FirstName  Uid    JoinDate  BirthDate
0        Bob    1  20160628.0   19910524
1      Alice    2         NaN   19950403
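Note that combine_first aligns on the index, so this plain form is only safe when matching rows carry the same index labels. As a small illustration (a sketch using frames rebuilt from the edited question, where the rows do not line up by position), the call below fills Alice's missing JoinDate from Charlie's row, because both sit at index 1:

```python
import numpy as np
import pandas as pd

# Frames rebuilt from the edited question; default RangeIndex 0..1 on both.
df1 = pd.DataFrame({
    "FirstName": ["Bob", "Charlie"],
    "Uid": [1, 3],
    "JoinDate": [20160628, 20160627],
    "BirthDate": [np.nan, np.nan],
})
df2 = pd.DataFrame({
    "FirstName": ["Bob", "Alice"],
    "Uid": [1, 2],
    "JoinDate": [np.nan, np.nan],
    "BirthDate": [19910524, 19950403],
})

# No index is set, so rows are paired by position, not by Uid.
naive = df2.combine_first(df1)
print(naive)

# Row 1 keeps Alice's name and Uid (non-null in df2) but takes its
# JoinDate from df1's row 1, which belongs to Charlie.
print(naive.loc[1, ["FirstName", "JoinDate"]])
```

Charlie is also dropped entirely here, since his row shares an index label with Alice's. Setting the matching column as the index first avoids both problems.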