Pandas: Fill missing dataframe values from other dataframe

I have two dataframes of different sizes:

import pandas as pd

df1 = pd.DataFrame({'A':[1,2,None,4,None,6,7,8,None,10], 'B':[11,12,13,14,15,16,17,18,19,20]})
df1

      A   B
0   1.0  11
1   2.0  12
2   NaN  13
3   4.0  14
4   NaN  15
5   6.0  16
6   7.0  17
7   8.0  18
8   NaN  19
9  10.0  20

df2 = pd.DataFrame({'A':[2,3,4,5,6,8], 'B':[12,13,14,15,16,18]})
df2['A'] = df2['A'].astype(float)
df2

     A   B
0  2.0  12
1  3.0  13
2  4.0  14
3  5.0  15
4  6.0  16
5  8.0  18

I need to fill the missing values (and only those) in column A of the first dataframe with values from the second dataframe, matching rows on the common key in column B. This is equivalent to the SQL query:

UPDATE df1 JOIN df2
  ON df1.B = df2.B
  SET df1.A = df2.A WHERE df1.A IS NULL;
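For comparison, the SQL statement translates almost clause by clause into a pandas left merge followed by a conditional fill. This is a minimal sketch on the question's data, assuming B is unique in df2 (duplicate keys would duplicate df1 rows in the merge); the `_df2` suffix is an arbitrary name chosen for illustration:

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, None, 4, None, 6, 7, 8, None, 10],
                    'B': [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]})
df2 = pd.DataFrame({'A': [2.0, 3.0, 4.0, 5.0, 6.0, 8.0],
                    'B': [12, 13, 14, 15, 16, 18]})

# JOIN df2 ON df1.B = df2.B; a left join keeps every df1 row in its original order.
# df2's A arrives as 'A_df2' (hypothetical suffix, any unused name works).
merged = df1.merge(df2, on='B', how='left', suffixes=('', '_df2'))

# SET df1.A = df2.A WHERE df1.A IS NULL
merged['A'] = merged['A'].fillna(merged['A_df2'])

# Drop the helper column so only df1's original columns remain
result = merged.drop(columns='A_df2')
print(result)
```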

I tried answers to similar questions on this site, but they don't do what I need:

df1.fillna(df2)

      A   B
0   1.0  11
1   2.0  12
2   4.0  13
3   4.0  14
4   6.0  15
5   6.0  16
6   7.0  17
7   8.0  18
8   NaN  19
9  10.0  20

df1.combine_first(df2)

      A   B
0   1.0  11
1   2.0  12
2   4.0  13
3   4.0  14
4   6.0  15
5   6.0  16
6   7.0  17
7   8.0  18
8   NaN  19
9  10.0  20
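Both attempts fail for the same reason: `fillna()` and `combine_first()` align the two frames on their row index labels (0, 1, 2, ...), not on column B. The sketch below rebuilds the question's data and reindexes df2 to df1's index, which exposes the values these methods actually draw from:

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, None, 4, None, 6, 7, 8, None, 10],
                    'B': [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]})
df2 = pd.DataFrame({'A': [2.0, 3.0, 4.0, 5.0, 6.0, 8.0],
                    'B': [12, 13, 14, 15, 16, 18]})

# Index-based alignment: row label 2 of df1 is paired with row label 2 of df2,
# and labels 6..9 have no counterpart in df2 (they become NaN).
aligned = df2.reindex(df1.index)
print(aligned)

# df1 row 2 has B == 13, but the aligned df2 row 2 has B == 14,
# so fillna copies 4.0 into it instead of the desired 3.0.
print(df1.loc[2, 'B'], aligned.loc[2, 'B'], df1.fillna(df2).loc[2, 'A'])
```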

The intended output is:

      A   B
0   1.0  11
1   2.0  12
2   3.0  13
3   4.0  14
4   5.0  15
5   6.0  16
6   7.0  17
7   8.0  18
8   NaN  19
9  10.0  20

How do I get this result?

You are right to use combine_first(); the two dataframes just need to share the same index, and that index must be column B:

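# Align both frames on the key B: keep df1's A, fall back to df2's A where it is NaN,
# then move B back into a column and restore df1's column order (A, B)
df1.set_index('B').combine_first(df2.set_index('B')).reset_index()[['A', 'B']]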
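One caveat: `combine_first` takes the union of both B indexes, so any B value that exists only in df2 would be appended to the result as a new row, and rows can come back reordered by B. If you want the SQL `UPDATE` semantics more literally (touch only df1's existing rows, keep its index and order), a minimal sketch is to use `Series.map` as a lookup table. This assumes B is unique in df2, since `map` needs a uniquely valued lookup index:

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, None, 4, None, 6, 7, 8, None, 10],
                    'B': [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]})
df2 = pd.DataFrame({'A': [2.0, 3.0, 4.0, 5.0, 6.0, 8.0],
                    'B': [12, 13, 14, 15, 16, 18]})

# Lookup Series: key B -> value A from df2 (assumes B has no duplicates in df2)
lookup = df2.set_index('B')['A']

# For each df1 row, fetch df2's A for the same B (NaN when B is absent from df2),
# and use it only where df1.A is missing: UPDATE ... WHERE df1.A IS NULL
df1['A'] = df1['A'].fillna(df1['B'].map(lookup))

print(df1)
```

Here df1 keeps its original index and row order, and a key such as B == 19 that has no match in df2 simply stays NaN.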

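Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.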

© 2020-2024 STACKOOM.COM