How to fill in NaN cells based on another Pandas dataframe in Python

Question

I have the following two dataframes.

First dataframe, df1:

import pandas as pd
import numpy as np

d1 = {'id': [1, 2, 3, 4], 'col1': [13, np.nan, 15, np.nan], 'col2': [23, np.nan, np.nan, np.nan]}
df1 = pd.DataFrame(data=d1)
df1

    id  col1    col2
0   1   13.0    23.0
1   2   NaN     NaN
2   3   15.0    NaN
3   4   NaN     NaN

And the second dataframe, df2:

d2 = {'id': [2, 3, 4], 'col1': [ 14, 150, 16], 'col2': [24, 250, np.nan]}
df2 = pd.DataFrame(data=d2)
df2

    id  col1    col2
0   2   14      24.0
1   3   150     250.0
2   4   16      NaN

I need to replace the NaN fields in df1 with the non-NaN values from df2, where possible. But there are some conditions:

Condition 1) The id column in each dataframe consists of unique values. When replacing any NaN value in df1 with a value from df2, the id column values need to match.

Condition 2) The dataframes do not necessarily have the same size.

Condition 3) NaN values will only be looked for in col1 or col2 in either dataframe. The id column cannot be NaN in any row. There might be other columns in the dataframes, with or without NaN values, but for replacing the data we only look at col1 and col2.

Condition 4) To trigger replacement of a row in df1, it is enough that either col1 or col2 has a NaN value in that row. When a NaN value is detected in a row of df1, the entire row is replaced by the row with the same id value from df2, as long as both col1 and col2 in that row of df2 are non-NaN. In other words, if the row with the same id value in df2 has a NaN value in either col1 or col2, do not replace any data in df1.

After this operation, df1 should look like the following:

    id  col1    col2
0   1   13.0    23.0
1   2   14      24    
2   3   150.0   250.0    # Note that the entire row is replaced!
3   4   NaN     NaN      # This row is not replaced because col2 is NaN in df2 for the same id

How can this be done in the most elegant way? Python offers a lot of functions that I may not be aware of, which might solve this problem in a few lines instead of writing very complex logic.

Answer 1

You can drop the rows containing NaN values from df2, then update with concat and groupby:

pd.concat([df2.dropna(), df1]).groupby('id', as_index=False).first()

Output:

   id   col1   col2
0   1   13.0   23.0
1   2   14.0   24.0
2   3  150.0  250.0
3   4    NaN    NaN
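
The one-liner works because groupby(...).first() returns the first non-null value per column within each id group, and the complete df2 rows are placed first in the concat. Condition 3 mentions that other columns may exist, in which case a plain dropna() would also discard df2 rows that are only missing values in those unrelated columns. A minimal sketch of a variant that restricts the check to col1 and col2 (the names cols, complete and result are introduced here for illustration and are not from the original answer):

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"id": [1, 2, 3, 4],
                    "col1": [13, np.nan, 15, np.nan],
                    "col2": [23, np.nan, np.nan, np.nan]})
df2 = pd.DataFrame({"id": [2, 3, 4],
                    "col1": [14, 150, 16],
                    "col2": [24, 250, np.nan]})

cols = ["col1", "col2"]  # the only columns checked for NaN

# Keep only the df2 rows that are complete in col1 and col2
complete = df2.dropna(subset=cols)

# Complete df2 rows come first, so first() prefers them for each id
result = pd.concat([complete, df1]).groupby("id", as_index=False).first()
print(result)
```

One caveat with this approach: any df1 row whose id has a complete row in df2 is overwritten, even if that df1 row contains no NaN. With the sample data this makes no difference, since id 1 does not appear in df2.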

Answer 2

Here is another way, using fillna:

df1 = df1.set_index('id').fillna(df2.dropna().set_index('id')).reset_index()

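Output (fillna only fills missing cells, so col1 for id 3 stays 15.0 instead of the 150.0 requested in the question):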

   id  col1   col2
0   1  13.0   23.0
1   2  14.0   24.0
2   3  15.0  250.0
3   4   NaN    NaN
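
For the whole-row replacement described in Condition 4, one option is to select the ids explicitly and assign the df2 values with .loc. This is a minimal sketch under the question's assumptions (unique, non-NaN id values); the names cols, donor, needs_fix, target_ids and result are illustrative and not part of either answer:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"id": [1, 2, 3, 4],
                    "col1": [13, np.nan, 15, np.nan],
                    "col2": [23, np.nan, np.nan, np.nan]})
df2 = pd.DataFrame({"id": [2, 3, 4],
                    "col1": [14, 150, 16],
                    "col2": [24, 250, np.nan]})

cols = ["col1", "col2"]  # the only columns checked and replaced

# df2 rows that are complete in col1/col2, indexed by id
donor = df2.dropna(subset=cols).set_index("id")[cols]

result = df1.set_index("id")

# ids whose df1 row has at least one NaN in col1/col2 ...
needs_fix = result[cols].isna().any(axis=1)
# ... and that also have a complete row in df2
target_ids = result.index[needs_fix].intersection(donor.index)

# Overwrite both columns for those ids; other rows and columns stay untouched
result.loc[target_ids, cols] = donor.loc[target_ids, cols]
result = result.reset_index()
print(result)
```

Unlike fillna, this overwrites col1 for id 3 as well, and unlike the concat approach it leaves df1 rows without any NaN unchanged even when df2 has a complete row for the same id.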
