
How to complete NaN cells based on another Pandas dataframe in Python

I have the following two dataframes.

First dataframe, df1:

import pandas as pd
import numpy as np

d1 = {'id': [1, 2, 3, 4], 'col1': [13, np.nan, 15, np.nan], 'col2': [23, np.nan, np.nan, np.nan]}
df1 = pd.DataFrame(data=d1)
df1

    id  col1    col2
0   1   13.0    23.0
1   2   NaN     NaN
2   3   15.0    NaN
3   4   NaN     NaN

And the second dataframe, df2:

d2 = {'id': [2, 3, 4], 'col1': [ 14, 150, 16], 'col2': [24, 250, np.nan]}
df2 = pd.DataFrame(data=d2)
df2

    id  col1    col2
0   2   14      24.0
1   3   150     250.0
2   4   16      NaN

I need to replace the NaN fields in df1 with the non-NaN values from df2, where possible. But there are some conditions:

Condition 1) The id column in each dataframe consists of unique values. When replacing any NaN value in df1 with a value from df2, the id column values need to match.

Condition 2) The dataframes do not necessarily have the same size.

Condition 3) NaN values will only be looked for in col1 or col2 in either dataframe. The id column cannot be NaN in any row. There might be other columns in the dataframes, with or without NaN values, but for replacing the data we will only be looking at the col1 and col2 columns.

Condition 4) To trigger a replacement of a row in df1, it is enough that either col1 or col2 has a NaN value in that row. When any NaN value is detected in a row of df1, the entire row will be replaced by the row with the same id value from df2, as long as all values of col1 and col2 in that row of df2 are non-NaN. In other words, if the row with the same id value in df2 has a NaN value in either col1 or col2, do not replace any data in df1.

After this operation, df1 should look like the following:

    id  col1    col2
0   1   13.0    23.0
1   2   14.0    24.0
2   3   150.0   250.0    # Note that the entire row is replaced!
3   4   NaN     NaN      # This row is not replaced because col2 is NaN in df2 for the same id

How can this be done in the most elegant way? Python offers a lot of functions that I may not be aware of, which might solve this problem in a few lines instead of requiring very complex logic.

You can drop the rows containing NaN values from df2, then update with concat and groupby:

pd.concat([df2.dropna(), df1]).groupby('id', as_index=False).first()

Output:

   id   col1   col2
0   1   13.0   23.0
1   2   14.0   24.0
2   3  150.0  250.0
3   4    NaN    NaN
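
The one-liner above prefers df2 for every id that survives dropna(), and dropna() looks at all columns. If df1 rows that are already complete must stay untouched (condition 4), or if the frames carry extra columns (condition 3), a slightly more guarded variant of the same concat/groupby idea might look like this. It is only a sketch; the names cols, needs_fix, donors, and result are introduced here for illustration.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'id': [1, 2, 3, 4],
                    'col1': [13, np.nan, 15, np.nan],
                    'col2': [23, np.nan, np.nan, np.nan]})
df2 = pd.DataFrame({'id': [2, 3, 4],
                    'col1': [14, 150, 16],
                    'col2': [24, 250, np.nan]})

cols = ['col1', 'col2']  # only these columns decide whether a row is replaced

# ids of df1 rows that have a NaN in col1 or col2
needs_fix = df1.loc[df1[cols].isna().any(axis=1), 'id']

# df2 rows that are complete in col1 and col2 AND whose id needs fixing
donors = df2.dropna(subset=cols)
donors = donors[donors['id'].isin(needs_fix)]

# donors come first, so first() picks their values for the matching ids
result = pd.concat([donors, df1]).groupby('id', as_index=False).first()
print(result)
```

With the sample data this gives the same frame as the one-liner; the difference only shows up when df2 also contains ids whose df1 row is already complete.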

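Here is another way, using fillna. Note that fillna only fills the cells that are NaN, so for id 3 it keeps the existing col1 value (15.0) instead of replacing the entire row: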

df1 = df1.set_index('id').fillna(df2.dropna().set_index('id')).reset_index()

Output:

   id  col1   col2
0   1  13.0   23.0
1   2  14.0   24.0
2   3  15.0  250.0
3   4   NaN    NaN
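
Because fillna works cell by cell, the output above keeps 15.0 for id 3, whereas the question asks for the whole row to be replaced. One possible way to get row-wise replacement while staying with an index-aligned approach is DataFrame.update, which overwrites values in place with the non-NaN values of another frame. The following is a sketch under that assumption; cols, left, right, incomplete, and candidates are illustrative names, not part of the original answers.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'id': [1, 2, 3, 4],
                    'col1': [13, np.nan, 15, np.nan],
                    'col2': [23, np.nan, np.nan, np.nan]})
df2 = pd.DataFrame({'id': [2, 3, 4],
                    'col1': [14, 150, 16],
                    'col2': [24, 250, np.nan]})

cols = ['col1', 'col2']

left = df1.set_index('id')
# keep only df2 rows that are complete in col1 and col2
right = df2.dropna(subset=cols).set_index('id')

# rows of df1 with a NaN in col1 or col2 are the only replacement candidates
incomplete = left[cols].isna().any(axis=1)
candidates = right[right.index.isin(left.index[incomplete])]

# update() modifies left in place and returns None
left.update(candidates[cols])
result = left.reset_index()
print(result)
```

Here id 3 ends up with col1 = 150.0 and col2 = 250.0, and id 4 stays NaN because its df2 row is incomplete.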
