
Replacing NaN values in a DataFrame row with values from other rows based on a (non-unique) column value

I have a DataFrame similar to the following, with a column holding a non-unique value (in this case, address) and some other columns containing information about it.

df = pd.DataFrame({'address': {0:'11 Star Street', 1:'22 Milky Way', 2:'88 Dark Drive', 3:'33 Planet Place', 4:'22 Milky Way', 5:'22 Milky Way'}, 'val': {0:10, 1:'', 2:'', 3:20, 4: 20, 5:''}, 'val2': {0:20, 1:'', 2:'', 3:40, 4:10, 5:''}})

           address val val2
0   11 Star Street  10   20
1     22 Milky Way         
2    88 Dark Drive         
3  33 Planet Place  20   40
4     22 Milky Way  20   10
5     22 Milky Way          

Some of the addresses appear more than once in the DataFrame, and some of those repeated rows are missing information. If a row is missing its values but the same address appears in another row of the DataFrame, I'd like to replace the NaN values with those from the same address to get something like this:

           address val val2
0   11 Star Street  10   20
1     22 Milky Way  20   10
2    88 Dark Drive         
3  33 Planet Place  20   40
4     22 Milky Way  20   10
5     22 Milky Way  20   10

Using something like a dictionary would be infeasible since the DataFrame contains thousands of different addresses.

EDIT: It's safe to assume that either both values are missing or both are present. In other words, there will never be a row with only val and not val2, or vice versa. However, an answer that takes that possibility into account would be even better!

There are a number of ways to do this; the easiest is to group by address and ffill / bfill within each group.

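import numpy as np
import pandas as pd

cols = ['val', 'val2']
# Treat empty strings as missing and make the value columns numeric
df[cols] = df[cols].replace('', np.nan).apply(pd.to_numeric)
# Within each address, fill gaps forward and then backward
df[cols] = df.groupby('address')[cols].transform(lambda s: s.ffill().bfill())

print(df)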

           address   val  val2
0   11 Star Street  10.0  20.0
1     22 Milky Way  20.0  10.0
2    88 Dark Drive   NaN   NaN
3  33 Planet Place  20.0  40.0
4     22 Milky Way  20.0  10.0
5     22 Milky Way  20.0  10.0
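
The question's EDIT asks about rows where only one of val and val2 is missing. As a rough check, assuming the value columns already hold NaN for missing entries, a group-wise ffill / bfill fills each column on its own. The `partial` frame below is made-up illustration data, not taken from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 0 has only val2, row 1 has only val
partial = pd.DataFrame({
    'address': ['22 Milky Way', '22 Milky Way', '88 Dark Drive'],
    'val': [np.nan, 20.0, np.nan],
    'val2': [10.0, np.nan, np.nan],
})
cols = ['val', 'val2']

# ffill/bfill work column by column inside each address group,
# so a gap in val is filled even when val2 is present (and vice versa)
partial[cols] = partial.groupby('address')[cols].transform(
    lambda s: s.ffill().bfill()
)

print(partial)
```

An address that never has a value in a column, such as 88 Dark Drive here, stays NaN in that column.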

Another, more performant method is to take the first non-missing value for each address and apply it with update, using address as the index.

vals = df.replace('',np.nan,regex=True).groupby('address').first()

print(vals)
    
                     val  val2
    address                    
    11 Star Street   10.0  20.0
    22 Milky Way     20.0  10.0
    33 Planet Place  20.0  40.0
    88 Dark Drive     NaN   NaN

df = df.set_index('address')

df.update(vals)

                val val2
address                 
11 Star Street   10   20
22 Milky Way     20   10
88 Dark Drive           
33 Planet Place  20   40
22 Milky Way     20   10
22 Milky Way     20   10
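
A possible variant of the same "first non-missing value per address" idea keeps the original index instead of moving address into it. This is only a sketch, not the answer's own method: it assumes the value columns should end up numeric, and the names `cols` and `first_vals` are chosen here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'address': ['11 Star Street', '22 Milky Way', '88 Dark Drive',
                '33 Planet Place', '22 Milky Way', '22 Milky Way'],
    'val': [10, '', '', 20, 20, ''],
    'val2': [20, '', '', 40, 10, ''],
})
cols = ['val', 'val2']

# Empty strings become NaN; to_numeric then yields float columns
df[cols] = df[cols].replace('', np.nan).apply(pd.to_numeric)

# First non-missing value per address, broadcast back to every row
first_vals = df.groupby('address')[cols].transform('first')

# Fill only the gaps; values that are already present stay untouched
df[cols] = df[cols].fillna(first_vals)

print(df)
```

Because `fillna` works column by column, this also covers a row where only one of val and val2 is missing.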


