
Fill nulls in columns with non-null values from other columns

Given a DataFrame whose similar columns have null values scattered among them, how can I dynamically fill the nulls in one column with non-null values from the other columns, without explicitly naming those other columns? For example, select the first column, category1, and fill its null rows with values from the other columns in the same rows.

import pandas as pd

data = {'year': [2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019],
        'category1': [None, 21, None, 10, None, 30, 31, 45, 23, 56],
        'category2': [10, 21, 20, 10, None, 30, None, 45, 23, 56],
        'category3': [10, 21, 20, 10, None, 30, 31, 45, 23, 56]}

df = pd.DataFrame(data)
df = df.set_index('year')
df

      category1  category2  category3
year
2010        NaN       10.0       10.0
2011       21.0       21.0       21.0
2012        NaN       20.0       20.0
2013       10.0       10.0       10.0
2014        NaN        NaN        NaN
2015       30.0       30.0       30.0
2016       31.0        NaN       31.0
2017       45.0       45.0       45.0
2018       23.0       23.0       23.0
2019       56.0       56.0       56.0

After filling category1:

      category1  category2  category3
year
2010       10.0       10.0       10.0
2011       21.0       21.0       21.0
2012       20.0       20.0       20.0
2013       10.0       10.0       10.0
2014        NaN        NaN        NaN
2015       30.0       30.0       30.0
2016       31.0        NaN       31.0
2017       45.0       45.0       45.0
2018       23.0       23.0       23.0
2019       56.0       56.0       56.0

IIUC, you can do it this way:

In [369]: df['category1'] = df['category1'].fillna(df['category2'])

In [370]: df
Out[370]:
      category1  category2  category3
year
2010       10.0       10.0       10.0
2011       21.0       21.0       21.0
2012       20.0       20.0       20.0
2013       10.0       10.0       10.0
2014        NaN        NaN        NaN
2015       30.0       30.0       30.0
2016       31.0        NaN       31.0
2017       45.0       45.0       45.0
2018       23.0       23.0       23.0
2019       56.0       56.0       56.0
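The call above names category2 explicitly. If the goal is to avoid naming the other columns, one possible generalization is to loop over every column except the target and repeat the same fillna step. This is only a sketch and not part of the original answer; the variable name target is illustrative.

```python
import pandas as pd

data = {'year': [2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019],
        'category1': [None, 21, None, 10, None, 30, 31, 45, 23, 56],
        'category2': [10, 21, 20, 10, None, 30, None, 45, 23, 56],
        'category3': [10, 21, 20, 10, None, 30, 31, 45, 23, 56]}
df = pd.DataFrame(data).set_index('year')

target = 'category1'  # illustrative name for the column to fill

# Walk the remaining columns left to right; each pass fills only
# the rows of the target column that are still null.
for col in df.columns.drop(target):
    df[target] = df[target].fillna(df[col])

print(df)
```

Rows where every column is null (2014 here) stay null, since there is nothing to fill from.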

You can use first_valid_index, with a condition for rows where all values are NaN:

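def f(x):
    # Label of the first non-null entry in the row, or None if the whole row is NaN
    idx = x.first_valid_index()
    if idx is None:
        return None
    return x[idx]

# Apply row-wise and store the result in a new column 'a'
df['a'] = df.apply(f, axis=1)

print(df)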
      category1  category2  category3     a
year                                       
2010        NaN       10.0       10.0  10.0
2011       21.0       21.0       21.0  21.0
2012        NaN       20.0       20.0  20.0
2013       10.0       10.0       10.0  10.0
2014        NaN        NaN        NaN   NaN
2015       30.0       30.0       30.0  30.0
2016       31.0        NaN       31.0  31.0
2017       45.0       45.0       45.0  45.0
2018       23.0       23.0       23.0  23.0
2019       56.0       56.0       56.0  56.0
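A row-wise apply can be slow on large frames. As a hedged alternative (a sketch, not the answerer's method), back-filling along each row with bfill(axis=1) and taking the first column should give the same "first non-null value per row" result for this data:

```python
import pandas as pd

data = {'year': [2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019],
        'category1': [None, 21, None, 10, None, 30, 31, 45, 23, 56],
        'category2': [10, 21, 20, 10, None, 30, None, 45, 23, 56],
        'category3': [10, 21, 20, 10, None, 30, 31, 45, 23, 56]}
df = pd.DataFrame(data).set_index('year')

# bfill(axis=1) pulls each row's next non-null value leftwards,
# so the first column of the result holds the first non-null value per row.
df['a'] = df.bfill(axis=1).iloc[:, 0]

print(df)
```

To overwrite category1 in place instead of adding a new column, assign the same expression to df['category1'].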

Try this:

df['category1'] = df['category1'].fillna(df.median(axis=1))

See the documentation for pandas.DataFrame.fillna; it is quite clear.
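One caveat worth checking: the row median matches "the value from another column" only when the non-null values in a row agree, as they do in the question's data. A small sketch with made-up data (the toy frame below is hypothetical, not from the question) shows how the two can differ:

```python
import pandas as pd

# Hypothetical data: in the first row the other columns disagree (10 vs 30)
toy = pd.DataFrame({'category1': [None, 5.0],
                    'category2': [10.0, 5.0],
                    'category3': [30.0, 5.0]})

# Fill with the row median (NaN is skipped by default)
by_median = toy['category1'].fillna(toy.median(axis=1))

# Fill with the first non-null value to the right in the same row
by_first = toy.bfill(axis=1)['category1']

print(by_median.tolist())  # [20.0, 5.0]
print(by_first.tolist())   # [10.0, 5.0]
```

If an actual value from another column is required rather than a summary statistic, the median approach may not be what you want.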

