[英]Fill nulls in columns with non-null values from other columns
Given a dataframe with similar columns having null values in between.给定一个 dataframe 和类似的列,它们之间有 null 个值。 How to dynamically fill nulls in the columns with non-null values from other columns without explicitly stating the names of other column names eg select first column category1
and fill the null rows with values from other columns of same rows?如何使用其他列的非空值动态填充列中的空值而不明确说明其他列名称的名称,例如 select 第一列category1
1 并使用同一行其他列的值填充 null 行?
data = {'year': [2010, 2011, 2012, 2013, 2014, 2015, 2016,2017, 2018, 2019],
'category1': [None, 21, None, 10, None, 30, 31,45, 23, 56],
'category2': [10, 21, 20, 10, None, 30, None,45, 23, 56],
'category3': [10, 21, 20, 10, None, 30, 31,45, 23, 56],}
df = pd.DataFrame(data)
df = df.set_index('year')
df
category1 category2 category3
year
2010 NaN 10 10
2011 21 21 21
2012 NaN 20 20
2013 10 10 10
2014 NaN NaN NaN
2015 30 30 NaN
2016 31 NaN 31
2017 45 45 45
2018 23 23 23
2019 56 56 56
After filling category1
:填写category1
后:
category1 category2 category3
year
2010 10 10 10
2011 21 21 21
2012 20 20 20
2013 10 10 10
2014 NaN NaN NaN
2015 30 30 NaN
2016 31 NaN 31
2017 45 45 45
2018 23 23 23
2019 56 56 56
IIUC you can do it this way: IIUC 你可以这样做:
In [369]: df['category1'] = df['category1'].fillna(df['category2'])
In [370]: df
Out[370]:
category1 category2 category3
year
2010 10.0 10.0 10.0
2011 21.0 21.0 21.0
2012 20.0 20.0 20.0
2013 10.0 10.0 10.0
2014 NaN NaN NaN
2015 30.0 30.0 30.0
2016 31.0 NaN 31.0
2017 45.0 45.0 45.0
2018 23.0 23.0 23.0
2019 56.0 56.0 56.0
You can use first_valid_index
with condition if all values are NaN
:如果所有值都是NaN
您可以使用first_valid_index
和条件:
def f(x):
if x.first_valid_index() is None:
return None
else:
return x[x.first_valid_index()]
df['a'] = df.apply(f, axis=1)
print (df)
category1 category2 category3 a
year
2010 NaN 10.0 10.0 10.0
2011 21.0 21.0 21.0 21.0
2012 NaN 20.0 20.0 20.0
2013 10.0 10.0 10.0 10.0
2014 NaN NaN NaN NaN
2015 30.0 30.0 30.0 30.0
2016 31.0 NaN 31.0 31.0
2017 45.0 45.0 45.0 45.0
2018 23.0 23.0 23.0 23.0
2019 56.0 56.0 56.0 56.0
试试这个:
df['category1']= df['category1'].fillna(df.median(axis=1))
你可以用pandas.DataFrame.fillna查看文档,很清楚
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.