Fill nulls in columns with non-null values from other columns
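Given a dataframe with several similar columns that contain null values, how can I dynamically fill the nulls in one column with non-null values from the other columns, without explicitly stating the names of those other columns? For example, select the first column, category1, and fill its null rows with values from the other columns in the same row.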
import pandas as pd

data = {'year': [2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019],
        'category1': [None, 21, None, 10, None, 30, 31, 45, 23, 56],
        'category2': [10, 21, 20, 10, None, 30, None, 45, 23, 56],
        'category3': [10, 21, 20, 10, None, 30, 31, 45, 23, 56]}
df = pd.DataFrame(data)
df = df.set_index('year')
df
category1 category2 category3
year
2010 NaN 10 10
2011 21 21 21
2012 NaN 20 20
2013 10 10 10
2014 NaN NaN NaN
2015 30 30 NaN
2016 31 NaN 31
2017 45 45 45
2018 23 23 23
2019 56 56 56
After filling category1:
category1 category2 category3
year
2010 10 10 10
2011 21 21 21
2012 20 20 20
2013 10 10 10
2014 NaN NaN NaN
2015 30 30 NaN
2016 31 NaN 31
2017 45 45 45
2018 23 23 23
2019 56 56 56
IIUC, you can do it this way:
In [369]: df['category1'] = df['category1'].fillna(df['category2'])
In [370]: df
Out[370]:
category1 category2 category3
year
2010 10.0 10.0 10.0
2011 21.0 21.0 21.0
2012 20.0 20.0 20.0
2013 10.0 10.0 10.0
2014 NaN NaN NaN
2015 30.0 30.0 30.0
2016 31.0 NaN 31.0
2017 45.0 45.0 45.0
2018 23.0 23.0 23.0
2019 56.0 56.0 56.0
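The call above names category2 explicitly, which the question wanted to avoid. A minimal sketch of one way around that, assuming every other column is an acceptable source and that the leftmost non-null value should win: move the target column to the front and back-fill across each row. The variable names `target` and `others` are just illustrative.

```python
import pandas as pd

data = {
    "year": [2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019],
    "category1": [None, 21, None, 10, None, 30, 31, 45, 23, 56],
    "category2": [10, 21, 20, 10, None, 30, None, 45, 23, 56],
    "category3": [10, 21, 20, 10, None, 30, 31, 45, 23, 56],
}
df = pd.DataFrame(data).set_index("year")

target = "category1"
# Every column except the target, in their existing order.
others = [c for c in df.columns if c != target]

# With the target first, a row-wise back-fill pulls the first non-null
# value to its right into the target column. Only that column is kept,
# so the other columns in df are left untouched.
df[target] = df[[target] + others].bfill(axis=1)[target]

print(df)
```

Rows where every column is null (2014 here) stay NaN, since there is nothing to pull from.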
You can use first_valid_index, with a condition to handle rows where all values are NaN:
def f(x):
    # first_valid_index() returns None when the whole row is NaN
    if x.first_valid_index() is None:
        return None
    else:
        return x[x.first_valid_index()]

# apply f to each row (axis=1)
df['a'] = df.apply(f, axis=1)
print(df)
category1 category2 category3 a
year
2010 NaN 10.0 10.0 10.0
2011 21.0 21.0 21.0 21.0
2012 NaN 20.0 20.0 20.0
2013 10.0 10.0 10.0 10.0
2014 NaN NaN NaN NaN
2015 30.0 30.0 30.0 30.0
2016 31.0 NaN 31.0 31.0
2017 45.0 45.0 45.0 45.0
2018 23.0 23.0 23.0 23.0
2019 56.0 56.0 56.0 56.0
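To see why the None check is needed, here is a small sketch of how first_valid_index behaves on a single row. The two sample rows and the helper name `first_valid_value` are made up for illustration; they are not from the answer above.

```python
import numpy as np
import pandas as pd


def first_valid_value(row):
    """Return the first non-null value in a row, or NaN if there is none."""
    label = row.first_valid_index()  # None when every entry is null
    return np.nan if label is None else row[label]


cols = ["category1", "category2", "category3"]
partly_filled = pd.Series([np.nan, 20.0, 20.0], index=cols)
all_missing = pd.Series([np.nan, np.nan, np.nan], index=cols)

print(partly_filled.first_valid_index())  # category2
print(all_missing.first_valid_index())    # None
print(first_valid_value(partly_filled))   # 20.0
```

Indexing a row with None would not return a missing value, so the all-NaN case has to be handled before the lookup.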
Try this:
df['category1'] = df['category1'].fillna(df.median(axis=1))
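One thing to keep in mind with the median approach: it only reproduces a value from another column when the non-null columns in that row agree, as they do in the question's data. A small sketch with made-up numbers (not the question's data) shows what happens when they differ:

```python
import numpy as np
import pandas as pd

# Hypothetical rows: in 2010 the other two columns disagree.
df = pd.DataFrame(
    {
        "category1": [np.nan, 5.0],
        "category2": [10.0, 5.0],
        "category3": [30.0, 5.0],
    },
    index=pd.Index([2010, 2011], name="year"),
)

# median(axis=1) skips NaN by default, so 2010 gives the median of 10 and 30.
row_median = df.median(axis=1)
filled = df["category1"].fillna(row_median)

print(filled)
```

For 2010 the filled value is 20.0, which appears in neither category2 nor category3, so this is closer to imputation than to copying a value from another column.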
You can use pandas.DataFrame.fillna. Have a look at the documentation, it is quite clear.
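As a rough sketch of how fillna alone could cover the "without naming the other columns" part: fillna accepts a Series and aligns it on the index, so you can loop over whatever other columns exist. The loop and the `target` variable are my own illustration, assuming columns are tried left to right.

```python
import pandas as pd

data = {
    "year": [2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019],
    "category1": [None, 21, None, 10, None, 30, 31, 45, 23, 56],
    "category2": [10, 21, 20, 10, None, 30, None, 45, 23, 56],
    "category3": [10, 21, 20, 10, None, 30, 31, 45, 23, 56],
}
df = pd.DataFrame(data).set_index("year")

target = "category1"
# Try each remaining column in turn; values already filled are kept,
# so earlier columns take priority over later ones.
for col in df.columns.drop(target):
    df[target] = df[target].fillna(df[col])

print(df[target])
```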