Python pandas: fill missing value (NaN) based on a condition on another column
How to partially fill in missing values based on a condition in Python pandas?
My dataset has the following missing values:
print(train.shape)
(54808, 6)
employee_id 0
name 0
education 2409
age 0
Salary_hike 4124
length_of_service 0
I want to fill the missing Salary_hike values with 0 for rows where length_of_service is 1 or less (`<= 1`).
Example:
import numpy as np
import pandas as pd

train = pd.DataFrame({'employee_id':[103,101,103,104,105,106,107,108,109,110],
                      'Name':['A','B','C','D','E','F','G','H','I','J'],
                      'Age':[20,30,21,24,25,22,27,23,24,21],
                      'length_of_service':[1,2,1,4,5,1,7,1,2,1],
                      'Salary_hike':[np.nan,5,np.nan,6,7,1,9,1,4,np.nan],
                      })
First, I counted how many rows have a length of service of 1 or less:
(train['length_of_service']<= 1).sum()
5
Next, I filtered my data frame using the following two conditions:
train[(train.length_of_service <=1) & (train['Salary_hike'].isnull())]
employee_id Name Age length_of_service Salary_hike
0 103 A 20 1 NaN
2 103 C 21 1 NaN
9 110 J 21 1 NaN
Now, how can I fill the missing Salary_hike values in the filtered rows above with 0, so that they look like this?
employee_id Name Age length_of_service Salary_hike
0 103 A 20 1 0
2 103 C 21 1 0
9 110 J 21 1 0
I tried the command suggested in the comments, for example:
train.loc[(train.length_of_service==-1) & (train['Salary_hike'].isnull()),'Salary_hike'] = 0
But I still get 3 missing values:
train.isnull().sum()
Hi all,
Thank you for your valuable input.
It works now with the following command:
train.loc[(train.length_of_service <=1) & (train['Salary_hike'].isnull()),['Salary_hike']]=0
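For anyone who wants to reproduce this, here is a minimal self-contained sketch that runs the working command on the sample frame from the question and counts what was filled. The names `mask`, `n_filled`, and `remaining` are only illustrative and are not part of the original post.

```python
import numpy as np
import pandas as pd

# Sample frame copied from the question
train = pd.DataFrame({
    'employee_id': [103, 101, 103, 104, 105, 106, 107, 108, 109, 110],
    'Name': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J'],
    'Age': [20, 30, 21, 24, 25, 22, 27, 23, 24, 21],
    'length_of_service': [1, 2, 1, 4, 5, 1, 7, 1, 2, 1],
    'Salary_hike': [np.nan, 5, np.nan, 6, 7, 1, 9, 1, 4, np.nan],
})

# Rows with length_of_service <= 1 AND a missing Salary_hike
mask = (train['length_of_service'] <= 1) & (train['Salary_hike'].isnull())
n_filled = int(mask.sum())

# Assign 0 only to the selected rows of the Salary_hike column
train.loc[mask, 'Salary_hike'] = 0

remaining = int(train['Salary_hike'].isnull().sum())
print(n_filled, remaining)
```

On this sample, the earlier attempt with `== -1` selects no rows, because no `length_of_service` value equals -1. That would explain why the 3 missing values stayed in place.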
I believe you need DataFrame.loc:
train = pd.DataFrame({'length_of_service':[-1,5,4,-8,9,-3,0],
'Salary_hike':[10,np.nan, 5, np.nan, np.nan, 8, np.nan]})
train.loc[(train.length_of_service <=1) & (train['Salary_hike'].isnull()),'Salary_hike'] = 0
print (train)
length_of_service Salary_hike
0 -1 10.0
1 5 NaN
2 4 5.0
3 -8 0.0
4 9 NaN
5 -3 8.0
6 0 0.0
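If you would rather not modify the frame in place, one possible alternative is `Series.mask`, which returns a new Series with values replaced where the condition is True. This is only a sketch of a different route to the same result, not part of the answer above. The names `short_service` and `filled` are made up for illustration.

```python
import numpy as np
import pandas as pd

train = pd.DataFrame({'length_of_service': [-1, 5, 4, -8, 9, -3, 0],
                      'Salary_hike': [10, np.nan, 5, np.nan, np.nan, 8, np.nan]})

# Condition on the other column
short_service = train['length_of_service'] <= 1

# Replace with 0 only where service is short AND the hike is missing;
# the original column in `train` is left untouched
filled = train['Salary_hike'].mask(short_service & train['Salary_hike'].isna(), 0)
print(filled)
```

The result can be assigned back with `train['Salary_hike'] = filled` once you are happy with it.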
Or, if you need to set the value only where length_of_service equals -1:
train = pd.DataFrame({'length_of_service':[-1,5,4,-1,9,-3,-1],
'Salary_hike':[10,np.nan, 5, np.nan, np.nan, 8, np.nan]})
train.loc[(train.length_of_service==-1) & (train['Salary_hike'].isnull()),'Salary_hike'] = 0
print (train)
length_of_service Salary_hike
0 -1 10.0
1 5 NaN
2 4 5.0
3 -1 0.0
4 9 NaN
5 -3 8.0
6 -1 0.0
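Since both variants differ only in the condition, the pattern could be wrapped in a small helper. The function `fill_missing_where` below is hypothetical (it is not a pandas API), and it works on a copy so the input frame is not changed.

```python
import numpy as np
import pandas as pd

def fill_missing_where(df, condition, column, value=0):
    """Return a copy of df where NaNs in `column` are set to `value`
    on the rows where the boolean Series `condition` is True."""
    out = df.copy()
    out.loc[condition & out[column].isna(), column] = value
    return out

train = pd.DataFrame({'length_of_service': [-1, 5, 4, -1, 9, -3, -1],
                      'Salary_hike': [10, np.nan, 5, np.nan, np.nan, 8, np.nan]})

# Same condition as the second example: length_of_service equal to -1
result = fill_missing_where(train, train['length_of_service'] == -1, 'Salary_hike')
print(result)
```

Passing `train['length_of_service'] <= 1` instead gives the behaviour of the first example.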