
python - How to partially fill in missing values based on some condition in python pandas?


My dataset has the following missing values:

print(train.shape)
(54808, 6)

print(train.isnull().sum())
employee_id                0
name                       0
education               2409
age                        0
Salary_hike             4124
length_of_service          0

I want to fill the missing Salary_hike values with 0, but only in rows where length_of_service is 1 or less (length_of_service <= 1).

Example:

import numpy as np
import pandas as pd

train = pd.DataFrame({'employee_id': [103, 101, 103, 104, 105, 106, 107, 108, 109, 110],
                      'Name': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J'],
                      'Age': [20, 30, 21, 24, 25, 22, 27, 23, 24, 21],
                      'length_of_service': [1, 2, 1, 4, 5, 1, 7, 1, 2, 1],
                      'Salary_hike': [np.nan, 5, np.nan, 6, 7, 1, 9, 1, 4, np.nan]})

First, I counted how many rows have a length of service of 1 or less:

(train['length_of_service']<= 1).sum()
5

Next, I filtered my DataFrame using both conditions:

train[(train.length_of_service <=1) & (train['Salary_hike'].isnull())]

   employee_id Name  Age  length_of_service  Salary_hike
0          103    A   20                  1          NaN
2          103    C   21                  1          NaN
9          110    J   21                  1          NaN
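The parentheses around each comparison matter: `&` binds more tightly than `<=`, so without them Python would evaluate `1 & train['Salary_hike'].isnull()` first, which is not what is intended. A minimal sketch, using only the two relevant columns of the example data (the mask names `short_service`, `missing_hike` and `both` are illustrative), that builds each condition as a named boolean Series and then combines them:

```python
import numpy as np
import pandas as pd

train = pd.DataFrame({'length_of_service': [1, 2, 1, 4, 5, 1, 7, 1, 2, 1],
                      'Salary_hike': [np.nan, 5, np.nan, 6, 7, 1, 9, 1, 4, np.nan]})

# Each condition is a boolean Series aligned on the DataFrame index.
short_service = train['length_of_service'] <= 1
missing_hike = train['Salary_hike'].isnull()

# Element-wise AND; summing a boolean Series counts its True values.
both = short_service & missing_hike

print(short_service.sum())         # 5
print(both.sum())                  # 3
print(train[both].index.tolist())  # [0, 2, 9]
```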

Now, how do I fill the missing Salary_hike values in the filtered rows above with 0, so that the result looks like this?

   employee_id Name  Age  length_of_service  Salary_hike
0          103    A   20                  1            0
2          103    C   21                  1            0
9          110    J   21                  1            0

I used the command suggested in the comments:

train.loc[(train.length_of_service==-1) & (train['Salary_hike'].isnull()),'Salary_hike'] = 0

But I still get 3 missing values when I check with:

train.isnull().sum()
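One likely reason the `==-1` version changed nothing: in the example data no row has `length_of_service == -1`, so the combined mask is all False and `.loc` assigns to zero rows. A quick sketch (mask names are illustrative) that checks how many rows each mask selects before assigning:

```python
import numpy as np
import pandas as pd

train = pd.DataFrame({'length_of_service': [1, 2, 1, 4, 5, 1, 7, 1, 2, 1],
                      'Salary_hike': [np.nan, 5, np.nan, 6, 7, 1, 9, 1, 4, np.nan]})

missing = train['Salary_hike'].isnull()

# The mask from the comments: no row in this data has length_of_service == -1.
equals_minus_one = (train['length_of_service'] == -1) & missing
# The mask the question actually describes.
at_most_one = (train['length_of_service'] <= 1) & missing

print(equals_minus_one.sum())  # 0 -> .loc would update no rows
print(at_most_one.sum())       # 3 -> the three NaN rows filtered above

# Assigning through an all-False mask is valid but modifies nothing.
train.loc[equals_minus_one, 'Salary_hike'] = 0
print(train['Salary_hike'].isnull().sum())  # still 3
```

Checking `mask.sum()` before an assignment is a cheap way to confirm the condition selects the rows you expect.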

Hi all,

Thanks for your valuable input.

It now works after using the following command:

train.loc[(train.length_of_service <=1) & (train['Salary_hike'].isnull()),['Salary_hike']]=0
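Because `fillna` only replaces missing entries, an equivalent formulation (a sketch, not taken from the thread) selects rows by the service condition alone and lets `fillna(0)` leave any existing Salary_hike values untouched:

```python
import numpy as np
import pandas as pd

train = pd.DataFrame({'length_of_service': [1, 2, 1, 4, 5, 1, 7, 1, 2, 1],
                      'Salary_hike': [np.nan, 5, np.nan, 6, 7, 1, 9, 1, 4, np.nan]})

short_service = train['length_of_service'] <= 1

# Among short-service rows, only NaNs become 0; existing values (e.g. row 5's 1.0) are kept.
# The right-hand side is aligned back onto the selected rows by index.
train.loc[short_service, 'Salary_hike'] = train.loc[short_service, 'Salary_hike'].fillna(0)

print(train['Salary_hike'].isnull().sum())  # 0
```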

Answer: I believe you need DataFrame.loc with a boolean mask:

import numpy as np
import pandas as pd

train = pd.DataFrame({'length_of_service': [-1, 5, 4, -8, 9, -3, 0],
                      'Salary_hike': [10, np.nan, 5, np.nan, np.nan, 8, np.nan]})
train.loc[(train.length_of_service <= 1) & (train['Salary_hike'].isnull()), 'Salary_hike'] = 0

print (train)
   length_of_service  Salary_hike
0                 -1         10.0
1                  5          NaN
2                  4          5.0
3                 -8          0.0
4                  9          NaN
5                 -3          8.0
6                  0          0.0
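If you prefer not to assign through `.loc`, `Series.mask` gives the same result on this data: it replaces values where the condition is True and keeps the rest. A minimal sketch under that assumption (the name `cond` is illustrative):

```python
import numpy as np
import pandas as pd

train = pd.DataFrame({'length_of_service': [-1, 5, 4, -8, 9, -3, 0],
                      'Salary_hike': [10, np.nan, 5, np.nan, np.nan, 8, np.nan]})

cond = (train['length_of_service'] <= 1) & train['Salary_hike'].isnull()

# Series.mask returns a new Series with 0 where cond is True, original values elsewhere.
train['Salary_hike'] = train['Salary_hike'].mask(cond, 0)

print(train)
```

Since `mask` returns a new Series instead of modifying rows in place, it fits naturally into method chains; `.loc` assignment is the more direct option when you simply want to update the DataFrame.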

If you need to set the value only where length_of_service is -1:

train = pd.DataFrame({'length_of_service':[-1,5,4,-1,9,-3,-1], 
                      'Salary_hike':[10,np.nan, 5, np.nan, np.nan, 8, np.nan]})
train.loc[(train.length_of_service==-1) & (train['Salary_hike'].isnull()),'Salary_hike'] = 0

print (train)
   length_of_service  Salary_hike
0                 -1         10.0
1                  5          NaN
2                  4          5.0
3                 -1          0.0
4                  9          NaN
5                 -3          8.0
6                 -1          0.0


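Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you republish this content, please credit this site's URL or the original post's address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM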