
How to use diff() function to identify salary changes in Pandas for HR analytics?

Given an HR employee dataset with Grades and Salaries, I wish to identify if there were changes to both Grade and Salary for each employee.

I was able to do it with the pandas .diff() function, but at the first row of the second employee it computes the difference against the previous employee's last row, which is not what I expect. I would like to apply .diff() (or some other approach) separately for each employee.

Here is the code used so far.

import pandas as pd

# This is my Dataset
hr = pd.DataFrame({'Employee': ['100201', '100201', '100201', '100201',
                                '100201', '100201', '100299', '100299'],
                   'Month/Year': ['01.2018', '02.2018', '03.2018', '04.2018',
                                  '05.2018', '06.2018', '01.2019', '02.2019'],
                   'Salary': [12175, 13000, 13000, 13125,
                              14000, 14000, 20000, 21000],
                   'Grade': [1, 1, 2, 2, 2, 1, 3, 4],
                   'Position': [1, 1, 2, 2, 2, 2, 3, 4]})

hr

# This is how I check the diff from each month:
hr.set_index('Employee')  # has no effect here: the result is not assigned back to hr
hr['Increase'] = hr['Salary'].diff(1)
hr['Grade Change'] = hr['Grade'].diff(1)
hr

# Finally just apply a lambda function
hr['Promotion'] = hr['Increase'].apply(lambda x: x > 0 )
hr['Grade Increase'] = hr['Grade Change'].apply(lambda x: x != 0 )
hr

In the result, all the Grade and Salary changes for employee 100201 come out correctly. However, for employee 100299, the code takes the salary of 14000 at index 5, which belongs to employee 100201, and therefore reports a salary change of 6000. In fact, employee 100299 only joined in 01.2019 and started with a salary of 20000. The salary change in 02.2019 is correct.

What I really want is for the calculation to restart whenever a new employee begins in the dataset.

I am new to Python and pandas so this will help a lot. Thanks in advance!

Use DataFrame.groupby on 'Employee' so that diff() is computed within each employee:

hr[['Salary_increase', 'Grade_change']] = hr.groupby('Employee')[['Salary', 'Grade']].diff()
hr[['Promotion', 'Grade_increase']] = hr.groupby('Employee')[['Salary', 'Grade']].diff().gt(0)

[out]

  Employee Month/Year  Salary  Grade  Position  Salary_increase  Grade_change  \
0   100201    01.2018   12175      1         1              NaN           NaN   
1   100201    02.2018   13000      1         1            825.0           0.0   
2   100201    03.2018   13000      2         2              0.0           1.0   
3   100201    04.2018   13125      2         2            125.0           0.0   
4   100201    05.2018   14000      2         2            875.0           0.0   
5   100201    06.2018   14000      1         2              0.0          -1.0   
6   100299    01.2019   20000      3         3              NaN           NaN   
7   100299    02.2019   21000      4         4           1000.0           1.0   

   Promotion  Grade_increase  
0      False           False  
1       True           False  
2      False            True  
3       True           False  
4       True           False  
5      False           False  
6      False           False  
7       True            True  
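
A per-employee diff only makes sense if each employee's rows are in chronological order. The data above already is, but as a minimal sketch for data that might not be, the grouped approach can be wrapped in a helper that first parses 'Month/Year' and sorts. The helper name add_change_columns and the extra 'Period' column are assumptions made for illustration and are not part of the original question or answer.

```python
import pandas as pd

hr = pd.DataFrame({
    'Employee': ['100201'] * 6 + ['100299'] * 2,
    'Month/Year': ['01.2018', '02.2018', '03.2018', '04.2018',
                   '05.2018', '06.2018', '01.2019', '02.2019'],
    'Salary': [12175, 13000, 13000, 13125, 14000, 14000, 20000, 21000],
    'Grade': [1, 1, 2, 2, 2, 1, 3, 4],
})


def add_change_columns(df):
    """Hypothetical helper: month-over-month changes within each employee."""
    out = df.copy()
    # Parse 'MM.YYYY' so rows sort chronologically rather than as strings.
    out['Period'] = pd.to_datetime(out['Month/Year'], format='%m.%Y')
    out = out.sort_values(['Employee', 'Period']).reset_index(drop=True)
    # diff() inside each group: the first row of every employee is NaN.
    changes = out.groupby('Employee')[['Salary', 'Grade']].diff()
    out['Salary_increase'] = changes['Salary']
    out['Grade_change'] = changes['Grade']
    # NaN > 0 evaluates to False, so a first record is never flagged.
    out['Promotion'] = changes['Salary'].gt(0)
    out['Grade_increase'] = changes['Grade'].gt(0)
    return out


result = add_change_columns(hr)
print(result[['Employee', 'Month/Year', 'Salary_increase', 'Promotion']])
```

Deriving the boolean flags from the grouped diff, rather than from a plain diff over the whole column, keeps the first row of each employee from being compared against the previous employee's last row.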

