I've got a dataframe describing events in a company and it looks like this:
employee_id  event          event_start_date  event_end_date  hire_date
1            "data change"  1.01.2018         1.01.2018       1.09.2005
2            "data change"  4.04.2018         4.04.2018       1.06.2007
2            "termination"  2.10.2020         NaT             1.06.2007
3            "hire"         23.05.2019        23.05.2019      23.05.2019
3            "leave"        23.07.2019        30.07.2019      23.05.2019
3            "termination"  3.11.2020         NaT             23.05.2019
The table is indexed by employee_id and event, and sorted by event_start_date.
Each employee has one or more events listed in the table. A "hire" event is not always present in the "event" column, so I assume that the hiring date is only reliably available in the "hire_date" column. I would like to count, for each year, the number of hiring events, the number of termination events, and the number of active employees.
Build the example df:
import pandas as pd
import datetime
import numpy as np
# example df
emp = [1, 2, 2, 3, 3, 3]
event = ["data change", "data change", "termination", "hire", "leave", "termination"]
s_date = [datetime.datetime(2018, 1, 1), datetime.datetime(2018, 4, 4), datetime.datetime(2020, 10, 2),
          datetime.datetime(2019, 5, 23), datetime.datetime(2019, 7, 23), datetime.datetime(2020, 11, 3)]
e_date = [datetime.datetime(2018, 1, 1), datetime.datetime(2018, 4, 4), np.datetime64('NaT'),
          datetime.datetime(2019, 5, 23), datetime.datetime(2019, 7, 30), np.datetime64('NaT')]
# hire_date is the same on every row of a given employee, as in the table above
h_date = [datetime.datetime(2005, 9, 1), datetime.datetime(2007, 6, 1), datetime.datetime(2007, 6, 1),
          datetime.datetime(2019, 5, 23), datetime.datetime(2019, 5, 23), datetime.datetime(2019, 5, 23)]
df = pd.DataFrame(emp, columns=['employee_id'])
df['event'] = event
df['event_start_date'] = s_date
df['event_end_date'] = e_date
df['hire_date'] = h_date
1st question
def calculate_hire_for_year():
    df['hire_year'] = pd.DatetimeIndex(df['hire_date']).year
    dict_years = {}
    ids = set(df['employee_id'])
    for emp_id in ids:
        # hire_date is repeated on every event row, so one row per employee is enough
        result = df[df['employee_id'] == emp_id]
        year = list(result['hire_year'])[0]
        # increment the count for this employee's hire year
        dict_years[year] = dict_years.get(year, 0) + 1
    return dict_years

print("Number of hiring events in each year:")
print(calculate_hire_for_year())
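The same count can probably be written without the explicit loop. The following is only a minimal sketch, assuming that hire_date is identical on every row of a given employee; the names hires_per_year and example are made up for illustration and are not part of the code above.

```python
import pandas as pd

def hires_per_year(frame):
    # Keep one row per employee, since hire_date is repeated on every event row.
    per_employee = frame.drop_duplicates(subset="employee_id")
    # Count employees by the year of their hire_date, ordered by year.
    return per_employee["hire_date"].dt.year.value_counts().sort_index()

# Hypothetical stand-in for the example df (only the columns needed here).
example = pd.DataFrame({
    "employee_id": [1, 2, 2, 3, 3, 3],
    "hire_date": pd.to_datetime([
        "2005-09-01", "2007-06-01", "2007-06-01",
        "2019-05-23", "2019-05-23", "2019-05-23",
    ]),
})

hires = hires_per_year(example)
print(hires)
```

The result is a Series indexed by year rather than a dict; calling .to_dict() on it should give the same shape as the loop version.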
2nd question
def calculate_termination_per_year():
    # a termination is dated by its event_start_date (event_end_date is NaT)
    df['year'] = pd.DatetimeIndex(df['event_start_date']).year
    result = df[df['event'] == "termination"]
    count_series = result.groupby(["event", "year"]).size()
    return count_series

print("Number of termination events in each year:")
print(calculate_termination_per_year())
3rd question
def calculate_employee_per_year():
    dict_years = {}
    df['year'] = pd.DatetimeIndex(df['event_start_date']).year
    years = set(df['year'])
    for year in years:
        # counts distinct employees that have at least one event starting in this year
        result = df[df['year'] == year]
        count_emp = len(set(result['employee_id']))
        dict_years[year] = count_emp
    return dict_years

print("Number of active employees in each year:")
print(calculate_employee_per_year())
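Note that calculate_employee_per_year only sees employees who have an event in a given year, so someone hired in 2005 with no events in 2019 is not counted for 2019. A different reading of "active" is sketched below. It assumes an employee is active in a year if hire_date falls in or before that year and any termination falls in or after it. That definition, the name active_per_year and the explicit list of years are assumptions for illustration, not something the data confirms.

```python
import pandas as pd

def active_per_year(frame, years):
    # Hire date per employee (repeated on every event row, so take the first).
    hired = frame.groupby("employee_id")["hire_date"].first()
    # Earliest termination per employee; employees without one get NaT.
    terminated = (
        frame.loc[frame["event"] == "termination"]
        .groupby("employee_id")["event_start_date"]
        .min()
        .reindex(hired.index)
    )
    counts = {}
    for year in years:
        started = hired.dt.year <= year
        # NaT means no termination recorded; otherwise active through the termination year.
        not_gone = terminated.isna() | (terminated.dt.year >= year)
        counts[year] = int((started & not_gone).sum())
    return counts

# Hypothetical stand-in for the example df (only the columns needed here).
example = pd.DataFrame({
    "employee_id": [1, 2, 2, 3, 3, 3],
    "event": ["data change", "data change", "termination",
              "hire", "leave", "termination"],
    "event_start_date": pd.to_datetime([
        "2018-01-01", "2018-04-04", "2020-10-02",
        "2019-05-23", "2019-07-23", "2020-11-03",
    ]),
    "hire_date": pd.to_datetime([
        "2005-09-01", "2007-06-01", "2007-06-01",
        "2019-05-23", "2019-05-23", "2019-05-23",
    ]),
})

active = active_per_year(example, range(2018, 2022))
print(active)
```

Under this definition an employee terminated during a year still counts as active for that year; changing >= to > in the termination check would exclude them instead.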