
Python Pandas counting the occurrences of an event in each year

I've got a dataframe describing events in a company and it looks like this:

employee_id    event            event_start_date    event_end_date    hire_date
1              "data change"    1.01.2018           1.01.2018         1.09.2005
2              "data change"    4.04.2018           4.04.2018         1.06.2007
2              "termination"    2.10.2020           NaT               1.06.2007
3              "hire"           23.05.2019          23.05.2019        23.05.2019
3              "leave"          23.07.2019          30.07.2019        23.05.2019
3              "termination"    3.11.2020           NaT               23.05.2019

Table is indexed by employee_id and event, and sorted by event_start_date.

Each employee has one or more events listed in the table. A "hire" event is not always present in the "event" column, so I assume the hiring date is reliably available only in the "hire_date" column. I would like to:

  1. count the number of hiring events in each year
  2. count the number of termination events in each year
  3. count the number of active employees in each year

Build the example df:

import pandas as pd
import datetime
import numpy as np

# example df

emp = [1, 2, 2, 3, 3, 3]
event = ["data change", "data change", "termination", "hire", "leave", "termination"]
s_date = [datetime.datetime(2018, 1, 1), datetime.datetime(2018, 4, 4), datetime.datetime(2020, 10, 2),
          datetime.datetime(2019, 5, 23), datetime.datetime(2019, 7, 23), datetime.datetime(2020, 11, 3)]
e_date = [datetime.datetime(2018, 1, 1), datetime.datetime(2018, 4, 4), np.datetime64('NaT'),
          datetime.datetime(2019, 5, 23), datetime.datetime(2019, 7, 30), np.datetime64('NaT')]
h_date = [datetime.datetime(2005, 9, 1), datetime.datetime(2007, 6, 1), datetime.datetime(2007, 6, 1),
          datetime.datetime(2019, 5, 23), datetime.datetime(2019, 5, 23), datetime.datetime(2019, 5, 23)]

df = pd.DataFrame(emp, columns=['employee_id'])
df['event'] = event
df['event_start_date'] = s_date
df['event_end_date'] = e_date
df['hire_date'] = h_date

1st question

def calculate_hire_for_year():
    # Derive the hire year from hire_date
    df['hire_year'] = pd.DatetimeIndex(df['hire_date']).year
    dict_years = {}
    ids = set(df['employee_id'])
    for emp_id in ids:
        result = df[df['employee_id'] == emp_id]
        # Take the hire year from the employee's first row
        year = list(result['hire_year'])[0]
        # Increment the counter stored under this year
        dict_years[year] = dict_years.get(year, 0) + 1
    return dict_years



print("Number of hiring events in each year:")
print(calculate_hire_for_year())
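
The same count can probably be obtained without an explicit loop. Below is a minimal sketch, assuming hire_date is identical on every row of a given employee (as in the table above); the names `events` and `hires_per_year` are made up for illustration and the frame is a reduced copy of the example data.

```python
import pandas as pd

# Reduced copy of the example data: one row per event,
# hire_date repeated on every row of the same employee (assumption).
events = pd.DataFrame({
    "employee_id": [1, 2, 2, 3, 3, 3],
    "hire_date": pd.to_datetime([
        "2005-09-01", "2007-06-01", "2007-06-01",
        "2019-05-23", "2019-05-23", "2019-05-23",
    ]),
})

# Keep a single row per employee so each hire is counted once.
hires = events.drop_duplicates(subset="employee_id")

# Count employees per hire year, ordered by year.
hires_per_year = hires["hire_date"].dt.year.value_counts().sort_index()
print(hires_per_year.to_dict())
```

For the example data this yields one hire each in 2005, 2007 and 2019.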

2nd question

def calculate_termination_per_year():
    # Year in which each event started
    df['year'] = pd.DatetimeIndex(df['event_start_date']).year
    # Keep only termination events and count them per year
    result = df[df['event'] == "termination"]
    count_series = result.groupby(["event", "year"]).size()
    return count_series



print("Number of termination events in each year:")
print(calculate_termination_per_year())

3rd question

def calculate_employee_per_year():
    # Note: this counts employees with at least one event starting in a year,
    # so an employee with no event in a given year is not counted for it.
    dict_years = {}
    df['year'] = pd.DatetimeIndex(df['event_start_date']).year
    years = set(df['year'])
    for year in years:
        result = df[df['year'] == year]
        count_emp = len(set(result['employee_id']))
        dict_years[year] = count_emp
    return dict_years


print("Number of active employees in each year:")
print(calculate_employee_per_year())
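
One possible alternative for the third question is sketched below. It rests on a definition the post does not state, so treat it as an assumption: an employee is active in year Y if they were hired in or before Y and have no termination event starting before Y (the termination year itself still counts as active). The names `events`, `hire_year`, `term_year` and `active_per_year` are made up for illustration.

```python
import pandas as pd

# Copy of the example data (hire dates as in the table above).
events = pd.DataFrame({
    "employee_id": [1, 2, 2, 3, 3, 3],
    "event": ["data change", "data change", "termination",
              "hire", "leave", "termination"],
    "event_start_date": pd.to_datetime([
        "2018-01-01", "2018-04-04", "2020-10-02",
        "2019-05-23", "2019-07-23", "2020-11-03",
    ]),
    "hire_date": pd.to_datetime([
        "2005-09-01", "2007-06-01", "2007-06-01",
        "2019-05-23", "2019-05-23", "2019-05-23",
    ]),
})

# Hire year per employee (first row of each employee).
hire_year = events.groupby("employee_id")["hire_date"].first().dt.year

# Termination year per employee; NaN for employees never terminated.
term_year = (
    events.loc[events["event"] == "termination"]
    .groupby("employee_id")["event_start_date"].first().dt.year
    .reindex(hire_year.index)
)

# Assumed reporting window: first to last year with any event.
event_years = events["event_start_date"].dt.year
years = range(int(event_years.min()), int(event_years.max()) + 1)

# Active in a year: hired by then and not terminated in an earlier year.
active_per_year = {
    year: int(((hire_year <= year)
               & (term_year.isna() | (term_year >= year))).sum())
    for year in years
}
print(active_per_year)
```

Unlike counting employees that have an event in a given year, this also counts employees who were employed that year but had no events recorded in it.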
