Python Pandas counting the occurrences of an event in each year

I've got a dataframe describing events in a company. It looks like this:

employee_id    event            event_start_date    event_end_date    hire_date
1              "data change"    1.01.2018           1.01.2018         1.09.2005
2              "data change"    4.04.2018           4.04.2018         1.06.2007
2              "termination"    2.10.2020           NaT               1.06.2007
3              "hire"           23.05.2019          23.05.2019        23.05.2019
3              "leave"          23.07.2019          30.07.2019        23.05.2019
3              "termination"    3.11.2020           NaT               23.05.2019

The table is indexed by employee_id and event, and sorted by event_start_date.

Each employee has one or more events listed in the table. A "hire" event is not always present in the "event" column, so I assume that the hiring date is only reliably available in the "hire_date" column. I would like to:

  1. count the number of hiring events in each year
  2. count the number of termination events in each year
  3. count the number of active employees in each year

Build the example df:

Build the example df:构建示例 df:

import pandas as pd
import datetime
import numpy as np

# example df

emp = [1, 2, 2, 3, 3, 3]
event = ["data change", "data change", "termination", "hire", "leave", "termination"]
s_date = [datetime.datetime(2018, 1, 1), datetime.datetime(2018, 4, 4), datetime.datetime(2020, 10, 2),
          datetime.datetime(2019, 5, 23), datetime.datetime(2019, 7, 23), datetime.datetime(2020, 11, 3)]
e_date = [datetime.datetime(2018, 1, 1), datetime.datetime(2018, 4, 4), pd.NaT,
          datetime.datetime(2019, 5, 23), datetime.datetime(2019, 7, 30), pd.NaT]
h_date = [datetime.datetime(2005, 9, 1), datetime.datetime(2007, 6, 1), datetime.datetime(2007, 6, 1),
          datetime.datetime(2019, 5, 23), datetime.datetime(2019, 5, 23), datetime.datetime(2019, 5, 23)]

df = pd.DataFrame(emp, columns=['employee_id'])
df['event'] = event
df['event_start_date'] = s_date
df['event_end_date'] = e_date
df['hire_date'] = h_date

1st question

def calculate_hire_for_year():
    # Extract the hire year for every row.
    df['hire_year'] = pd.DatetimeIndex(df['hire_date']).year
    dict_years = {}
    # Count each employee once, using the hire year from their first row.
    for emp_id in set(df['employee_id']):
        result = df[df['employee_id'] == emp_id]
        year = list(result['hire_year'])[0]
        dict_years[year] = dict_years.get(year, 0) + 1
    return dict_years



print("Number of hiring events in each year:")
print(calculate_hire_for_year())
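
The same count can probably be obtained without an explicit loop. The following is a minimal sketch, assuming hire_date is identical on every row of a given employee (as in the table in the question); the names hires and hires_per_year are only illustrative.

```python
import pandas as pd

# Data taken from the table in the question, reduced to the needed columns.
df = pd.DataFrame({
    "employee_id": [1, 2, 2, 3, 3, 3],
    "hire_date": pd.to_datetime(
        ["2005-09-01", "2007-06-01", "2007-06-01",
         "2019-05-23", "2019-05-23", "2019-05-23"]
    ),
})

# Keep one row per employee so that each hire is counted only once.
hires = df.drop_duplicates("employee_id")

# Count employees per hire year, ordered by year.
hires_per_year = hires["hire_date"].dt.year.value_counts().sort_index()
print(hires_per_year)
```

If hire_date can differ between rows of the same employee, drop_duplicates keeps the first row, so the data would need cleaning first.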

2nd question

def calculate_termination_per_year():
    df['year'] = pd.DatetimeIndex(df['event_start_date']).year
    result = df[df['event'] == "termination"]
    count_series = result.groupby(["event", "year"]).size()
    return count_series



print("Number of termination events in each year:")
print(calculate_termination_per_year())
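
A slightly shorter variant is sketched below. It assumes that a termination is always recorded as a row whose event is exactly "termination" and that event_start_date is the termination date; the variable names are illustrative.

```python
import pandas as pd

# Data taken from the table in the question, reduced to the needed columns.
df = pd.DataFrame({
    "employee_id": [1, 2, 2, 3, 3, 3],
    "event": ["data change", "data change", "termination",
              "hire", "leave", "termination"],
    "event_start_date": pd.to_datetime(
        ["2018-01-01", "2018-04-04", "2020-10-02",
         "2019-05-23", "2019-07-23", "2020-11-03"]
    ),
})

# Select the start dates of termination rows only.
terminations = df.loc[df["event"] == "termination", "event_start_date"]

# Count terminations per calendar year, ordered by year.
terminations_per_year = terminations.dt.year.value_counts().sort_index()
print(terminations_per_year)
```

Unlike the function above, this does not add a helper column to df.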

3rd question

def calculate_employee_per_year():
    # Note: this counts employees that have at least one event starting in a
    # given year; employees with no event in that year are not counted.
    dict_years = {}
    df['year'] = pd.DatetimeIndex(df['event_start_date']).year
    for year in set(df['year']):
        result = df[df['year'] == year]
        # Number of distinct employees with an event in this year.
        dict_years[year] = result['employee_id'].nunique()
    return dict_years


print("Number of active employees in each year:")
print(calculate_employee_per_year())
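
If "active" is meant as "employed at some point during the year", one possible approach is to derive a hire-to-termination span per employee and test each year against it. This is only a sketch under stated assumptions: an employee counts as active from the hire year up to and including the termination year, employees without a termination row are still employed, and the reporting range ends at the last event year in the data. The names spans and active_per_year are illustrative.

```python
import pandas as pd

# Data taken from the table in the question, reduced to the needed columns.
df = pd.DataFrame({
    "employee_id": [1, 2, 2, 3, 3, 3],
    "event": ["data change", "data change", "termination",
              "hire", "leave", "termination"],
    "event_start_date": pd.to_datetime(
        ["2018-01-01", "2018-04-04", "2020-10-02",
         "2019-05-23", "2019-07-23", "2020-11-03"]
    ),
    "hire_date": pd.to_datetime(
        ["2005-09-01", "2007-06-01", "2007-06-01",
         "2019-05-23", "2019-05-23", "2019-05-23"]
    ),
})

# One row per employee: hire date and termination date (NaT if none).
hire = df.groupby("employee_id")["hire_date"].first()
term = (df[df["event"] == "termination"]
        .groupby("employee_id")["event_start_date"].first())
spans = pd.DataFrame({"hire": hire, "term": term})

# Assumption: report from the earliest hire year to the last event year.
first_year = int(spans["hire"].dt.year.min())
last_year = int(df["event_start_date"].dt.year.max())

active_per_year = {}
for year in range(first_year, last_year + 1):
    hired = spans["hire"].dt.year <= year
    # Still employed: no termination, or terminated in this year or later.
    not_gone = spans["term"].isna() | (spans["term"].dt.year >= year)
    active_per_year[year] = int((hired & not_gone).sum())

print(active_per_year)
```

Whether the termination year itself should count as active is a business decision; changing >= to > in not_gone would exclude it.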
