
Python Pandas counting the occurrences of an event in each year

I have a dataframe describing company events. It looks like this:

employee_id    event            event_start_date    event_end_date    hire_date
1              "data change"    1.01.2018           1.01.2018         1.09.2005
2              "data change"    4.04.2018           4.04.2018         1.06.2007
2              "termination"    2.10.2020           NaT               1.06.2007
3              "hire"           23.05.2019          23.05.2019        23.05.2019
3              "leave"          23.07.2019          30.07.2019        23.05.2019
3              "termination"    3.11.2020           NaT               23.05.2019

The table is indexed by employee_id and event, and sorted by event_start_date.

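So each employee has one or more events listed in the table. A "hire" event is not always present in the event column, so I assume the hire date is reliably available only in the hire_date column. I want to:

  1. count the number of hire events in each year
  2. count the number of termination events in each year
  3. count the number of active employees in each year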

Building the example df:

import pandas as pd
import datetime
import numpy as np

# example df

emp = [1, 2, 2, 3, 3, 3]
event = ["data change", "data change", "termination", "hire", "leave", "termination"]
s_date = [datetime.datetime(2018, 1, 1), datetime.datetime(2018, 4, 4), datetime.datetime(2020, 10, 2),
          datetime.datetime(2019, 5, 23), datetime.datetime(2019, 7, 23), datetime.datetime(2020, 11, 3)]
e_date = [datetime.datetime(2018, 1, 1), datetime.datetime(2018, 4, 4), np.datetime64('NaT'),
          datetime.datetime(2019, 5, 23), datetime.datetime(2019, 7, 30), np.datetime64('NaT')]
h_date = [datetime.datetime(2005, 9, 1), datetime.datetime(2007, 6, 1), datetime.datetime(2007, 6, 1),
          datetime.datetime(2019, 5, 23), datetime.datetime(2019, 5, 23), datetime.datetime(2019, 5, 23)]

df = pd.DataFrame(emp, columns=['employee_id'])
df['event'] = event
df['event_start_date'] = s_date
df['event_end_date'] = e_date
df['hire_date'] = h_date

First question

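def calculate_hire_for_year():
    # Derive the hire year from hire_date, since a "hire" event row may be missing.
    df['hire_year'] = pd.DatetimeIndex(df['hire_date']).year
    dict_years = {}
    ids = set(df['employee_id'])
    for emp_id in ids:
        result = df[df['employee_id'] == emp_id]
        # Take the hire year from the employee's first row.
        year = int(result['hire_year'].iloc[0])
        # Increment the counter stored under this year.
        dict_years[year] = dict_years.get(year, 0) + 1
    return dict_years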



print("Number of hiring events in each year:")
print(calculate_hire_for_year())
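
For comparison, the same per-year hire count can probably be written without an explicit loop. The following is only a sketch: it rebuilds a compact copy of the example data (hire dates taken from the table above) and assumes that keeping one row per employee is enough to count each hire once. The name hires_per_year is introduced here for illustration.

```python
import pandas as pd

# Compact copy of the example data: one row per event.
df = pd.DataFrame({
    "employee_id": [1, 2, 2, 3, 3, 3],
    "hire_date": pd.to_datetime(
        ["2005-09-01", "2007-06-01", "2007-06-01",
         "2019-05-23", "2019-05-23", "2019-05-23"]),
})

# Keep one row per employee so each hire is counted once,
# then count how many employees share each hire year.
hires_per_year = (
    df.drop_duplicates("employee_id")["hire_date"]
      .dt.year
      .value_counts()
      .sort_index()
)

print("Number of hiring events in each year:")
print(hires_per_year.to_dict())
```

Because the count is based on hire_date rather than on "hire" rows in the event column, employees whose hire event is missing from the table are still included.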

Second question

def calculate_termination_per_year():
    df['year'] = pd.DatetimeIndex(df['event_start_date']).year
    result = df[df['event'] == "termination"]
    count_series = result.groupby(["event", "year"]).size()
    return count_series



print("Number of termination events in each year:")
print(calculate_termination_per_year())
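
A slightly shorter variant of the termination count, shown as a sketch on a compact copy of the example data. It filters the termination rows first and then counts the years of event_start_date. The name terminations_per_year is introduced here for illustration.

```python
import pandas as pd

# Compact copy of the example data: one row per event.
df = pd.DataFrame({
    "employee_id": [1, 2, 2, 3, 3, 3],
    "event": ["data change", "data change", "termination",
              "hire", "leave", "termination"],
    "event_start_date": pd.to_datetime(
        ["2018-01-01", "2018-04-04", "2020-10-02",
         "2019-05-23", "2019-07-23", "2020-11-03"]),
})

# Select only the start dates of termination events.
terminations = df.loc[df["event"] == "termination", "event_start_date"]

# Count terminations by the calendar year in which they start.
terminations_per_year = terminations.dt.year.value_counts().sort_index()

print("Number of termination events in each year:")
print(terminations_per_year.to_dict())
```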

Third question

def calculate_employee_per_year():
    dict_years = {}
    df['year'] = pd.DatetimeIndex(df['event_start_date']).year
    years = set(list(df['year']))
    for year in years:
        result = df[df['year'] == year]
        count_emp = len(set(list(result['employee_id'])))
        dict_years[year] = count_emp
    return dict_years


print("Number of active employees in each year:")
print(calculate_employee_per_year())
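
Note that the function above counts employees who have at least one event starting in a given year, which is not necessarily the same as being employed in that year. One possible alternative is sketched below. It rests on an assumption that is not stated in the question: an employee counts as active in a year if hired in or before that year and not terminated before it, so someone terminated during a year still counts for that year. The reporting window 2018 to 2020 and the names hire_year, term_year and active_per_year are chosen only for illustration.

```python
import pandas as pd

# Compact copy of the example data: one row per event.
df = pd.DataFrame({
    "employee_id": [1, 2, 2, 3, 3, 3],
    "event": ["data change", "data change", "termination",
              "hire", "leave", "termination"],
    "event_start_date": pd.to_datetime(
        ["2018-01-01", "2018-04-04", "2020-10-02",
         "2019-05-23", "2019-07-23", "2020-11-03"]),
    "hire_date": pd.to_datetime(
        ["2005-09-01", "2007-06-01", "2007-06-01",
         "2019-05-23", "2019-05-23", "2019-05-23"]),
})

# Hire year per employee, taken from hire_date.
hire_year = df.groupby("employee_id")["hire_date"].min().dt.year

# Termination year per employee; NaN for employees never terminated.
term_year = (
    df.loc[df["event"] == "termination"]
      .groupby("employee_id")["event_start_date"].min().dt.year
      .reindex(hire_year.index)
)

# Assumed reporting window, for illustration only.
years = range(2018, 2021)

# Active in a year: hired by that year and not terminated before it.
active_per_year = {
    year: int(((hire_year <= year)
               & (term_year.isna() | (term_year >= year))).sum())
    for year in years
}

print("Number of active employees in each year:")
print(active_per_year)
```

If "active" should instead mean employed on a specific day, such as December 31, the comparison would need to use full dates rather than years.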
