Python Pandas counting the occurrences of an event in each year

I have a dataframe describing company events. It looks like this:

employee_id    event            event_start_date    event_end_date    hire_date
1              "data change"    1.01.2018           1.01.2018         1.09.2005
2              "data change"    4.04.2018           4.04.2018         1.06.2007
2              "termination"    2.10.2020           NaT               1.06.2007
3              "hire"           23.05.2019          23.05.2019        23.05.2019
3              "leave"          23.07.2019          30.07.2019        23.05.2019
3              "termination"    3.11.2020           NaT               23.05.2019

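The table is indexed by employee_id and event, and sorted by event_start_date.

So each employee has one or more events listed in the table. The "hire" event is not always present in the event column, so I assume the hire date is reliably available only in the hire_date column. I want to:

  1. Count the number of hiring events in each year
  2. Count the number of termination events in each year
  3. Count the number of active employees in each year

Building the example df: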

構建示例 df:

import pandas as pd
import datetime
import numpy as np

# example df

emp = [1, 2, 2, 3, 3, 3]
event = ["data change", "data change", "termination", "hire", "leave", "termination"]
s_date = [datetime.datetime(2018, 1, 1), datetime.datetime(2018, 4, 4), datetime.datetime(2020, 10, 2),
          datetime.datetime(2019, 5, 23), datetime.datetime(2019, 7, 23), datetime.datetime(2020, 11, 3)]
e_date = [datetime.datetime(2018, 1, 1), datetime.datetime(2018, 4, 4), np.datetime64('NaT'),
          datetime.datetime(2019, 5, 23), datetime.datetime(2019, 7, 30), np.datetime64('NaT')]
h_date = [datetime.datetime(2005, 9, 1), datetime.datetime(2007, 6, 1), datetime.datetime(2007, 6, 1),
          datetime.datetime(2019, 5, 23), datetime.datetime(2019, 5, 23), datetime.datetime(2019, 5, 23)]

df = pd.DataFrame(emp, columns=['employee_id'])
df['event'] = event
df['event_start_date'] = s_date
df['event_end_date'] = e_date
df['hire_date'] = h_date

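First question

def calculate_hire_for_year():
    # Derive the hire year from the hire_date column
    df['hire_year'] = pd.DatetimeIndex(df['hire_date']).year
    dict_years = {}
    ids = set(df['employee_id'])
    for emp_id in ids:
        result = df[df['employee_id'] == emp_id]
        # hire_date is repeated on every row of an employee, so take the first one
        year = list(result['hire_year'])[0]
        # Increment the counter for this hire year
        dict_years[year] = dict_years.get(year, 0) + 1
    return dict_years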



print("Number of hiring events in each year:")
print(calculate_hire_for_year())
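
The per-employee loop above can probably be replaced by a few vectorised calls. This is a minimal sketch, not a definitive solution. It assumes hire_date is identical on every row of the same employee. The names `events` and `hires_per_year` are made up for illustration, and `events` is a trimmed stand-in for the example df.

```python
import pandas as pd

# Stand-in for the example frame: one row per event, hire_date repeated per employee.
events = pd.DataFrame({
    "employee_id": [1, 2, 2, 3, 3, 3],
    "hire_date": pd.to_datetime(
        ["2005-09-01", "2007-06-01", "2007-06-01",
         "2019-05-23", "2019-05-23", "2019-05-23"]
    ),
})

# Keep one row per employee so each hire is counted once,
# then count how many employees share each hire year.
hires_per_year = (
    events.drop_duplicates("employee_id")["hire_date"]
    .dt.year
    .value_counts()
    .sort_index()
)
print(hires_per_year)
```

Unlike the function above, this does not add a helper column to the original frame.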

Second question

def calculate_termination_per_year():
    df['year'] = pd.DatetimeIndex(df['event_start_date']).year
    result = df[df['event'] == "termination"]
    count_series = result.groupby(["event", "year"]).size()
    return count_series



print("Number of termination events in each year:")
print(calculate_termination_per_year())
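
If a Series indexed by year alone is enough, the same count can be sketched without grouping on the event column. The names `events`, `is_termination` and `terminations_per_year` are hypothetical, and the sketch assumes that a termination is dated by its event_start_date, as in the function above.

```python
import pandas as pd

# Stand-in for the example frame with only the columns this count needs.
events = pd.DataFrame({
    "employee_id": [1, 2, 2, 3, 3, 3],
    "event": ["data change", "data change", "termination",
              "hire", "leave", "termination"],
    "event_start_date": pd.to_datetime(
        ["2018-01-01", "2018-04-04", "2020-10-02",
         "2019-05-23", "2019-07-23", "2020-11-03"]
    ),
})

# Select the termination rows, then count them by the year they started in.
is_termination = events["event"] == "termination"
terminations_per_year = (
    events.loc[is_termination, "event_start_date"]
    .dt.year
    .value_counts()
    .sort_index()
)
print(terminations_per_year)
```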

Third question

def calculate_employee_per_year():
    dict_years = {}
    df['year'] = pd.DatetimeIndex(df['event_start_date']).year
    years = set(df['year'])
    for year in years:
        # Rows whose event started in this year
        result = df[df['year'] == year]
        # Number of distinct employees with at least one event starting in this year
        count_emp = len(set(result['employee_id']))
        dict_years[year] = count_emp
    return dict_years


print("Number of active employees in each year:")
print(calculate_employee_per_year())
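
Counting by event year only sees employees who happen to have an event starting in that year. The sketch below instead derives activity from hire_date and the termination event. It rests on assumptions of mine, not on anything stated in the question: an employee counts as active in a year if they were hired in or before that year and have no termination dated before it, and the years reported run from the earliest to the latest event_start_date. All names are hypothetical.

```python
import pandas as pd

# Stand-in for the example frame.
events = pd.DataFrame({
    "employee_id": [1, 2, 2, 3, 3, 3],
    "event": ["data change", "data change", "termination",
              "hire", "leave", "termination"],
    "event_start_date": pd.to_datetime(
        ["2018-01-01", "2018-04-04", "2020-10-02",
         "2019-05-23", "2019-07-23", "2020-11-03"]
    ),
    "hire_date": pd.to_datetime(
        ["2005-09-01", "2007-06-01", "2007-06-01",
         "2019-05-23", "2019-05-23", "2019-05-23"]
    ),
})

# One value per employee: the year they were hired.
hire_year = events.groupby("employee_id")["hire_date"].first().dt.year

# One value per employee: the termination year, NaN if never terminated.
term_year = (
    events.loc[events["event"] == "termination"]
    .groupby("employee_id")["event_start_date"]
    .first()
    .dt.year
    .reindex(hire_year.index)
)

# Assumed reporting window: every year between the first and last event.
event_years = events["event_start_date"].dt.year
years = range(int(event_years.min()), int(event_years.max()) + 1)

# Active in a year: hired by then, and not terminated before that year.
active_per_year = {
    year: int(((hire_year <= year) & (term_year.isna() | (term_year >= year))).sum())
    for year in years
}
print(active_per_year)
```

Whether the termination year itself should still count as active is a judgement call. Changing `>=` to `>` excludes it.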
