Compute daily frequency on a time series

Task:

Calculate the frequency of each ID for each month of 2021.

Frequency formula: activity period (length of time between the first and the last activity) / (number of activity days - 1)

E.g. ID 1, month 2: activity period (2021-02-23 - 2021-02-18 = 5 days) / (3 active days - 1) = frequency of 2.5
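The formula can be checked by hand on the ID 1 / month 2 example. The following is only a minimal sketch in plain Python: the helper name `activity_frequency` is made up for illustration, and returning 0 for fewer than two distinct active days is an assumption based on the desired output, not something the formula itself defines.

```python
from datetime import date

def activity_frequency(days):
    """Activity period in days divided by (number of distinct active days - 1)."""
    unique_days = sorted(set(days))
    if len(unique_days) < 2:
        # Assumption: with a single active day there is no period, so report 0
        return 0.0
    period = (unique_days[-1] - unique_days[0]).days
    return period / (len(unique_days) - 1)

# ID 1, month 2 from the sample data
feb_id_1 = [date(2021, 2, 18), date(2021, 2, 22), date(2021, 2, 23)]
print(activity_frequency(feb_id_1))  # 2.5
```

Repeated activity on the same day is counted once here, which is one reading of "number of activity days".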

Sample:

import pandas as pd

times = [
    '2021-02-18',
    '2021-02-22',
    '2021-02-23',
    '2021-04-23',
    '2021-01-18',
    '2021-01-19',
    '2021-01-20',
    '2021-01-03',
    '2021-02-04',
    '2021-02-04'
]

# Named ids rather than id to avoid shadowing the built-in id()
ids = [1, 1, 1, 1, 44, 44, 44, 46, 46, 46]

df = pd.DataFrame({'ID': ids, 'Date': pd.to_datetime(times)})

df = df.reset_index(drop=True)

print(df)

   ID       Date
0   1 2021-02-18
1   1 2021-02-22
2   1 2021-02-23
3   1 2021-04-23
4  44 2021-01-18
5  44 2021-01-19
6  44 2021-01-20
7  46 2021-01-03
8  46 2021-02-04
9  46 2021-02-04

Desired output:

If the frequency is negative, it should be 0.

  id  01_2021  02_2021  03_2021  04_2021
0   1        0        2        0        0
1  44        1        0        0        0
2  46        0        0        0        0 

Try a pivot_table with a custom aggfunc:

# Column labels (MM_YYYY) for every month from the first to the last date
dr = pd.period_range(start=df['Date'].min(),
                     end=df['Date'].max(), freq='M').strftime('%m_%Y')

new_df = (
    df.pivot_table(
        index='ID',
        # Columns are dates in MM_YYYY format
        columns=df['Date'].dt.strftime('%m_%Y'),
        # Custom agg function: activity period in days / (distinct active days - 1);
        # max(1, ...) prevents division by zero when there is a single active day
        aggfunc=lambda x: (x.max() - x.min()).days /
                          max(1, x.nunique() - 1)
    )
    # Fix axis names and column levels
    .droplevel(0, axis=1)
    .rename_axis(None, axis=1)
    # Reindex to include every month from min to max date
    .reindex(dr, axis=1)
    # Clip to exclude negatives
    .clip(lower=0)
    # Fill missing months with 0
    .fillna(0)
    # Turn ID back into a regular column
    .reset_index()
)

print(new_df)

new_df:

   ID  01_2021  02_2021  03_2021  04_2021
0   1      0.0      2.5      0.0      0.0
1  44      1.0      0.0      0.0      0.0
2  46      0.0      0.0      0.0      0.0
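The same table can also be built without a custom aggfunc, which may be easier to debug step by step. This is only a sketch under a few assumptions: the names `month`, `stats`, `intervals` and `freq_table` are illustrative, repeated rows on one day count as a single active day, and months with fewer than two distinct active days are reported as 0.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 1, 1, 1, 44, 44, 44, 46, 46, 46],
    'Date': pd.to_datetime([
        '2021-02-18', '2021-02-22', '2021-02-23', '2021-04-23',
        '2021-01-18', '2021-01-19', '2021-01-20',
        '2021-01-03', '2021-02-04', '2021-02-04',
    ]),
})

# Per ID and calendar month: first day, last day, number of distinct active days
month = df['Date'].dt.to_period('M').rename('Month')
stats = df.groupby(['ID', month])['Date'].agg(['min', 'max', 'nunique'])

# Activity period in days over (active days - 1); single-day months become NaN
span_days = (stats['max'] - stats['min']).dt.days
intervals = (stats['nunique'] - 1).where(stats['nunique'] > 1)
freq = span_days / intervals

# One column per month of the covered range, with missing values set to 0
all_months = pd.period_range(df['Date'].min(), df['Date'].max(), freq='M')
freq_table = freq.unstack('Month').reindex(columns=all_months).fillna(0)
freq_table.columns = freq_table.columns.strftime('%m_%Y')
freq_table = freq_table.rename_axis(None, axis=1).reset_index()
print(freq_table)
```

For the sample data this should give 2.5 for ID 1 in 02_2021 and 1.0 for ID 44 in 01_2021, with 0 everywhere else.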

You will need to pivot the table, but first, if you want only the month and year of the date, you need to transform it.

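# Label each row with its month in MM_YYYY format, matching the desired columns
df['Date'] = df['Date'].dt.strftime('%m_%Y')
df['counts'] = 1
# Count the rows per ID and month (activity counts, not yet the frequency)
df_new = pd.pivot_table(df, index='ID', columns='Date',
                        values='counts', aggfunc='sum', fill_value=0)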
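The counting idea above only supplies the denominator of the frequency formula, the number of active days per ID and month. As a hedged sketch, `pd.crosstab` can produce that table directly. Dropping duplicate (ID, Date) rows first is an assumption that several activities on one day count as one active day, and `unique_days` and `active_days` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 1, 1, 1, 44, 44, 44, 46, 46, 46],
    'Date': pd.to_datetime([
        '2021-02-18', '2021-02-22', '2021-02-23', '2021-04-23',
        '2021-01-18', '2021-01-19', '2021-01-20',
        '2021-01-03', '2021-02-04', '2021-02-04',
    ]),
})

# Keep one row per ID and calendar day so repeated activity counts once
unique_days = df.drop_duplicates(['ID', 'Date'])

# Rows: ID, columns: month label (MM_YYYY), cells: number of distinct active days
active_days = pd.crosstab(unique_days['ID'],
                          unique_days['Date'].dt.strftime('%m_%Y'))
print(active_days)
```

The activity period (last day minus first day) still has to be computed separately and divided by these counts minus one.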
