Compute daily frequency on a time series
Task:
Calculate the frequency of each ID for each month of 2021.
Frequency formula: activity period (length of time between the first and the last activity) / (number of active days - 1)
E.g. ID 1, month 2: activity period (2021-02-23 - 2021-02-18 = 5 days) / (3 active days - 1) = frequency of 2.5
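The formula and the ID 1 example can be checked with plain Python. This is only a minimal sketch and not part of the original post: the helper name `frequency` is hypothetical, it counts distinct days, and returning 0 for a single active day is an assumption made to avoid dividing by zero.

```python
from datetime import date


def frequency(active_dates):
    """Activity period in days divided by (number of distinct active days - 1)."""
    days = sorted(set(active_dates))
    if len(days) < 2:
        # Assumption: with fewer than two active days the frequency is 0
        return 0.0
    return (days[-1] - days[0]).days / (len(days) - 1)


# ID 1, February 2021: three active days spanning five days
id1_february = [date(2021, 2, 18), date(2021, 2, 22), date(2021, 2, 23)]
print(frequency(id1_february))
```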
Sample:
import pandas as pd

times = [
    '2021-02-18',
    '2021-02-22',
    '2021-02-23',
    '2021-04-23',
    '2021-01-18',
    '2021-01-19',
    '2021-01-20',
    '2021-01-03',
    '2021-02-04',
    '2021-02-04'
]
# Named ids rather than id to avoid shadowing the built-in id()
ids = [1, 1, 1, 1, 44, 44, 44, 46, 46, 46]
df = pd.DataFrame({'ID': ids, 'Date': pd.to_datetime(times)})
df = df.reset_index(drop=True)
print(df)
   ID       Date
0   1 2021-02-18
1   1 2021-02-22
2   1 2021-02-23
3   1 2021-04-23
4  44 2021-01-18
5  44 2021-01-19
6  44 2021-01-20
7  46 2021-01-03
8  46 2021-02-04
9  46 2021-02-04
Desired Output:
If the frequency is negative, set it to 0.
   id  01_2021  02_2021  03_2021  04_2021
0   1        0        2        0        0
1  44        1        0        0        0
2  46        0        0        0        0
Try a pivot_table with a custom aggfunc:
# Create Columns For Later
dr = pd.date_range(start=df['Date'].min(),
                   end=df['Date'].max() + pd.offsets.MonthBegin(1), freq='M') \
    .map(lambda dt: dt.strftime('%m_%Y'))
new_df = (
    df.pivot_table(
        index='ID',
        # Columns are dates in MM_YYYY format
        columns=df['Date'].dt.strftime('%m_%Y'),
        # Custom Agg Function
        aggfunc=lambda x: (x.max() - x.min()) /
                          pd.offsets.Day(max(1, len(x) - 1))
        # max(1, len(x) -1) to prevent divide by 0
    )
    # Fix Axis Names and Column Levels
    .droplevel(0, axis=1)
    .rename_axis(None, axis=1)
    # Reindex to include every month from min to max date
    .reindex(dr, axis=1)
    # Clip to exclude negatives
    .clip(lower=0)
    # Fillna with 0
    .fillna(0)
    # Reset index
    .reset_index()
)
print(new_df)
new_df:
   ID  01_2021  02_2021  03_2021  04_2021
0   1      0.0      2.5      0.0      0.0
1  44      1.0      0.0      0.0      0.0
2  46      0.0      0.0      0.0      0.0
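One detail worth noting: `len(x)` in the aggfunc counts rows, so a date that appears twice for the same ID counts as two active days. The sketch below is an alternative, not the answer's method, and assumes that "active days" means distinct dates. It swaps pivot_table for groupby with built-in aggregations, and the names `daily`, `stats`, `steps` and `wide` are illustrative only.

```python
import pandas as pd

times = ['2021-02-18', '2021-02-22', '2021-02-23', '2021-04-23',
         '2021-01-18', '2021-01-19', '2021-01-20',
         '2021-01-03', '2021-02-04', '2021-02-04']
ids = [1, 1, 1, 1, 44, 44, 44, 46, 46, 46]
df = pd.DataFrame({'ID': ids, 'Date': pd.to_datetime(times)})

# Assumption: an "active day" is a distinct date, so drop repeated rows first
daily = df.drop_duplicates(['ID', 'Date'])

# First date, last date and number of active days per ID and calendar month
stats = (
    daily.groupby(['ID', daily['Date'].dt.to_period('M')])['Date']
         .agg(['min', 'max', 'count'])
)

span_days = (stats['max'] - stats['min']).dt.days
steps = stats['count'] - 1
# Months with a single active day would divide by zero; treat them as 0
freq = (span_days / steps.where(steps > 0)).fillna(0.0)

# One column per month, including months without any activity
all_months = pd.period_range(df['Date'].min(), df['Date'].max(), freq='M')
wide = freq.unstack(fill_value=0.0).reindex(columns=all_months, fill_value=0.0)
wide.columns = wide.columns.strftime('%m_%Y')
wide = wide.rename_axis(columns=None).reset_index()
print(wide)
```

On the sample data the two approaches agree, because the only repeated date (ID 46 on 2021-02-04) yields a zero-length activity period either way.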
You will need to pivot the table, but first, if you want only the month and year of the date, you need to transform it.
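# Replace each date with a "year_month" label, e.g. "2021_2"
df['Date'] = df.Date.map(lambda s: "{}_{}".format(s.year, s.month))
# Helper column so that summing counts the rows per ID and month
df['counts'] = 1
df_new = pd.pivot_table(df, index=['ID'],
                        columns=['Date'], aggfunc='sum')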
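As a hedged illustration of the transform-then-pivot idea, the sketch below builds zero-padded `MM_YYYY` labels with `dt.strftime` so that they match the column names in the desired output, and counts rows with `pd.crosstab` in place of the helper column. Both choices are substitutions made here, not the answer's code. The result holds the number of rows per ID and month, so the frequency formula would still have to be applied on top of it.

```python
import pandas as pd

times = ['2021-02-18', '2021-02-22', '2021-02-23', '2021-04-23',
         '2021-01-18', '2021-01-19', '2021-01-20',
         '2021-01-03', '2021-02-04', '2021-02-04']
ids = [1, 1, 1, 1, 44, 44, 44, 46, 46, 46]
df = pd.DataFrame({'ID': ids, 'Date': pd.to_datetime(times)})

# Zero-padded month labels such as "02_2021"
month_label = df['Date'].dt.strftime('%m_%Y')

# Number of rows per ID and month; months without activity get no column
counts = pd.crosstab(df['ID'], month_label)
print(counts)
```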