Pandas count events per day from join date
I have this data frame:
name event join_date created_at
A X 2020-12-01 2020-12-01
A X 2020-12-01 2020-12-01
A X 2020-12-01 2020-12-02
A Y 2020-12-01 2020-12-02
B X 2020-12-05 2020-12-05
B X 2020-12-05 2020-12-07
C X 2020-12-07 2020-12-08
C X 2020-12-07 2020-12-09
...
I want to transform it into this data frame:
name event join_date day_0 day_1 day_2 .... day_n
A X 2020-12-01 2 1 0 0
A Y 2020-12-01 0 1 0 0
B X 2020-12-05 1 0 1 0
C X 2020-12-07 0 1 1 0
...
The first row means that user A performed event X twice on day_0 (the day they joined), once on day_1, and so on up to day_n.
Currently, my result looks like this:
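To make the day_n definition concrete, here is a minimal sketch for the A/X rows only. It assumes join_date and created_at are parsed as datetimes. The offset is created_at minus join_date in whole days, and day_k is the number of rows whose offset is k:

```python
import pandas as pd

# The three rows for user A, event X from the sample data above
ax = pd.DataFrame({
    'join_date': pd.to_datetime(['2020-12-01'] * 3),
    'created_at': pd.to_datetime(['2020-12-01', '2020-12-01', '2020-12-02']),
})

# Whole days elapsed since joining: 0, 0, 1
offsets = (ax['created_at'] - ax['join_date']).dt.days

# Count events per offset; day_2 has no events, so it is filled with 0
counts = offsets.value_counts().reindex(range(3), fill_value=0)
print({f'day_{k}': int(v) for k, v in counts.items()})
# {'day_0': 2, 'day_1': 1, 'day_2': 0}
```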
name event join_date day_0 day_1 day_2 .... day_n
A X 2020-12-01 2 1 0 0
A Y 2020-12-01 0 1 0 0
B X 2020-12-05 1 0 1 0
C X 2020-12-07 1 1 0 0
...
The code sets 2020-12-02 as day_0 instead of day_1, because user A has no Y event on 2020-12-01. User C is shifted the same way: C joined on 2020-12-07 but has no event that day, so 2020-12-08 becomes day_0.
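The symptom suggests the offsets are measured from the first created_at of each (name, event) group rather than from join_date. The question does not show that code, so the following is a hypothetical reconstruction of the bug using GroupBy.transform('first'), placed next to the join_date-based offset. The column names offset_from_first and offset_from_join are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['A', 'A', 'A', 'A', 'B', 'B', 'C', 'C'],
    'event': ['X', 'X', 'X', 'Y', 'X', 'X', 'X', 'X'],
    'join_date': ['2020-12-01'] * 4 + ['2020-12-05'] * 2 + ['2020-12-07'] * 2,
    'created_at': ['2020-12-01', '2020-12-01', '2020-12-02', '2020-12-02',
                   '2020-12-05', '2020-12-07', '2020-12-08', '2020-12-09'],
})
df[['join_date', 'created_at']] = df[['join_date', 'created_at']].apply(pd.to_datetime)

# Hypothetical buggy version: day 0 = first event of each (name, event) group
first_event = df.groupby(['name', 'event'])['created_at'].transform('first')
df['offset_from_first'] = (df['created_at'] - first_event).dt.days

# Intended version: day 0 = the user's join_date
df['offset_from_join'] = (df['created_at'] - df['join_date']).dt.days

print(df[['name', 'event', 'offset_from_first', 'offset_from_join']])
```

The two offsets disagree exactly for the A/Y row and the C/X rows, the groups with no event on the join date.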
First, subtract join_date from created_at, so that every offset is counted from the day the user joined instead of from the first created_at of each group. Then count the events per offset with DataFrame.pivot_table, add all possible day offsets with DataFrame.reindex and timedelta_range, and finally rename the columns using range:
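```python
import pandas as pd
# join_date and created_at must be datetimes so that the difference is a Timedelta
df[['join_date', 'created_at']] = df[['join_date', 'created_at']].apply(pd.to_datetime)
df['d'] = df['created_at'].sub(df['join_date'])
print(df)
  name event  join_date created_at      d
0    A     X 2020-12-01 2020-12-01 0 days
1    A     X 2020-12-01 2020-12-01 0 days
2    A     X 2020-12-01 2020-12-02 1 days
3    A     Y 2020-12-01 2020-12-02 1 days
4    B     X 2020-12-05 2020-12-05 0 days
5    B     X 2020-12-05 2020-12-07 2 days
6    C     X 2020-12-07 2020-12-08 1 days
7    C     X 2020-12-07 2020-12-09 2 days
```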
```python
# Count rows per (name, event, join_date) and day offset, one column per offset
df1 = (df.pivot_table(index=['name', 'event', 'join_date'],
                      columns='d',
                      aggfunc='size',
                      fill_value=0)
         # Add missing offsets, starting at 0 days so that day_0 is always the join date
         .reindex(pd.timedelta_range(pd.Timedelta(0), df['d'].max()),
                  axis=1,
                  fill_value=0))
df1.columns = [f'day_{i}' for i in range(len(df1.columns))]
df1 = df1.reset_index()
print(df1)
  name event  join_date  day_0  day_1  day_2
0    A     X 2020-12-01      2      1      0
1    A     Y 2020-12-01      0      1      0
2    B     X 2020-12-05      1      0      1
3    C     X 2020-12-07      0      1      1
```
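An equivalent sketch, swapping pivot_table for a different technique, converts the offsets to integer days with Series.dt.days and counts them with groupby(...).size().unstack(...). Starting the column range at 0 rather than at the smallest observed offset keeps day_0 tied to the join date even if no user has an event on their join day. It assumes the same sample data, parsed as datetimes, and the helper names days and out are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['A', 'A', 'A', 'A', 'B', 'B', 'C', 'C'],
    'event': ['X', 'X', 'X', 'Y', 'X', 'X', 'X', 'X'],
    'join_date': ['2020-12-01'] * 4 + ['2020-12-05'] * 2 + ['2020-12-07'] * 2,
    'created_at': ['2020-12-01', '2020-12-01', '2020-12-02', '2020-12-02',
                   '2020-12-05', '2020-12-07', '2020-12-08', '2020-12-09'],
})
df[['join_date', 'created_at']] = df[['join_date', 'created_at']].apply(pd.to_datetime)

# Integer number of days since the user joined
days = (df['created_at'] - df['join_date']).dt.days.rename('day')

# Count rows per (name, event, join_date, day) and spread the days into columns
out = (df.groupby(['name', 'event', 'join_date', days])
         .size()
         .unstack(fill_value=0)
         # Every offset from 0 up to the largest one, even if no event has it
         .reindex(columns=range(days.max() + 1), fill_value=0)
         .add_prefix('day_')
         .reset_index())
print(out)
```

Because the columns are built from the integer offsets themselves, the day_k labels cannot drift out of step with the actual offsets.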