Pandas: using groupby and nunique taking time into account
I have a dataframe in this form:
A B time
1 2 2019-01-03
1 3 2018-04-05
1 4 2020-01-01
1 4 2020-02-02
where A and B contain some integer identifiers. I want to measure the number of different identifiers each A has interacted with. To do this I usually simply do
df.groupby('A')['B'].nunique()
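For reference, here is a minimal reproducible sketch of that baseline. The construction of df is an assumption based on the sample table above, with the time column parsed as datetime:

```python
import pandas as pd

# Sample data copied from the table above; 'time' is parsed as datetime.
df = pd.DataFrame({
    "A": [1, 1, 1, 1],
    "B": [2, 3, 4, 4],
    "time": pd.to_datetime(
        ["2019-01-03", "2018-04-05", "2020-01-01", "2020-02-02"]
    ),
})

# Number of distinct B values per A, ignoring time entirely.
unique_b = df.groupby("A")["B"].nunique()
print(unique_b)
```

Here B=4 appears twice for A=1 but is counted once, so A=1 has three distinct B values.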
I now have to do a slightly different thing: each identifier has a date assigned (different for each identifier) that splits its interactions into two parts: the ones happening before that date, and the ones happening after that date. The same operation as before (counting the number of unique B values interacted with) needs to be done for both parts separately.
For example, if the date for A=1 was 2018-07-01, the output would be
A before after
1 1 2
In the real data, A contains millions of different identifiers, each with its own date assigned.
EDIT: To be clearer, I added a row to df. I want to count the number of distinct values of B that each A interacts with before and after the date.
I would convert A into dates, compare those with df['time'], and then use groupby().value_counts():
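# date_dict is assumed to map each A to its cutoff date; df['time'] must be datetime
(df['A'].map(date_dict)          # cutoff date for each row
   .gt(df['time'])               # True if the row falls before the cutoff
   .groupby(df['A'])
   .value_counts()               # counts rows per A, not distinct B values
   .unstack()
   .rename({False:'after',True:'before'}, axis=1)
)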
Output:
after before
A
1 2 1
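Note that value_counts() counts rows, so repeated B values within a period are counted more than once. If distinct B values are what is needed, one possible variant is to group by both A and the before/after flag and call nunique(). The sketch below is only an illustration: date_dict is a hypothetical mapping from each A to its cutoff date (the question does not show how the dates are stored), and df is rebuilt from the sample table in the question.

```python
import pandas as pd

# Sample data from the question, including the row added in the edit.
df = pd.DataFrame({
    "A": [1, 1, 1, 1],
    "B": [2, 3, 4, 4],
    "time": pd.to_datetime(
        ["2019-01-03", "2018-04-05", "2020-01-01", "2020-02-02"]
    ),
})

# Hypothetical mapping from each A to its cutoff date (an assumption).
date_dict = {1: pd.Timestamp("2018-07-01")}

# True where the interaction happened strictly before A's cutoff date.
is_before = df["A"].map(date_dict).gt(df["time"])
period = is_before.map({True: "before", False: "after"}).rename("period")

# Distinct B values per (A, period), pivoted to one row per A.
result = (
    df.groupby(["A", period])["B"]
      .nunique()
      .unstack(fill_value=0)
      .reindex(columns=["before", "after"], fill_value=0)
)
print(result)
```

Because gt is a strict comparison, a row whose time equals the cutoff date lands in the after column. The reindex step keeps both columns present even when no A has rows in one of the periods.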