Pandas: using groupby and nunique taking time into account

I have a dataframe in this form:

A    B    time
1    2    2019-01-03
1    3    2018-04-05
1    4    2020-01-01
1    4    2020-02-02

where A and B contain integer identifiers. I want to count the number of distinct B identifiers each A has interacted with. To do this I usually just do:

df.groupby('A')['B'].nunique()   
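
As a minimal, self-contained illustration of that baseline, the sketch below rebuilds the sample rows shown above and runs the same call. The variable names `unique_b` and `row_count` are only illustrative.

```python
import pandas as pd

# Sample rows from the table above; 'time' is parsed into datetime64.
df = pd.DataFrame({
    "A": [1, 1, 1, 1],
    "B": [2, 3, 4, 4],
    "time": pd.to_datetime(
        ["2019-01-03", "2018-04-05", "2020-01-01", "2020-02-02"]
    ),
})

# Number of distinct B values per A (the repeated B=4 counts once).
unique_b = df.groupby("A")["B"].nunique()

# For contrast: number of rows per A (the repeated B=4 counts twice).
row_count = df.groupby("A")["B"].size()

print(unique_b)
print(row_count)
```

For A=1 this gives 3 distinct identifiers out of 4 rows, which is why the distinction between counting rows and counting unique values matters later on.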

I now have to do something slightly different: each identifier in A has a date assigned (different for each identifier) that splits its interactions into two parts: those happening before that date and those happening after it. The same operation as before (counting the number of unique B values interacted with) needs to be done for each part separately.

For example, if the date for A=1 were 2018-07-01, the output would be:

A    before    after
1    1         2

In the real data, A contains millions of different identifiers, each with its own assigned date.

EDIT: To be clearer, I added a row to df. I want to count the number of distinct values of B that each A interacts with before and after its date.

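I would map each A to its date, compare that with df['time'], and then group by A and the result of the comparison. Note that value_counts() on the comparison would count rows, so the repeated (1, 4) pair would be counted twice; nunique() on B counts each distinct B once per side. Here date_dict is assumed to be a dict mapping each A to its date, and df['time'] is assumed to be a datetime column:

# True where the row falls before the date assigned to its A
before = df['A'].map(date_dict).gt(df['time']).rename('before')

(df.groupby(['A', before])['B']
    .nunique()                      # distinct B per (A, side), not row counts
    .unstack(fill_value=0)
    .rename({False:'after',True:'before'}, axis=1)
    .rename_axis(columns=None)
)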

Output:

   after  before
A               
1      2       1
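
To try this end to end, here is a minimal self-contained sketch. The frame reproduces the question's sample rows, and `date_dict` with its single entry for A=1 is an assumption taken from the question's example date. It counts distinct B values on each side with `nunique()`, so a repeated (A, B) pair on the same side is counted once. The names `is_before`, `side` and `result` are only illustrative.

```python
import pandas as pd

# Sample rows from the question; 'time' is parsed into datetime64.
df = pd.DataFrame({
    "A": [1, 1, 1, 1],
    "B": [2, 3, 4, 4],
    "time": pd.to_datetime(
        ["2019-01-03", "2018-04-05", "2020-01-01", "2020-02-02"]
    ),
})

# Assumed lookup: one split date per identifier in A (only A=1 here).
date_dict = {1: pd.Timestamp("2018-07-01")}

# True where the interaction happened strictly before that A's date.
# Rows falling exactly on the date, or whose A has no entry in date_dict
# (mapped to NaT), compare as False and therefore land in "after".
is_before = df["A"].map(date_dict).gt(df["time"])
side = is_before.map({True: "before", False: "after"}).rename("side")

result = (
    df.groupby(["A", side])["B"]
      .nunique()                                   # distinct B per (A, side)
      .unstack(fill_value=0)                       # one column per side
      .reindex(columns=["before", "after"], fill_value=0)
      .rename_axis(columns=None)
)
print(result)
```

For the sample data this yields 1 for before and 2 for after, matching the table the question asks for. With millions of identifiers, keeping the dates in a Series or dict and using `map` as above avoids any per-group Python loop; whether that is fast enough for the real data would need to be measured.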
