Pandas: using groupby and nunique while taking time into account

Question

I have a dataframe in this form:

A    B    time
1    2    2019-01-03
1    3    2018-04-05
1    4    2020-01-01
1    4    2020-02-02

where A and B contain integer identifiers. I want to measure the number of distinct B identifiers each A has interacted with. To do this I usually just do:

df.groupby('A')['B'].nunique()
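For reference, here is that baseline run on the sample data. This is a minimal, self-contained sketch: the DataFrame literal simply rebuilds the table from the question, with `time` parsed as datetimes. It ignores time entirely, so A=1 gets one count covering all of its B values.

```python
import pandas as pd

# The sample frame from the question; 'time' parsed as datetimes
df = pd.DataFrame({
    'A': [1, 1, 1, 1],
    'B': [2, 3, 4, 4],
    'time': pd.to_datetime(['2019-01-03', '2018-04-05', '2020-01-01', '2020-02-02']),
})

# Distinct B identifiers per A, ignoring time: A=1 has B values {2, 3, 4}
unique_b = df.groupby('A')['B'].nunique()
print(unique_b)
```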

I now have to do something slightly different: each A identifier has an assigned date (different for each identifier) that splits its interactions into two parts, those happening before that date and those happening after it. The same operation as before (counting the number of unique B interacted with) needs to be done for each part separately.

For example, if the date for A=1 were 2018-07-01, the output would be:

A    before    after
1    1         2

In the real data, A contains millions of different identifiers, each with its own assigned date.

EDIT: To make this clearer, I added a line to df. I want to count the number of distinct values of B each A interacts with before and after its date.
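To pin down what "before" and "after" mean, here is a slow, loop-based reference sketch that is only meant for checking a vectorized solution on small data; it would be far too slow for millions of A. The `cutoffs` dict and the `count_before_after` helper are hypothetical names. It also assumes an interaction dated exactly on the cut-off counts as "after", which the question does not specify.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 1, 1, 1],
    'B': [2, 3, 4, 4],
    'time': pd.to_datetime(['2019-01-03', '2018-04-05', '2020-01-01', '2020-02-02']),
})
# Hypothetical per-A cut-off dates (the value from the question's example)
cutoffs = {1: pd.Timestamp('2018-07-01')}

def count_before_after(df, cutoffs):
    """Reference only: distinct B before / on-or-after each A's cut-off date."""
    rows = []
    for a, group in df.groupby('A'):
        cutoff = cutoffs[a]
        before = group.loc[group['time'] < cutoff, 'B'].nunique()
        after = group.loc[group['time'] >= cutoff, 'B'].nunique()  # ties count as after
        rows.append({'A': a, 'before': before, 'after': after})
    return pd.DataFrame(rows).set_index('A')

result = count_before_after(df, cutoffs)
print(result)
```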

Answer 1

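I would convert A into dates (here via a dict mapping each A to its date), compare those with df['time'], and then count the distinct B on each side with groupby().nunique():

import pandas as pd

# date_dict maps each A to its date (example value taken from the question)
date_dict = {1: pd.Timestamp('2018-07-01')}
df['time'] = pd.to_datetime(df['time'])

# True where time is before that A's date (A missing from date_dict -> NaT -> False)
before = df['A'].map(date_dict).gt(df['time'])
(df.groupby([df['A'], before])['B']
    .nunique()                  # distinct B per A and side of the date
    .unstack(fill_value=0)
    .rename({False: 'after', True: 'before'}, axis=1)
)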

Output:

   after  before
A               
1      2       1
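With millions of A, the dates may already live in a DataFrame rather than a dict. A merge-based variant is sketched below under that assumption. The `cutoffs` table and its `cutoff` column are hypothetical names, and the inner join silently drops A values that have no date. As in the answer above, an interaction dated exactly on the cut-off counts as "after".

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 1, 1, 1],
    'B': [2, 3, 4, 4],
    'time': pd.to_datetime(['2019-01-03', '2018-04-05', '2020-01-01', '2020-02-02']),
})
# Hypothetical cut-off table: one row per A with its date
cutoffs = pd.DataFrame({'A': [1], 'cutoff': pd.to_datetime(['2018-07-01'])})

# Attach each row's cut-off date; the inner join drops A values without a date
merged = df.merge(cutoffs, on='A', how='inner')
merged['period'] = np.where(merged['time'] < merged['cutoff'], 'before', 'after')

# Distinct B per A and period, one column per period (missing combinations -> 0)
result = (merged.groupby(['A', 'period'])['B']
                .nunique()
                .unstack(fill_value=0))
print(result)
```

A merge avoids building a Python dict with millions of entries. The groupby/nunique/unstack step is the same as in the answer.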

Answer 1
1 2020-05-07 18:06:23