How to apply value_counts to a grouped object

Question

I have a table that looks like this:

userid purchase_date
     1 2016-08-01
     1 2016-08-02
     2 2016-08-01
     2 2016-08-01
     3 2016-08-01
     3 2016-08-02
     3 2016-08-03

I am keeping track of each user's purchase history (a user can purchase multiple times a day). Now I want to find the earliest date on which each user made a purchase, so I did this:

df.groupby('userid').purchase_date.transform(min)

Now I have the earliest purchase date for each user. The next thing I want to do is apply value_counts to it. So I expect to see this:

userid earliest_purchase_date
     1 2016-08-01
     2 2016-08-01
     3 2016-08-01

Then apply value_counts to earliest_purchase_date to get:

2016-08-01 3

How can I do that? I don't know what to do after the transform.

PS: I tried df.groupby('userid').purchase_date.transform(min).value_counts(), but this operation is performed on the entire df, not on each group.

Answer 1

I think you need groupby with idxmin to get the index of the minimal value in each group, and then select those rows with loc:

print (df.groupby('userid')['purchase_date'].idxmin())
userid
1    0
2    2
3    4
Name: purchase_date, dtype: int64

first = df.loc[df.groupby('userid')['purchase_date'].idxmin()]
print (first)
   userid purchase_date
0       1    2016-08-01
2       2    2016-08-01
4       3    2016-08-01

And finally value_counts:

print (first.purchase_date.value_counts())
2016-08-01    3
Name: purchase_date, dtype: int64
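
As an aside that is not part of the original answer, here is a minimal sketch of a shorter route, assuming only the earliest date per user is needed and not the full row. The sample frame is rebuilt from the question's table, and the variable names per_row, earliest and counts are made up for illustration. It also shows why the transform attempt counted every row: transform returns one value per original row, while min() returns one value per group.

```python
import pandas as pd

# Sample data copied from the question's table.
df = pd.DataFrame({
    "userid": [1, 1, 2, 2, 3, 3, 3],
    "purchase_date": pd.to_datetime([
        "2016-08-01", "2016-08-02",
        "2016-08-01", "2016-08-01",
        "2016-08-01", "2016-08-02", "2016-08-03",
    ]),
})

# transform keeps the original shape: one value per purchase row (7 rows here),
# so value_counts on it counts purchases, not users.
per_row = df.groupby("userid")["purchase_date"].transform("min")

# min() collapses each group to a single value: one earliest date per user.
earliest = df.groupby("userid")["purchase_date"].min()

# Counting those per-user dates gives the number of users per earliest date.
counts = earliest.value_counts()
print(counts)
```

If the other columns of the earliest row matter, the idxmin and loc approach above is the one to use, since min() keeps only the date.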

If you need to count how many rows hold the minimal value in each group (run this on the original, unfiltered df):

df = (df.groupby('userid')['purchase_date']
        .apply(lambda x: pd.Series([len(x[x == x.min()]), x.min()], index=['count', 'min date']))
        .unstack())
print (df)

       count             min date
userid                           
1          1  2016-08-01 00:00:00
2          2  2016-08-01 00:00:00
3          1  2016-08-01 00:00:00
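
The same per-group summary can probably be built without apply. The following is only a sketch under the assumption that the frame has the question's seven rows. The names grouped, is_earliest and summary are hypothetical, and the column labels are chosen to match the output above.

```python
import pandas as pd

# Sample data copied from the question's table.
df = pd.DataFrame({
    "userid": [1, 1, 2, 2, 3, 3, 3],
    "purchase_date": pd.to_datetime([
        "2016-08-01", "2016-08-02",
        "2016-08-01", "2016-08-01",
        "2016-08-01", "2016-08-02", "2016-08-03",
    ]),
})

grouped = df.groupby("userid")["purchase_date"]

# Boolean mask: True where a row's date equals its own user's earliest date.
is_earliest = df["purchase_date"] == grouped.transform("min")

# Summing the mask per user counts the rows that fall on the earliest date.
summary = pd.DataFrame({
    "count": is_earliest.groupby(df["userid"]).sum(),
    "min date": grouped.min(),
})
print(summary)
```

One difference from the apply version is that the min date column keeps its datetime dtype here, because the two columns are built separately instead of being mixed in one Series.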

Answer 1: accepted, 2016-12-14 13:24:07
