Counting unique values in a pandas groupby object

Question

I have a table in pandas (Python) and I am doing the following:

grouped_data = df_comments_cols['article_id'].groupby(df_comments_cols['user_id'])

Now, to count the number of articles per user, I do the following:

ct_grouped_data = grouped_data.count()

The above counts the number of article IDs per user. However, the same article ID sometimes appears more than once for a user (because the user has interacted with that article several times), and I only want to count unique article IDs per user. Is there a quick way to do this?

Thanks in advance.

Answer 1

I think what you might be looking for is nunique, which you can call on GroupBy objects like so:

# Assumes: from pandas import DataFrame; from numpy.random import randn, randint
In [63]: df = DataFrame({'a': randn(1000)})

In [64]: df['user_id'] = randint(100, 1000, size=len(df))

In [65]: df['article_id'] = randint(100, size=len(df))

In [66]: gb = df.article_id.groupby(df.user_id)

In [67]: gb.nunique()
Out[67]:
user_id
100        2
101        1
102        1
104        2
105        1
106        2
107        1
110        1
111        4
112        2
113        1
114        2
115        1
116        1
118        1
...
976        3
980        1
982        1
983        1
986        1
987        1
988        1
989        2
990        1
993        1
994        2
996        1
997        1
998        1
999        1
Length: 617, dtype: int64
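
Because the session above uses random data, its output is hard to check by eye. Here is a minimal sketch on a small hand-made table that contrasts count() with nunique(). The sample values are hypothetical and chosen only for illustration; the variable names follow the question, except unique_ct_grouped_data and same_result, which are introduced here.

```python
import pandas as pd

# Hypothetical sample data: user 1 interacted with article 10 twice.
df_comments_cols = pd.DataFrame({
    "user_id":    [1, 1, 1, 2, 2, 3],
    "article_id": [10, 10, 11, 10, 12, 12],
})

grouped_data = df_comments_cols["article_id"].groupby(df_comments_cols["user_id"])

# count() tallies every non-null row in each group, so repeats are included.
ct_grouped_data = grouped_data.count()

# nunique() counts each distinct article_id only once per user.
unique_ct_grouped_data = grouped_data.nunique()

# Equivalent spelling: group the frame by column name, then select the column.
same_result = df_comments_cols.groupby("user_id")["article_id"].nunique()

print(ct_grouped_data.to_dict())         # {1: 3, 2: 2, 3: 1}
print(unique_ct_grouped_data.to_dict())  # {1: 2, 2: 2, 3: 1}
```

Only user 1 differs between the two results, since that is the only user with a repeated article ID. Note that nunique() ignores missing values by default; passing dropna=False would count NaN as its own value.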

Answer 1: 6, accepted, 2013-08-07 14:01:22
