Counting unique values in a pandas grouped object

I have a table in pandas/Python and I am doing the following:

grouped_data = df_comments_cols['article_id'].groupby(df_comments_cols['user_id'])

Now, to count the number of articles per user, I do the following:

ct_grouped_data = grouped_data.count()

The above counts the number of article IDs per user. However, a user sometimes has the same article ID more than once (because they interacted with that article several times), and I only want to count unique article IDs per user. Is there a quick way to do this?

Thanks in advance.

I think what you might be looking for is nunique, which you can call on GroupBy objects like so:

In [62]: from numpy.random import randn, randint; from pandas import DataFrame

In [63]: df = DataFrame({'a': randn(1000)})

In [64]: df['user_id'] = randint(100, 1000, size=len(df))

In [65]: df['article_id'] = randint(100, size=len(df))

In [66]: gb = df.article_id.groupby(df.user_id)

In [67]: gb.nunique()
Out[67]:
user_id
100        2
101        1
102        1
104        2
105        1
106        2
107        1
110        1
111        4
112        2
113        1
114        2
115        1
116        1
118        1
...
976        3
980        1
982        1
983        1
986        1
987        1
988        1
989        2
990        1
993        1
994        2
996        1
997        1
998        1
999        1
Length: 617, dtype: int64
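
Since the session above uses random data, here is a small deterministic sketch of the same idea, reusing the variable names from the question. The toy values are made up for illustration: user 1 has interacted with article 10 twice, so count() and nunique() should disagree for that user only.

```python
import pandas as pd

# Hypothetical toy data (not from the original post):
# user 1 interacts with article 10 twice.
df_comments_cols = pd.DataFrame({
    "user_id":    [1, 1, 1, 2, 2, 3],
    "article_id": [10, 10, 11, 10, 12, 12],
})

grouped_data = df_comments_cols["article_id"].groupby(df_comments_cols["user_id"])

# count() tallies every non-null row, so repeated article IDs are included.
ct_grouped_data = grouped_data.count()

# nunique() tallies distinct article IDs within each user's group.
unique_ct = grouped_data.nunique()

# Equivalent spelling that groups the frame first, then selects the column.
unique_ct_alt = df_comments_cols.groupby("user_id")["article_id"].nunique()

print(ct_grouped_data.to_dict())  # {1: 3, 2: 2, 3: 1}
print(unique_ct.to_dict())        # {1: 2, 2: 2, 3: 1}
```

Both spellings should give the same Series indexed by user_id; which one reads better is mostly a matter of taste.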
