Counting unique values in a pandas grouped object
I have a table in pandas/Python and I am doing the following:
grouped_data = df_comments_cols['article_id'].groupby(df_comments_cols['user_id'])
Now, to count the number of articles per user, I do the following:
ct_grouped_data = grouped_data.count()
The above counts the number of article IDs per user. However, the same article ID sometimes appears more than once for a user (because the user has interacted with that article several times), and I only want to count the unique article IDs per user. Is there a quick way to do this?
Thanks in advance.
I think what you might be looking for is nunique, which you can call on GroupBy objects like so:
In [62]: from numpy.random import randint, randn; from pandas import DataFrame
In [63]: df = DataFrame({'a': randn(1000)})
In [64]: df['user_id'] = randint(100, 1000, size=len(df))
In [65]: df['article_id'] = randint(100, size=len(df))
In [66]: gb = df.article_id.groupby(df.user_id)
In [67]: gb.nunique()
Out[67]:
user_id
100 2
101 1
102 1
104 2
105 1
106 2
107 1
110 1
111 4
112 2
113 1
114 2
115 1
116 1
118 1
...
976 3
980 1
982 1
983 1
986 1
987 1
988 1
989 2
990 1
993 1
994 2
996 1
997 1
998 1
999 1
Length: 617, dtype: int64
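To make the difference between count() and nunique() easier to see, here is a minimal sketch on a small hand-made table. The data is hypothetical and only reuses the column names from the question (user_id, article_id); the names total_per_user, unique_per_user and unique_alt are mine, not from the original post.

```python
import pandas as pd

# Hypothetical toy data: user 1 interacts with article 10 twice.
df_comments_cols = pd.DataFrame({
    "user_id":    [1, 1, 1, 2, 2, 3],
    "article_id": [10, 10, 11, 10, 12, 13],
})

# Same grouping as in the question: article IDs grouped by user ID.
grouped_data = df_comments_cols["article_id"].groupby(df_comments_cols["user_id"])

# count() includes repeated article IDs; nunique() counts each distinct ID once.
total_per_user = grouped_data.count()
unique_per_user = grouped_data.nunique()

# An equivalent spelling that groups the frame by column name.
unique_alt = df_comments_cols.groupby("user_id")["article_id"].nunique()

print(total_per_user.to_dict())   # user 1 has 3 rows in total
print(unique_per_user.to_dict())  # but only 2 distinct articles
```

For user 1 the two results differ (3 rows versus 2 distinct articles), which is exactly the case described in the question.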