I have a DataFrame in pandas and I am doing the following:
grouped_data = df_comments_cols['article_id'].groupby(df_comments_cols['user_id'])
Now, to count the number of articles per user, I do the following:
ct_grouped_data = grouped_data.count()
This counts the number of article IDs per user. However, a user sometimes has the same article ID more than once (because they interacted with that article several times), and I only want to count the unique article IDs per user. Is there a quick way to do this?
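To make the problem concrete, here is a minimal sketch using a small, hypothetical `df_comments_cols` that has only `user_id` and `article_id` columns (the real frame's other columns are not shown in the question). User 1 interacted with article 10 twice, so `count()` reports 3 rows for that user even though only 2 distinct articles are involved.

```python
import pandas as pd

# Hypothetical toy data: user 1 has article 10 twice, plus article 11.
df_comments_cols = pd.DataFrame({
    "user_id":    [1, 1, 1, 2, 2],
    "article_id": [10, 10, 11, 10, 12],
})

# Same grouping as in the question: article_id values grouped by user_id.
grouped_data = df_comments_cols["article_id"].groupby(df_comments_cols["user_id"])

# count() counts non-null rows per group, so repeated article IDs are counted each time.
ct_grouped_data = grouped_data.count()
print(ct_grouped_data)
```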
Thanks in advance.
I think what you might be looking for is nunique, which you can call on GroupBy objects like so:
In [62]: from numpy.random import randn, randint; from pandas import DataFrame
In [63]: df = DataFrame({'a': randn(1000)})
In [64]: df['user_id'] = randint(100, 1000, size=len(df))
In [65]: df['article_id'] = randint(100, size=len(df))
In [66]: gb = df.article_id.groupby(df.user_id)
In [67]: gb.nunique()
Out[67]:
user_id
100 2
101 1
102 1
104 2
105 1
106 2
107 1
110 1
111 4
112 2
113 1
114 2
115 1
116 1
118 1
...
976 3
980 1
982 1
983 1
986 1
987 1
988 1
989 2
990 1
993 1
994 2
996 1
997 1
998 1
999 1
Length: 617, dtype: int64
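Because the session above uses random data, here is a deterministic sketch on the same kind of hypothetical toy frame (only `user_id` and `article_id`, with user 1 having article 10 twice). `nunique()` gives 2 distinct articles for each user, where `count()` would give 3 for user 1. As a cross-check, it also drops repeated `(user_id, article_id)` pairs with `drop_duplicates` and then counts rows, which should agree with `nunique()` (both ignore missing article IDs by default).

```python
import pandas as pd

# Hypothetical toy data: user 1 has article 10 twice, plus article 11.
df_comments_cols = pd.DataFrame({
    "user_id":    [1, 1, 1, 2, 2],
    "article_id": [10, 10, 11, 10, 12],
})

grouped_data = df_comments_cols["article_id"].groupby(df_comments_cols["user_id"])

# nunique() counts distinct non-null article IDs per user.
unique_per_user = grouped_data.nunique()

# Cross-check: keep one row per (user_id, article_id) pair, then count rows per user.
deduped = df_comments_cols.drop_duplicates(subset=["user_id", "article_id"])
check = deduped.groupby("user_id")["article_id"].count()

print(unique_per_user)
```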