Grouping by unique values in a Python pandas DataFrame
I have a dataframe that looks like this:
id rev committer_id
date
1996-07-03 08:18:15 1 76620 1
1996-07-03 08:18:15 2 76621 2
1996-11-18 20:51:08 3 76987 3
1996-11-21 09:12:53 4 76995 2
1996-11-21 09:16:33 5 76997 2
1996-11-21 09:39:27 6 76999 2
1996-11-21 09:53:37 7 77003 2
1996-11-21 10:11:35 8 77006 2
1996-11-21 10:17:50 9 77008 2
1996-11-21 10:23:58 10 77010 2
1996-11-21 10:32:58 11 77012 2
1996-11-21 10:55:51 12 77014 2
I would like to group by quarterly periods and then show the number of unique entries in the committer_id column. The id and rev columns are not used for the moment.
I would like to get a result like the following:
committer_id
date
1996-09-30 2
1996-12-31 91
1997-03-31 56
1997-06-30 154
1997-09-30 84
The actual results are aggregated by the number of entries in each time period, not by the number of unique entries. I am using the following:
df[['committer_id']].groupby(pd.Grouper(freq='Q-DEC')).aggregate(np.size)
I can't figure out how to use np.unique.
Any ideas, please?
Best,
-- -
df[['committer_id']].groupby(pd.Grouper(freq='Q-DEC')).aggregate(pd.Series.nunique)
should work for you. Or:
df.groupby(pd.Grouper(freq='Q-DEC'))['committer_id'].nunique()
Your attempt with np.unique didn't work because np.unique returns an array of the unique items, while the result of an agg function must be a scalar. So .aggregate(lambda x: len(np.unique(x))) would probably work too.
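To make the difference between counting rows and counting unique values concrete, here is a minimal sketch built from the twelve sample rows in the question (only the committer_id column is kept). One assumption to flag: it groups on `df.index.to_period("Q-DEC")` rather than `pd.Grouper(freq='Q-DEC')`, because newer pandas releases deprecate the `'Q-DEC'` offset alias in favor of `'QE-DEC'`; as a consequence the result is labelled by quarter period instead of by quarter-end date. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Sample rows from the question; only committer_id is needed here.
dates = pd.to_datetime([
    "1996-07-03 08:18:15", "1996-07-03 08:18:15", "1996-11-18 20:51:08",
    "1996-11-21 09:12:53", "1996-11-21 09:16:33", "1996-11-21 09:39:27",
    "1996-11-21 09:53:37", "1996-11-21 10:11:35", "1996-11-21 10:17:50",
    "1996-11-21 10:23:58", "1996-11-21 10:32:58", "1996-11-21 10:55:51",
])
df = pd.DataFrame(
    {"committer_id": [1, 2, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2]},
    index=dates,
)
df.index.name = "date"

# Assumption: group on calendar quarters via a PeriodIndex instead of pd.Grouper.
quarters = df.index.to_period("Q-DEC")
grouped = df.groupby(quarters)["committer_id"]

row_counts = grouped.size()          # what np.size gives: rows per quarter
unique_counts = grouped.nunique()    # distinct committers per quarter
# Equivalent scalar-returning aggregation built on np.unique.
unique_counts_np = grouped.agg(lambda s: len(np.unique(s)))

print(row_counts)
print(unique_counts)
```

On this sample, the second quarter holds ten rows but only two distinct committers, which is the gap the question describes.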