
Grouping by unique values in a Python pandas DataFrame

I have a DataFrame that looks like this:

                     id    rev  committer_id
date
1996-07-03 08:18:15   1  76620             1
1996-07-03 08:18:15   2  76621             2
1996-11-18 20:51:08   3  76987             3
1996-11-21 09:12:53   4  76995             2
1996-11-21 09:16:33   5  76997             2
1996-11-21 09:39:27   6  76999             2
1996-11-21 09:53:37   7  77003             2
1996-11-21 10:11:35   8  77006             2
1996-11-21 10:17:50   9  77008             2
1996-11-21 10:23:58  10  77010             2
1996-11-21 10:32:58  11  77012             2
1996-11-21 10:55:51  12  77014             2

I would like to group by quarterly periods and then show the number of unique entries in the committer_id column. The id and rev columns are not used for the moment.

I would like to get a result like the following:

            committer_id
date
1996-09-30             2
1996-12-31            91
1997-03-31            56
1997-06-30           154
1997-09-30            84
1997-09-30  84

The results I actually get count all entries in each time period, not unique entries. I am using the following:

df[['committer_id']].groupby(pd.Grouper(freq='Q-DEC')).aggregate(np.size)
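As a minimal sketch (assuming pandas and NumPy are installed, with the sample rows above typed in by hand), the attempt can be reproduced to see what it actually counts. The `QUARTER_FREQ` switch is a hypothetical addition not in the original: pandas 2.2 deprecated the `'Q-DEC'` alias in favour of `'QE-DEC'`, so the sketch picks whichever spelling the installed version expects.

```python
import numpy as np
import pandas as pd

# Sample rows from the question, typed in by hand.
dates = pd.to_datetime([
    "1996-07-03 08:18:15", "1996-07-03 08:18:15", "1996-11-18 20:51:08",
    "1996-11-21 09:12:53", "1996-11-21 09:16:33", "1996-11-21 09:39:27",
    "1996-11-21 09:53:37", "1996-11-21 10:11:35", "1996-11-21 10:17:50",
    "1996-11-21 10:23:58", "1996-11-21 10:32:58", "1996-11-21 10:55:51",
])
df = pd.DataFrame(
    {
        "id": range(1, 13),
        "rev": [76620, 76621, 76987, 76995, 76997, 76999,
                77003, 77006, 77008, 77010, 77012, 77014],
        "committer_id": [1, 2, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2],
    },
    index=pd.DatetimeIndex(dates, name="date"),
)

# Hypothetical helper: pandas 2.2 renamed the quarter-end alias 'Q-DEC' to 'QE-DEC'.
major, minor = (int(part) for part in pd.__version__.split(".")[:2])
QUARTER_FREQ = "QE-DEC" if (major, minor) >= (2, 2) else "Q-DEC"

# The question's attempt: np.size counts every row in each quarter, repeats included.
sizes = df[["committer_id"]].groupby(pd.Grouper(freq=QUARTER_FREQ)).aggregate(np.size)
print(sizes)
```

On the sample this gives 2 rows for the quarter ending 1996-09-30 and 10 for the quarter ending 1996-12-31: every commit is counted, not every distinct committer.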

I can't figure out how to use np.unique.

Any ideas?

Best,

--

df[['committer_id']].groupby(pd.Grouper(freq='Q-DEC')).aggregate(pd.Series.nunique)

This should work for you. Alternatively, select the column after grouping: df.groupby(pd.Grouper(freq='Q-DEC'))['committer_id'].nunique()
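Applied to the sample rows from the question (a sketch assuming pandas is installed; `QUARTER_FREQ` is a hypothetical switch for the quarter-end alias that pandas 2.2 renamed from `'Q-DEC'` to `'QE-DEC'`), both forms give the number of distinct committers per quarter.

```python
import pandas as pd

# committer_id values from the question's sample, indexed by commit date.
dates = pd.to_datetime([
    "1996-07-03 08:18:15", "1996-07-03 08:18:15", "1996-11-18 20:51:08",
    "1996-11-21 09:12:53", "1996-11-21 09:16:33", "1996-11-21 09:39:27",
    "1996-11-21 09:53:37", "1996-11-21 10:11:35", "1996-11-21 10:17:50",
    "1996-11-21 10:23:58", "1996-11-21 10:32:58", "1996-11-21 10:55:51",
])
df = pd.DataFrame(
    {"committer_id": [1, 2, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2]},
    index=pd.DatetimeIndex(dates, name="date"),
)

# Hypothetical helper: pandas 2.2 renamed the quarter-end alias 'Q-DEC' to 'QE-DEC'.
major, minor = (int(part) for part in pd.__version__.split(".")[:2])
QUARTER_FREQ = "QE-DEC" if (major, minor) >= (2, 2) else "Q-DEC"

# Form 1: aggregate the one-column frame with pd.Series.nunique (returns a DataFrame).
unique_frame = (
    df[["committer_id"]]
    .groupby(pd.Grouper(freq=QUARTER_FREQ))
    .aggregate(pd.Series.nunique)
)

# Form 2: select the column after grouping and call nunique (returns a Series).
unique_series = df.groupby(pd.Grouper(freq=QUARTER_FREQ))["committer_id"].nunique()

print(unique_frame)
print(unique_series)
```

Both report 2 for each quarter of the sample (committers 1 and 2 up to 1996-09-30, committers 2 and 3 up to 1996-12-31); the larger figures in the question presumably come from the full dataset. The first form keeps a one-column DataFrame, the second returns a Series.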

Your attempt with np.unique didn't work because np.unique returns an array of the unique items, whereas the function passed to aggregate must return a single scalar for each group. Wrapping it in len, as in .aggregate(lambda x: len(np.unique(x))), should also work.
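The difference can be seen directly. In this sketch (assuming pandas and NumPy; for brevity `committer_id` is a plain Series built from the sample rather than the one-column DataFrame used above, and `QUARTER_FREQ` is the same hypothetical alias switch), np.unique on the fourth-quarter rows returns an array, and len() turns it into the one number per group that aggregate needs.

```python
import numpy as np
import pandas as pd

# committer_id values from the question's sample as a Series indexed by commit date.
dates = pd.to_datetime([
    "1996-07-03 08:18:15", "1996-07-03 08:18:15", "1996-11-18 20:51:08",
    "1996-11-21 09:12:53", "1996-11-21 09:16:33", "1996-11-21 09:39:27",
    "1996-11-21 09:53:37", "1996-11-21 10:11:35", "1996-11-21 10:17:50",
    "1996-11-21 10:23:58", "1996-11-21 10:32:58", "1996-11-21 10:55:51",
])
committer_id = pd.Series(
    [1, 2, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2],
    index=pd.DatetimeIndex(dates, name="date"),
    name="committer_id",
)

# np.unique returns an array of the distinct values, not a single number.
distinct = np.unique(committer_id.loc["1996-10-01":])  # fourth-quarter rows only
print(distinct)  # [2 3]

# Hypothetical helper: pandas 2.2 renamed the quarter-end alias 'Q-DEC' to 'QE-DEC'.
major, minor = (int(part) for part in pd.__version__.split(".")[:2])
QUARTER_FREQ = "QE-DEC" if (major, minor) >= (2, 2) else "Q-DEC"

# Wrapping np.unique in len() yields one scalar per group, which aggregate accepts.
per_quarter = committer_id.groupby(pd.Grouper(freq=QUARTER_FREQ)).aggregate(
    lambda x: len(np.unique(x))
)
print(per_quarter.tolist())  # [2, 2]
```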
