I am having trouble (brain block) producing some simple summary statistics for my data.
What I would like to do is count the number of co-occurring "code" values across all "id"s. The data look like:
id code
1 A
2 A
2 B
3 A
3 B
4 A
5 A
5 C
6 A
6 B
6 C
So the output would look like the table below. Alternatively, a factorized "combo-id" column could be added to the raw data, with one value per unique combination.
Combo Count combo-id
(A) 2 1
(A,B) 2 2
(A,C) 1 3
(A,B,C) 1 4
First create tuples per group, then get the counts with GroupBy.size:
s = df.groupby('id')['code'].apply(tuple).rename('Combo')
# if duplicates don't matter, use a hashable frozenset instead (thanks @cripcate):
# s = df.groupby('id')['code'].apply(frozenset).rename('Combo')
df1 = s.groupby(s).size().reset_index(name='Count')
print(df1)
Combo Count
0 (A,) 2
1 (A, B) 2
2 (A, B, C) 1
3 (A, C) 1
Try adding .unique(). From the pandas docs:

Series.unique()
Return unique values of Series object. Uniques are returned in order of appearance. Hash-table-based unique, therefore does NOT sort.