I am having trouble (brain block) producing some simple summary statistics for my data.
What I would like to do is count the number of co-occurring "code" values across all "id"s. The data look like:
id code
1 A
2 A
2 B
3 A
3 B
4 A
5 A
5 C
6 A
6 B
6 C
So the output would look like the table below. Alternatively, a factorized "combo-id" column could be added to the raw data, with one value per unique combination.
Combo Count combo-id
(A) 2 1
(A,B) 2 2
(A,C) 1 3
(A,B,C) 1 4
First create tuples per group, then get the counts with GroupBy.size:
s = df.groupby('id')['code'].apply(tuple).rename('Combo')
# if duplicates don't matter, use a hashable frozenset instead (thanks @cripcate):
# s = df.groupby('id')['code'].apply(frozenset).rename('Combo')
df1 = s.groupby(s).size().reset_index(name='Count')
print(df1)
Combo Count
0 (A,) 2
1 (A, B) 2
2 (A, B, C) 1
3 (A, C) 1
Try adding .unique(). From the pandas docs:

Series.unique()
Return unique values of Series object. Uniques are returned in order of appearance. Hash-table-based unique, therefore does NOT sort.