Count of unique values by groupby in two columns

Question

I want to count the unique values based on two columns in a pandas DataFrame.

Below is an example:

import pandas as pd

d = ({
    'B' : ['08:00:00','John','08:10:00','Gary','08:41:42','John','08:50:00','John', '09:00:00', 'Gary','09:15:00','John','09:21:00','Gary','09:30:00','Gary','09:40:00','Gary'],
    'C' : ['1','1','1','1','1','1','2','2','2', '2','2','2','3','3','3', '3','3','3'],           
    'A' : ['Stop','','Res','','Start','','Stop','','Res','','Start','','Stop','','Res','','Start','']
    })

df = pd.DataFrame(data=d)

Output:

        A         B  C
0    Stop  08:00:00  1
1              John  1
2     Res  08:10:00  1
3              Gary  1
4   Start  08:41:42  1
5              John  1
6    Stop  08:50:00  2
7              John  2
8     Res  09:00:00  2
9              Gary  2
10  Start  09:15:00  2
11             John  2
12   Stop  09:21:00  3
13             Gary  3
14    Res  09:30:00  3
15             Gary  3
16  Start  09:40:00  3
17             Gary  3

If I count the unique values of column C grouped by column A, I get the following:

k = df.groupby('A').C.nunique()

Res      3
Start    3
Stop     3

I'm hoping to split those counts up by the people in column B. So the intended output would be:

John Stop  2
     Res   0 #Nan
     Start 2

Gary Stop  1
     Res   3 
     Start 1

I have tried k = df.groupby('A').B.C.nunique()

Answer 1

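we can create a flattened DF (this needs import numpy as np, and it assumes the columns are in the order B, C, A as in the dict, not the A, B, C order of the printed output in the question):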

In [34]: d = pd.DataFrame(np.column_stack((df.iloc[::2], df.iloc[1::2, [0]])), columns=['time','id','op','name'])

In [35]: d
Out[35]:
       time id     op  name
0  08:00:00  1   Stop  John
1  08:10:00  1    Res  Gary
2  08:41:42  1  Start  John
3  08:50:00  2   Stop  John
4  09:00:00  2    Res  Gary
5  09:15:00  2  Start  John
6  09:21:00  3   Stop  Gary
7  09:30:00  3    Res  Gary
8  09:40:00  3  Start  Gary

prepare a multi-index, which will include all combinations:

In [36]: idx = pd.MultiIndex.from_product((d.name.unique(), d.op.unique()))

and group by two columns:

In [39]: res = d.groupby(['name','op'])['id'].count().reindex(idx, fill_value=0)

In [40]: res
Out[40]:
John  Stop     2
      Res      0
      Start    2
Gary  Stop     1
      Res      3
      Start    1
Name: id, dtype: int64
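
For readers who want to run the idea end to end, here is a minimal self-contained sketch of the same approach. It is not the answerer's exact code: it builds the flattened frame by column name instead of np.column_stack (so it does not depend on column order), and it uses nunique() rather than count(), since the question asks for unique values. On this data both give the same numbers. The names events, names and flat are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'B': ['08:00:00', 'John', '08:10:00', 'Gary', '08:41:42', 'John',
          '08:50:00', 'John', '09:00:00', 'Gary', '09:15:00', 'John',
          '09:21:00', 'Gary', '09:30:00', 'Gary', '09:40:00', 'Gary'],
    'C': ['1'] * 6 + ['2'] * 6 + ['3'] * 6,
    'A': ['Stop', '', 'Res', '', 'Start', ''] * 3,
})

# Even rows carry the event (time, id, op); odd rows carry the person's name.
events = df.iloc[::2].reset_index(drop=True)
names = df.iloc[1::2].reset_index(drop=True)

# One row per event, with the name in its own column (illustrative names).
flat = pd.DataFrame({
    'time': events['B'],
    'id': events['C'],
    'op': events['A'],
    'name': names['B'],
})

# Every (name, op) pair, so combinations that never occur can be shown as 0.
idx = pd.MultiIndex.from_product(
    [flat['name'].unique(), flat['op'].unique()], names=['name', 'op']
)

# Unique ids per (name, op); missing pairs such as (John, Res) become 0.
res = flat.groupby(['name', 'op'])['id'].nunique().reindex(idx, fill_value=0)
print(res)
```

The reindex step is what produces the explicit 0 for John / Res; without it, groupby simply omits pairs that never appear.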

Answer 2

It's a strange DataFrame. I would strongly advise against having times and names in the same column. Just add another column! This will make things easier.

Given your data, if you don't mind Res missing for John:

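# Turn the empty strings into missing values so they can be forward-filled
df[df==''] = None
# Carry each operation in A down onto the name row below it
# (df.ffill() replaces the deprecated df.fillna(method='ffill'))
df = df.ffill()
df[df['B'].isin(['Gary', 'John'])].groupby(['B', 'A']).C.nunique()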
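
If the missing John / Res row does matter, one possible variation on the snippet above is to unstack the grouped result with fill_value=0, which gives a person-by-operation table with an explicit zero. This is only a sketch under the same assumptions as the answer (names are hard-coded as 'John' and 'Gary'); the variable names op, people and table are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'B': ['08:00:00', 'John', '08:10:00', 'Gary', '08:41:42', 'John',
          '08:50:00', 'John', '09:00:00', 'Gary', '09:15:00', 'John',
          '09:21:00', 'Gary', '09:30:00', 'Gary', '09:40:00', 'Gary'],
    'C': ['1'] * 6 + ['2'] * 6 + ['3'] * 6,
    'A': ['Stop', '', 'Res', '', 'Start', ''] * 3,
})

# Treat empty strings in A as missing, then carry the last operation
# forward onto the name row that follows it.
op = df['A'].mask(df['A'] == '').ffill()

# Keep only the rows where B holds a person's name (hard-coded, as above).
people = df['B'].isin(['John', 'Gary'])

table = (
    df.assign(A=op)[people]
    .groupby(['B', 'A'])['C']
    .nunique()
    .unstack(fill_value=0)  # rows: person, columns: operation, gaps become 0
)
print(table)
```

Here table.loc['John', 'Res'] is 0 instead of being absent, at the cost of returning a wide table rather than a MultiIndex Series.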
