I want to determine the count of unique values based on two columns in a pandas df.
Below is an example:
import pandas as pd
d = ({
'B' : ['08:00:00','John','08:10:00','Gary','08:41:42','John','08:50:00','John', '09:00:00', 'Gary','09:15:00','John','09:21:00','Gary','09:30:00','Gary','09:40:00','Gary'],
'C' : ['1','1','1','1','1','1','2','2','2', '2','2','2','3','3','3', '3','3','3'],
'A' : ['Stop','','Res','','Start','','Stop','','Res','','Start','','Stop','','Res','','Start','']
})
df = pd.DataFrame(data=d)
Output:
        A         B  C
0    Stop  08:00:00  1
1              John  1
2     Res  08:10:00  1
3              Gary  1
4   Start  08:41:42  1
5              John  1
6    Stop  08:50:00  2
7              John  2
8     Res  09:00:00  2
9              Gary  2
10  Start  09:15:00  2
11             John  2
12   Stop  09:21:00  3
13             Gary  3
14    Res  09:30:00  3
15             Gary  3
16  Start  09:40:00  3
17             Gary  3
If I count the unique values of Column C grouped by Column A, I get the following:
k = df.groupby('A').C.nunique()
Res 3
Start 3
Stop 3
I'm hoping to split those up based on the people in Column B. So the intended output would be:
John  Stop   2
      Res    0  # NaN
      Start  2
Gary  Stop   1
      Res    3
      Start  1
I have tried k = df.groupby('A').BCnunique()
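We can create a flattened DataFrame (this needs import numpy as np, and it assumes the columns are in the order B, C, A, as in the dict above):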
In [34]: d = pd.DataFrame(np.column_stack((df.iloc[::2], df.iloc[1::2, [0]])), columns=['time','id','op','name'])
In [35]: d
Out[35]:
       time id     op  name
0  08:00:00  1   Stop  John
1  08:10:00  1    Res  Gary
2  08:41:42  1  Start  John
3  08:50:00  2   Stop  John
4  09:00:00  2    Res  Gary
5  09:15:00  2  Start  John
6  09:21:00  3   Stop  Gary
7  09:30:00  3    Res  Gary
8  09:40:00  3  Start  Gary
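If the positional np.column_stack call feels fragile, the same flattening can be sketched with pandas alone by naming each column explicitly. This is only an illustration: the variable names events, names and flat are made up here, and it assumes the rows strictly alternate between an event row and a name row, as in the example data.

```python
import pandas as pd

data = {
    'B': ['08:00:00', 'John', '08:10:00', 'Gary', '08:41:42', 'John',
          '08:50:00', 'John', '09:00:00', 'Gary', '09:15:00', 'John',
          '09:21:00', 'Gary', '09:30:00', 'Gary', '09:40:00', 'Gary'],
    'C': ['1'] * 6 + ['2'] * 6 + ['3'] * 6,
    'A': ['Stop', '', 'Res', '', 'Start', ''] * 3,
}
df = pd.DataFrame(data)

# Even rows carry the operation (A), the time (B) and the id (C).
events = df.iloc[::2].reset_index(drop=True)
# Odd rows carry the person's name in column B.
names = df.iloc[1::2]['B'].reset_index(drop=True)

# Build the flat frame by column name, so the result does not depend
# on the order in which the original columns happen to be stored.
flat = pd.DataFrame({
    'time': events['B'],
    'id': events['C'],
    'op': events['A'],
    'name': names,
})
print(flat)
```

Selecting by column name avoids the dependence on column order that the positional version has.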
Prepare a MultiIndex that includes all combinations of name and operation:
In [36]: idx = pd.MultiIndex.from_product((d.name.unique(), d.op.unique()))
Then group by the two columns and reindex, so that missing combinations show up as 0:
In [39]: res = d.groupby(['name','op'])['id'].count().reindex(idx, fill_value=0)
In [40]: res
Out[40]:
John  Stop     2
      Res      0
      Start    2
Gary  Stop     1
      Res      3
      Start    1
Name: id, dtype: int64
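Since the question asks for unique values, here is a small self-contained sketch of the same steps using nunique() instead of count(). On this data both give the same numbers, because no (name, op) pair repeats an id. The flat frame is typed in by hand to match the flattened output above, and the variable names are only illustrative.

```python
import pandas as pd

# Flattened data, entered directly so the sketch runs on its own.
flat = pd.DataFrame({
    'id':   ['1', '1', '1', '2', '2', '2', '3', '3', '3'],
    'op':   ['Stop', 'Res', 'Start'] * 3,
    'name': ['John', 'Gary', 'John', 'John', 'Gary', 'John',
             'Gary', 'Gary', 'Gary'],
})

# Every (name, op) combination, in order of first appearance.
idx = pd.MultiIndex.from_product(
    [flat['name'].unique(), flat['op'].unique()],
    names=['name', 'op'],
)

# Count distinct ids per pair, then add the pairs that never occur as 0.
res = (
    flat.groupby(['name', 'op'])['id']
        .nunique()
        .reindex(idx, fill_value=0)
)
print(res)
```

The reindex step is what makes the (John, Res) row appear with a 0; groupby alone only returns pairs that exist in the data.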
It's a strange dataframe. I would strongly advise against having times and names in the same column. Just add another column! This will make things easier.
Given your data, if you don't mind Res missing from John:
df[df==''] = None
df = df.ffill()  # fillna(method='ffill') is deprecated in recent pandas versions
df[df['B'].isin(['Gary', 'John'])].groupby(['B', 'A']).C.nunique()
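If the missing (John, Res) pair should show up as 0 after all, one possible variation on the forward-fill approach is to unstack the result with a fill value. This is a rough sketch rather than the answer's own code: it rebuilds the example df so it can run on its own, and the names people, counts and wide are made up for illustration.

```python
import pandas as pd

data = {
    'B': ['08:00:00', 'John', '08:10:00', 'Gary', '08:41:42', 'John',
          '08:50:00', 'John', '09:00:00', 'Gary', '09:15:00', 'John',
          '09:21:00', 'Gary', '09:30:00', 'Gary', '09:40:00', 'Gary'],
    'C': ['1'] * 6 + ['2'] * 6 + ['3'] * 6,
    'A': ['Stop', '', 'Res', '', 'Start', ''] * 3,
}
df = pd.DataFrame(data)

# Turn blank operations into missing values, then copy the operation
# down from the event row directly above each name row.
df['A'] = df['A'].mask(df['A'] == '').ffill()

# Keep only the rows whose B value is a person rather than a time.
people = df[df['B'].isin(['Gary', 'John'])]

# Distinct C values per (person, operation) pair.
counts = people.groupby(['B', 'A'])['C'].nunique()

# One row per person, one column per operation; absent pairs become 0.
wide = counts.unstack(fill_value=0)
print(wide)
```

The wide layout puts people in rows and operations in columns, which may be easier to read than the stacked Series when there are only a few operations.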