Count unique values with groupby across two columns

Question

I want to count unique values based on two columns of a pandas DataFrame.

Here is an example:

import pandas as pd

d = ({
    'B' : ['08:00:00','John','08:10:00','Gary','08:41:42','John','08:50:00','John', '09:00:00', 'Gary','09:15:00','John','09:21:00','Gary','09:30:00','Gary','09:40:00','Gary'],
    'C' : ['1','1','1','1','1','1','2','2','2', '2','2','2','3','3','3', '3','3','3'],           
    'A' : ['Stop','','Res','','Start','','Stop','','Res','','Start','','Stop','','Res','','Start','']
    })

df = pd.DataFrame(data=d)

Output:

        A         B  C
0    Stop  08:00:00  1
1              John  1
2     Res  08:10:00  1
3              Gary  1
4   Start  08:41:42  1
5              John  1
6    Stop  08:50:00  2
7              John  2
8     Res  09:00:00  2
9              Gary  2
10  Start  09:15:00  2
11             John  2
12   Stop  09:21:00  3
13             Gary  3
14    Res  09:30:00  3
15             Gary  3
16  Start  09:40:00  3
17             Gary  3

If I count based on columns A and C, I get the following:

k = df.groupby('A').C.nunique()

Res      3
Start    3
Stop     3

I would like to split this by the person in column B, so the expected output is:

John Stop  2
     Res   0 #Nan
     Start 2

Gary Stop  1
     Res   3 
     Start 1

I have tried k = df.groupby('A').BCnunique()

Answer 1

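We can build a flattened DataFrame, pairing each event row (even positions) with the name row that follows it (odd positions):

In [33]: import numpy as np

In [34]: d = pd.DataFrame(np.column_stack((df.iloc[::2], df.iloc[1::2, [0]])), columns=['time','id','op','name'])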

In [35]: d
Out[35]:
       time id     op  name
0  08:00:00  1   Stop  John
1  08:10:00  1    Res  Gary
2  08:41:42  1  Start  John
3  08:50:00  2   Stop  John
4  09:00:00  2    Res  Gary
5  09:15:00  2  Start  John
6  09:21:00  3   Stop  Gary
7  09:30:00  3    Res  Gary
8  09:40:00  3  Start  Gary
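The positional indexing above relies on the columns being in the order B, C, A, which is not the order shown in the question's printed output. As a minimal sketch, assuming the same sample data, the reshaping can also be written by column name so that it does not depend on column order. The names `events`, `names`, and `flat` are illustrative and not from the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'B': ['08:00:00', 'John', '08:10:00', 'Gary', '08:41:42', 'John',
          '08:50:00', 'John', '09:00:00', 'Gary', '09:15:00', 'John',
          '09:21:00', 'Gary', '09:30:00', 'Gary', '09:40:00', 'Gary'],
    'C': ['1'] * 6 + ['2'] * 6 + ['3'] * 6,
    'A': ['Stop', '', 'Res', '', 'Start', ''] * 3,
})

# Even positions hold the event (time in B, id in C, op in A).
events = df.iloc[::2].reset_index(drop=True)
# Odd positions hold the person's name in column B.
names = df['B'].iloc[1::2].reset_index(drop=True)

# Assemble one row per event, selecting every field by column name.
flat = pd.DataFrame({
    'time': events['B'],
    'id': events['C'],
    'op': events['A'],
    'name': names,
})
```

Resetting the index on both halves lets pandas align them row by row when the new frame is assembled.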

Prepare a MultiIndex that contains every (name, op) combination:

In [36]: idx = pd.MultiIndex.from_product((d.name.unique(), d.op.unique()))

Then group by both columns:

In [39]: res = d.groupby(['name','op'])['id'].count().reindex(idx, fill_value=0)

In [40]: res
Out[40]:
John  Stop     2
      Res      0
      Start    2
Gary  Stop     1
      Res      3
      Start    1
Name: id, dtype: int64
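The session above uses count(), which matches the expected output here because each (name, op, id) combination occurs at most once. Since the question asks for unique values, a variant with nunique() may be closer to the intent if duplicates can occur. The sketch below assumes the flattened frame shown earlier, rebuilt by hand as `flat` (an illustrative name) so the example is self-contained.

```python
import pandas as pd

# Flattened data, copied from the table in the answer (time column omitted).
flat = pd.DataFrame({
    'id':   ['1', '1', '1', '2', '2', '2', '3', '3', '3'],
    'op':   ['Stop', 'Res', 'Start'] * 3,
    'name': ['John', 'Gary', 'John', 'John', 'Gary', 'John',
             'Gary', 'Gary', 'Gary'],
})

# Every (name, op) pair, in order of first appearance.
idx = pd.MultiIndex.from_product(
    [flat['name'].unique(), flat['op'].unique()], names=['name', 'op'])

# Count distinct ids per pair; pairs that never occur are filled with 0.
res = (flat.groupby(['name', 'op'])['id']
           .nunique()
           .reindex(idx, fill_value=0))
print(res)
```

The reindex step is what makes the missing John/Res pair appear as 0 instead of being dropped from the result.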

Answer 2

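This is an odd DataFrame. I strongly recommend not storing times and names in the same column. Just add another column! That will make things much easier.

With your data as it is, and if you don't mind that Res is missing for John: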

df[df == ''] = None    # mark empty strings as missing
df = df.ffill()        # forward-fill; fillna(method='ffill') is deprecated in recent pandas
df[df['B'].isin(['Gary', 'John'])].groupby(['B', 'A']).C.nunique()
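As a rough, self-contained sketch of this approach, assuming the question's sample data: the event label in A is forward-filled onto the name row below it, and only the name rows are then grouped. The final unstack(fill_value=0) step is an addition, not part of the original answer. It displays the missing John/Res combination as 0. The names `filled`, `people`, `counts`, and `table` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'B': ['08:00:00', 'John', '08:10:00', 'Gary', '08:41:42', 'John',
          '08:50:00', 'John', '09:00:00', 'Gary', '09:15:00', 'John',
          '09:21:00', 'Gary', '09:30:00', 'Gary', '09:40:00', 'Gary'],
    'C': ['1'] * 6 + ['2'] * 6 + ['3'] * 6,
    'A': ['Stop', '', 'Res', '', 'Start', ''] * 3,
})

# Replace empty strings with NaN, then carry each event label down one row.
filled = df.mask(df == '').ffill()

# Keep only the rows whose B value is a person's name.
people = filled[filled['B'].isin(['Gary', 'John'])]

# Distinct C values per (person, event) pair that actually occurs.
counts = people.groupby(['B', 'A'])['C'].nunique()

# Optional: pivot to a person-by-event table, showing absent pairs as 0.
table = counts.unstack(fill_value=0)
print(table)
```

Hard-coding the names in isin() works for this sample. With more people, filtering on the rows where the original A value was empty would avoid maintaining the list.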
