Pandas Aggregate groupby

Question

I have a DataFrame that conceptually looks like the following:

import pandas as pd

df = pd.DataFrame({
    "a": [1, 1, 1, 2, 2, 3],
    "b": ["a", "a", "c", "a", "d", "a"],
    "c": ["2", "3", "4", "2", "3", "2"]
})

      a    b    c
  0   1   'a'  '2' 
  1   1   'a'  '3'
  2   1   'c'  '4'
  3   2   'a'  '2'
  4   2   'd'  '3'
  5   3   'a'  '2'

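For each group in a, I need to count the unique (b, c) values seen up to and including that group.

So in this example, the output should be [3, 4, 4].

(Group 1 has 3 unique (b, c) pairs, groups 1 and 2 together have 4 unique (b, c) pairs, and groups 1, 2 and 3 together still have only 4 unique (b, c) pairs.)

I tried using expanding with groupby and nunique, but I couldn't figure out the syntax.

Any help would be appreciated!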

Answer 1

First, find the index of the unique rows:

idx = df[['b','c']].drop_duplicates().index

Then take the cumulative sum of the number of remaining rows in each group:

import numpy as np

# idx holds index labels, so select with .loc rather than .iloc
np.cumsum(df.loc[idx].groupby('a').count()['b'])

which returns

a
1    3
2    4
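
Group 3 is absent from this result because it introduces no new (b, c) pairs. The following is a minimal sketch of one way to keep every group in the output, assuming the six-row df from the question and that each group's rows are contiguous. The names first_seen, new_per_group and running are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, 1, 1, 2, 2, 3],
    "b": ["a", "a", "c", "a", "d", "a"],
    "c": ["2", "3", "4", "2", "3", "2"],
})

# Keep only the first occurrence of each (b, c) pair.
first_seen = df.drop_duplicates(subset=["b", "c"])

# Count how many new pairs each group introduces. Groups that introduce
# none (here a == 3) are missing, so reindex them in with a count of 0.
new_per_group = (
    first_seen.groupby("a").size()
    .reindex(df["a"].unique(), fill_value=0)
)

# Running total of distinct pairs seen up to and including each group.
running = new_per_group.cumsum()
print(running.tolist())  # [3, 4, 4]
```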

Answer 2

I improved on Dan's answer.

df['t'] = np.cumsum(~df[['b','c']].duplicated())
df.groupby('a')['t'].last()
Out[44]: 
a
1    3
2    4
3    4
Name: t, dtype: int64
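
For readers who would rather not add a helper column to df, the same idea can be sketched without mutating the frame. This assumes the six-row df from the question. The names is_new and result are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, 1, 1, 2, 2, 3],
    "b": ["a", "a", "c", "a", "d", "a"],
    "c": ["2", "3", "4", "2", "3", "2"],
})

# True for the first occurrence of each (b, c) pair, False for repeats.
is_new = ~df.duplicated(subset=["b", "c"])

# Running count of distinct pairs row by row, then the last value per group.
result = is_new.cumsum().groupby(df["a"]).last()
print(result.tolist())  # [3, 4, 4]
```

Because the running count never decreases, .max() would give the same result as .last() here.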

Answer 3

This is a tricky problem. Is this what you're after?

result = (
    df.a.drop_duplicates(keep='last')
    .reset_index()['index']
    .apply(lambda x: df.loc[df.index<=x].pipe(lambda x: (x.b+x.c).nunique()))
)

result
Out[27]: 
0    3
1    4
Name: index, dtype: int64
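
One caveat with concatenating b and c as strings is that different pairs can collide, for example ("a", "12") and ("a1", "2"). A minimal sketch of the same "count everything up to the last row of each group" idea that compares the two columns directly is shown below. It assumes the six-row df from the question, the default RangeIndex, and contiguous groups. The names last_rows and counts are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, 1, 1, 2, 2, 3],
    "b": ["a", "a", "c", "a", "d", "a"],
    "c": ["2", "3", "4", "2", "3", "2"],
})

# Index label of the last row of each group.
last_rows = df.drop_duplicates(subset="a", keep="last").index

# For each group, count the distinct (b, c) rows among all rows up to and
# including its last row (.loc slicing is inclusive of the end label).
counts = [
    len(df.loc[:end, ["b", "c"]].drop_duplicates())
    for end in last_rows
]
print(counts)  # [3, 4, 4]
```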

Answer 4

You can use drop_duplicates after groupby and take the shape of the resulting object:

df = pd.DataFrame({
    "a": [1, 1, 1, 2, 2],
    "b": ["a", "a", "c", "a", "d"],
    "c": ["2", "3", "4", "2", "3"]
})
result = df.groupby("a").apply(lambda x: x.drop_duplicates().shape[0])

If you want to convert the result to a list:

result.tolist()

The result will be [3, 2] with your example, because there are 3 unique pairs for group a=1 and 2 unique pairs for group a=2.

If you want the number of unique pairs across columns 'b' and 'c':

df[["b", "c"]].drop_duplicates().shape[0]
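
Note that these are per-group counts, not the cumulative counts the question asks for. As a rough sketch under the same five-row df used in this answer, the per-group count can also be computed without apply, alongside the overall count. The names per_group and overall are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, 1, 1, 2, 2],
    "b": ["a", "a", "c", "a", "d"],
    "c": ["2", "3", "4", "2", "3"],
})

# Distinct (b, c) pairs within each group, counted independently per group.
per_group = df.drop_duplicates(subset=["a", "b", "c"]).groupby("a").size()
print(per_group.tolist())  # [3, 2]

# Distinct (b, c) pairs across the whole frame.
overall = len(df[["b", "c"]].drop_duplicates())
print(overall)  # 4
```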

4 answers

Answer 1
2 2018-01-30 11:05:29

Answer 2
2 2018-01-30 11:48:12

Answer 3
1 accepted 2018-01-30 11:04:19

Answer 4
0 2018-01-30 10:56:58
