Pandas Aggregate groupby

Question

I have a dataframe that looks conceptually like the following:

import pandas as pd

df = pd.DataFrame({
    "a": [1, 1, 1, 2, 2, 3],
    "b": ["a", "a", "c", "a", "d", "a"],
    "c": ["2", "3", "4", "2", "3", "2"]
})

      a    b    c
  0   1   'a'  '2' 
  1   1   'a'  '3'
  2   1   'c'  '4'
  3   2   'a'  '2'
  4   2   'd'  '3'
  5   3   'a'  '2'

For each group in a, I need to count the unique (b,c) pairs seen up to and including that group.

So in this example the output should be [3,4,4].

(Group 1 alone has 3 unique (b,c) pairs; groups 1 and 2 together have 4; and groups 1, 2 and 3 together still have only 4.)

I tried using expanding with groupby and nunique, but I couldn't figure out the syntax.

Any help will be appreciated!

Answer 1

First find the indices of the unique rows:

idx = df[['b','c']].drop_duplicates().index

Then take the cumulative sum of the number of rows left in each group:

# idx holds index labels, so select with .loc rather than .iloc
df.loc[idx].groupby('a')['b'].count().cumsum()

This returns the following (group 3 is absent because its only row repeats a pair seen earlier):

a
1    3
2    4
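
If group 3 should also appear in the result, one possible variation on the idea above is to count the new pairs per group and fill in a zero for groups that contribute none. This is only a sketch: the names first_seen, new_pairs and running_total are made up for illustration, and it assumes the rows are already ordered by a.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, 1, 1, 2, 2, 3],
    "b": ["a", "a", "c", "a", "d", "a"],
    "c": ["2", "3", "4", "2", "3", "2"],
})

# Keep only the first occurrence of each (b, c) pair.
first_seen = df.drop_duplicates(subset=["b", "c"])

# Number of new pairs contributed by each group; groups that add
# nothing new (here group 3) are filled in with 0.
new_pairs = (
    first_seen.groupby("a").size()
    .reindex(df["a"].unique(), fill_value=0)
)

# Running total of distinct pairs up to and including each group.
running_total = new_pairs.cumsum()
print(running_total.tolist())  # [3, 4, 4]
```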

Answer 2

I improved Dan's answer.

df['t'] = np.cumsum(~df[['b','c']].duplicated())
df.groupby('a')['t'].last()
Out[44]: 
a
1    3
2    4
3    4
Name: t, dtype: int64
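
For anyone who would rather not add a helper column to df, the same idea can probably be wrapped in a small function. This is a sketch only: cumulative_unique_pairs is a hypothetical name, not a pandas function, and it assumes the rows are in the order in which the groups should be accumulated.

```python
import pandas as pd

def cumulative_unique_pairs(frame, group_col, pair_cols):
    """Running count of distinct pair_cols rows at the last row of each group."""
    # True for the first occurrence of each pair, False for repeats.
    is_new = ~frame.duplicated(subset=pair_cols)
    # Row-by-row running number of distinct pairs seen so far.
    running = is_new.cumsum()
    # Value at the last row of each group; sort=False keeps order of appearance.
    return running.groupby(frame[group_col], sort=False).last()

df = pd.DataFrame({
    "a": [1, 1, 1, 2, 2, 3],
    "b": ["a", "a", "c", "a", "d", "a"],
    "c": ["2", "3", "4", "2", "3", "2"],
})

result = cumulative_unique_pairs(df, "a", ["b", "c"])
print(result.tolist())  # [3, 4, 4]
```

It works because duplicated() flags every repeat of a pair, so the cumulative sum of the negated flags only increases when a pair is seen for the first time.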

Answer 3

This is a tricky question. Is this what you are after?

result = (
    df.a.drop_duplicates(keep='last')
    .reset_index()['index']
    .apply(lambda x: df.loc[df.index <= x].pipe(lambda g: (g.b + g.c).nunique()))
)

result
Out[27]: 
0    3
1    4
Name: index, dtype: int64
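
One thing to watch with joining b and c as strings: distinct pairs such as ("a", "23") and ("a2", "3") both become "a23". A possible alternative, sketched here under the assumption that groups should be accumulated in order of first appearance, keeps the pairs as tuples in a set. The names seen, counts and unique_pairs are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, 1, 1, 2, 2, 3],
    "b": ["a", "a", "c", "a", "d", "a"],
    "c": ["2", "3", "4", "2", "3", "2"],
})

seen = set()   # (b, c) tuples encountered so far
counts = {}    # group label -> number of distinct pairs seen so far

# sort=False keeps the groups in order of first appearance.
for label, group in df.groupby("a", sort=False):
    seen.update(zip(group["b"], group["c"]))
    counts[label] = len(seen)

result = pd.Series(counts, name="unique_pairs")
print(result.tolist())  # [3, 4, 4]
```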

Answer 4

You can use drop_duplicates after your groupby and take the shape of the result:

df = pd.DataFrame({
    "a": [1, 1, 1, 2, 2],
    "b": ["a", "a", "c", "a", "d"],
    "c": ["2", "3", "4", "2", "3"]
})
result = df.groupby("a").apply(lambda x: x.drop_duplicates().shape[0])

To convert the result to a list afterwards:

result.tolist()

With your example the result will be [3,2], because there are 3 unique pairs for group a=1 and 2 unique pairs for group a=2.

If you want the number of unique pairs in columns 'b' and 'c' across the whole dataframe:

df[["b", "c"]].drop_duplicates().shape[0]
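
The per-group count above can likely also be written without apply, by dropping repeated rows first and then taking the group sizes. This is a sketch on the same 5-row example; per_group is just an illustrative name. Note that these are counts within each group, not the running total across groups that the question asks for.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, 1, 1, 2, 2],
    "b": ["a", "a", "c", "a", "d"],
    "c": ["2", "3", "4", "2", "3"],
})

# Drop repeated (a, b, c) rows, then count what is left in each group.
per_group = df.drop_duplicates(subset=["a", "b", "c"]).groupby("a").size()
print(per_group.tolist())  # [3, 2]
```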

4 answers

Answer 1    2    2018-01-30 11:05:29
Answer 2    2    2018-01-30 11:48:12
Answer 3    1    accepted    2018-01-30 11:04:19
Answer 4    0    2018-01-30 10:56:58
