Count the number of unique values per group

Question

I have two columns, _a and _b.

import numpy as np 
import pandas as pd
df = pd.DataFrame({'_a':[1,1,1,2,2,3,3],'_b':[3,4,5,3,3,3,9], 'a_b_3':[3,3,3,1,1,2,2]})
df

    _a  _b  a_b_3   
0   1   3   3
1   1   4   3
2   1   5   3
3   2   3   1
4   2   3   1
5   3   3   2
6   3   9   2

I need to create column a_b_3 (the count of unique values in column '_b' within each '_a' group) using pandas groupby. Thank you in advance.

Answer 1

Looks like you want transform + nunique:

df['a_b_3'] = df.groupby('_a')['_b'].transform('nunique')        
df
   _a  _b  a_b_3
0   1   3      3
1   1   4      3
2   1   5      3
3   2   3      1
4   2   3      1
5   3   3      2
6   3   9      2

This is effectively groupby + nunique + map:

v = df.groupby('_a')['_b'].nunique()
df['a_b_3'] = df['_a'].map(v)

df
   _a  _b  a_b_3
0   1   3      3
1   1   4      3
2   1   5      3
3   2   3      1
4   2   3      1
5   3   3      2
6   3   9      2
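
As a quick check that the two forms in this answer agree, here is a minimal sketch that rebuilds the frame from the question (without the precomputed a_b_3 column) and computes the counts both ways. The names via_transform, counts and via_map are illustrative only and do not come from the answer.

```python
import pandas as pd

# Same data as in the question, without the target column.
df = pd.DataFrame({'_a': [1, 1, 1, 2, 2, 3, 3],
                   '_b': [3, 4, 5, 3, 3, 3, 9]})

# Form 1: transform broadcasts each group's unique count back to its rows.
via_transform = df.groupby('_a')['_b'].transform('nunique')

# Form 2: compute one count per group, then look it up for every row.
counts = df.groupby('_a')['_b'].nunique()
via_map = df['_a'].map(counts)

# Print both so they can be compared row for row.
print(via_transform.tolist())
print(via_map.tolist())
```

The transform form is shorter; the map form makes the intermediate per-group Series available if it is needed elsewhere.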

Answer 2

Use:

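# Unique count of _b per _a, as a two-column frame (_a, _b)
df2 = df.groupby(['_a'])['_b'].nunique().reset_index()
# Both frames have a _b column, so the merged count column is suffixed _b_y
df['a_b_3'] = df.merge(df2, how='left', on='_a')['_b_y']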

Output

   _a  _b  a_b_3
0   1   3      3
1   1   4      3
2   1   5      3
3   2   3      1
4   2   3      1
5   3   3      2
6   3   9      2
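
The merge in this answer leaves the count in a suffixed column (_b_y) because both frames contain a column named _b. One possible variation, sketched here as an assumption rather than as the answerer's method, is to rename the count before merging so that no suffix is needed. The names counts and result are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'_a': [1, 1, 1, 2, 2, 3, 3],
                   '_b': [3, 4, 5, 3, 3, 3, 9]})

# One row per group, with the count column already carrying its final name.
counts = df.groupby('_a')['_b'].nunique().rename('a_b_3').reset_index()

# A left merge keeps every original row, in the original order.
result = df.merge(counts, how='left', on='_a')
print(result)
```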

Answer 3

If I understand you correctly, you want to group by column _a, count the number of unique values in column _b within each group, and then append this count to the original dataframe using _a as the key. The following code should achieve that.

df.merge(pd.DataFrame(df.groupby('_a')._b.nunique()), left_on='_a', right_index=True)

Breaking it down: the first step is to group by _a and count the unique values in column _b. That's what df.groupby('_a')._b.nunique() does. The result is then merged with the original dataframe using _a as the key. The groupby returns a Series, so it is wrapped in pd.DataFrame before merging.
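
The steps described here can be sketched one at a time, which makes the intermediate objects easier to inspect. The names per_group, per_group_df and merged are illustrative, and the frame is rebuilt from the question's data without the precomputed a_b_3 column.

```python
import pandas as pd

df = pd.DataFrame({'_a': [1, 1, 1, 2, 2, 3, 3],
                   '_b': [3, 4, 5, 3, 3, 3, 9]})

# Step 1: a Series indexed by _a holding the unique count of _b per group.
per_group = df.groupby('_a')['_b'].nunique()

# Step 2: wrap it in a one-column DataFrame (the column is still named _b).
per_group_df = pd.DataFrame(per_group)

# Step 3: match df's _a values against the index of per_group_df.
merged = df.merge(per_group_df, left_on='_a', right_index=True)

# Both frames have a _b column, so pandas applies its default _x/_y suffixes.
print(merged.columns.tolist())
```

In the merged frame, _b_x holds the original values and _b_y holds the per-group unique count.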

EDIT

@COLDSPEED's answer above is much more efficient than this one. To give an idea of the speed difference, I ran a timeit, which shows a speedup of about 2x on this small dataframe; on larger dataframes the gap would probably be even bigger.

Using merge:

%timeit df.merge(pd.DataFrame(df.groupby('_a')._b.nunique()), left_on='_a', right_index=True)
1.43 ms ± 74.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Using transform:

%timeit df.groupby('_a')['_b'].transform('nunique')
750 µs ± 32 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
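
To see how the gap behaves on more data, the comparison can be repeated outside IPython with the standard timeit module. This is a rough sketch: the frame size, the number of groups, the random data and the helper names with_merge and with_transform are all assumptions chosen for illustration, and the timings will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical larger frame: 100,000 rows spread over up to 1,000 groups.
rng = np.random.default_rng(0)
big = pd.DataFrame({'_a': rng.integers(0, 1000, size=100_000),
                    '_b': rng.integers(0, 50, size=100_000)})

def with_merge():
    # The merge-based approach from this answer.
    return big.merge(pd.DataFrame(big.groupby('_a')._b.nunique()),
                     left_on='_a', right_index=True)

def with_transform():
    # The transform-based approach from the accepted answer.
    return big.groupby('_a')['_b'].transform('nunique')

# Run each approach five times and report the average time per call.
t_merge = timeit.timeit(with_merge, number=5)
t_transform = timeit.timeit(with_transform, number=5)
print(f"merge:     {t_merge / 5:.4f} s per call")
print(f"transform: {t_transform / 5:.4f} s per call")
```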

3 answers

Answer 1
4 accepted 2018-05-10 08:44:58

Answer 2
3 2018-05-10 08:44:35

Answer 3
1 2018-05-10 08:49:42
