Count the number of unique values per group
I have two columns, _a and _b.
import numpy as np
import pandas as pd
df = pd.DataFrame({'_a':[1,1,1,2,2,3,3],'_b':[3,4,5,3,3,3,9], 'a_b_3':[3,3,3,1,1,2,2]})
df
_a _b a_b_3
0 1 3 3
1 1 4 3
2 1 5 3
3 2 3 1
4 2 3 1
5 3 3 2
6 3 9 2
I need to create column a_b_3 (the count of unique values in column '_b' within each '_a' group) using pandas groupby. Thank you in advance.
Looks like you want transform + nunique:
df['a_b_3'] = df.groupby('_a')['_b'].transform('nunique')
df
_a _b a_b_3
0 1 3 3
1 1 4 3
2 1 5 3
3 2 3 1
4 2 3 1
5 3 3 2
6 3 9 2
This is effectively groupby + nunique + map:
v = df.groupby('_a')['_b'].nunique()
df['a_b_3'] = df['_a'].map(v)
df
_a _b a_b_3
0 1 3 3
1 1 4 3
2 1 5 3
3 2 3 1
4 2 3 1
5 3 3 2
6 3 9 2
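To see why the two forms agree, here is a minimal sketch on the same data. The names per_group, via_transform and via_map are mine, introduced only for illustration. nunique gives one value per group, indexed by _a; transform broadcasts that result back onto the original rows, and map does the same lookup by hand.

```python
import pandas as pd

df = pd.DataFrame({'_a': [1, 1, 1, 2, 2, 3, 3],
                   '_b': [3, 4, 5, 3, 3, 3, 9]})

# One entry per group: the index is _a, the values are unique counts of _b.
per_group = df.groupby('_a')['_b'].nunique()

# transform broadcasts the per-group result back onto the original index.
via_transform = df.groupby('_a')['_b'].transform('nunique')

# map looks up each row's _a value in the per-group Series.
via_map = df['_a'].map(per_group)

print(per_group.to_dict())     # {1: 3, 2: 1, 3: 2}
print(via_transform.tolist())  # [3, 3, 3, 1, 1, 2, 2]
print(via_map.tolist())        # [3, 3, 3, 1, 1, 2, 2]
```

Note that nunique skips NaN by default; nunique(dropna=False) counts NaN as a value of its own.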
Use:
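# Count unique _b values per _a group; reset_index turns _a back into a column.
df2 = df.groupby(['_a'])['_b'].nunique().reset_index()
# Both frames have a '_b' column, so merge suffixes the count column as '_b_y'.
df['a_b_3'] = df.merge(df2, how='left', on='_a')['_b_y']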
Output:
_a _b a_b_3
0 1 3 3
1 1 4 3
2 1 5 3
3 2 3 1
4 2 3 1
5 3 3 2
6 3 9 2
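The _b_y name comes from merge's default suffixes for overlapping columns, which is easy to get wrong. A possible variant, sketched here under the assumption that you are free to name the count column before merging (counts and out are my names, not from the answer):

```python
import pandas as pd

df = pd.DataFrame({'_a': [1, 1, 1, 2, 2, 3, 3],
                   '_b': [3, 4, 5, 3, 3, 3, 9]})

# Name the count column up front so merge has no overlapping column to suffix.
counts = df.groupby('_a')['_b'].nunique().rename('a_b_3').reset_index()

# A left merge keeps every row of df, in df's original order.
out = df.merge(counts, how='left', on='_a')

print(list(out.columns))       # ['_a', '_b', 'a_b_3']
print(out['a_b_3'].tolist())   # [3, 3, 3, 1, 1, 2, 2]
```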
If I understand you correctly, you want to group by column _a, count the number of unique values in column _b within each group, and then append this count to the original dataframe using _a as the key. The following code should achieve that.
df.merge(pd.DataFrame(df.groupby('_a')._b.nunique()), left_on='_a', right_index=True)
Breaking it down, the first step is to group by _a and count the unique values in column _b. That's what df.groupby('_a')._b.nunique() does. The result is then merged with the original dataframe using _a as the key. The groupby returns a Series, so it needs to be converted to a dataframe before merging, hence the pd.DataFrame.
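The same steps, taken one at a time, might look like the sketch below. The intermediate names counts, counts_df and merged are mine. Note that because both frames end up with a column called _b, merge renames them with its default _x and _y suffixes.

```python
import pandas as pd

df = pd.DataFrame({'_a': [1, 1, 1, 2, 2, 3, 3],
                   '_b': [3, 4, 5, 3, 3, 3, 9]})

# Step 1: the groupby result is a Series indexed by _a.
counts = df.groupby('_a')._b.nunique()
print(type(counts).__name__)    # Series

# Step 2: wrap it in a DataFrame; the single column keeps the name '_b'.
counts_df = pd.DataFrame(counts)
print(list(counts_df.columns))  # ['_b']

# Step 3: join df's _a column against the index of counts_df.
merged = df.merge(counts_df, left_on='_a', right_index=True)
print(list(merged.columns))     # ['_a', '_b_x', '_b_y']
```

Here _b_x holds the original values and _b_y holds the per-group unique counts.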
EDIT
@COLDSPEED's answer above is much more efficient than this one. To give an idea of the speed difference, I ran a timeit, which shows a speedup of about 2x on this small dataframe; on larger dataframes the gap would probably be even larger.
Using merge:
%timeit df.merge(pd.DataFrame(df.groupby('_a')._b.nunique()), left_on='_a', right_index=True)
1.43 ms ± 74.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Using transform:
%timeit df.groupby('_a')['_b'].transform('nunique')
750 µs ± 32 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
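To check the guess about larger dataframes yourself, something like the following sketch could be used outside IPython. The frame size, group count and value range are arbitrary choices of mine, not from the thread, and the timings will vary by machine and pandas version, so no expected numbers are shown.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 100_000  # hypothetical size, chosen only for illustration
big = pd.DataFrame({'_a': rng.integers(0, 1_000, n),
                    '_b': rng.integers(0, 50, n)})


def with_merge():
    # The merge-based approach from this answer.
    return big.merge(pd.DataFrame(big.groupby('_a')._b.nunique()),
                     left_on='_a', right_index=True)


def with_transform():
    # The transform-based approach from the other answer.
    return big.groupby('_a')['_b'].transform('nunique')


runs = 10
for fn in (with_merge, with_transform):
    seconds = timeit.timeit(fn, number=runs)
    print(f'{fn.__name__}: {seconds / runs * 1e3:.2f} ms per call')
```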