
pandas: get normalized values from groupby and size?

I know that we can get normalized values from value_counts() on a pandas Series, but when we group a DataFrame, the only way I know to get counts is through size(). Is there any way to get normalized values with size()?

Example:

import pandas as pd

df = pd.DataFrame({'subset_product':['A','A','A','B','B','C','C'],
                   'subset_close':[1,1,0,1,1,1,0]})
df2 = df.groupby(['subset_product', 'subset_close']).size().reset_index(name='prod_count')

df.subset_product.value_counts()
A    3
B    2
C    2

df2: [image of the grouped counts, not preserved]

Looking to get:

subset_product subset_close prod_count norm
A              0            1          1/3
A              1            2          2/3
B              1            2          2/2
C              1            1          1/2
C              0            1          1/2

Besides manually calculating the normalized values as prod_count / total, is there any way to get normalized values?

I think this is not possible with a single groupby + size, because the groupby uses two columns (subset_product and subset_close), while normalizing needs the size per subset_product only.
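To make the two levels of counting concrete, here is a minimal sketch using the question's data. The variable names pair_counts and product_totals are illustrative and do not appear in the original post.

```python
import pandas as pd

df = pd.DataFrame({'subset_product': ['A', 'A', 'A', 'B', 'B', 'C', 'C'],
                   'subset_close': [1, 1, 0, 1, 1, 1, 0]})

# Counts per (subset_product, subset_close) pair: these are the numerators.
pair_counts = df.groupby(['subset_product', 'subset_close']).size()

# Counts per subset_product alone: these are the denominators.
product_totals = df.groupby('subset_product').size()

print(pair_counts)
print(product_totals)
```

The first result has one row per pair and the second has one row per product, so the totals have to be aligned back to the pair-level rows before dividing.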

Possible solutions use map or transform to build a Series with the same length as df2, then divide by it with div:

df2 = df.groupby(['subset_product', 'subset_close']).size().reset_index(name='prod_count')
s = df.subset_product.value_counts()
df2['prod_count'] = df2['prod_count'].div(df2['subset_product'].map(s))

Or:

df2 = df.groupby(['subset_product', 'subset_close']).size().reset_index(name='prod_count')
a = df2.groupby('subset_product')['prod_count'].transform('sum')
df2['prod_count'] = df2['prod_count'].div(a)

print(df2)
  subset_product  subset_close  prod_count
0              A             0    0.333333
1              A             1    0.666667
2              B             1    1.000000
3              C             0    0.500000
4              C             1    0.500000
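The snippets above overwrite prod_count with the ratio. If both the raw count and the share are wanted, as in the table in the question, a variant along these lines may work. It is a sketch, not part of the original answer. The norm column name comes from the question, while shares is an illustrative name. The last statement relies on value_counts(normalize=True) being available on a grouped Series, which should hold in recent pandas versions.

```python
import pandas as pd

df = pd.DataFrame({'subset_product': ['A', 'A', 'A', 'B', 'B', 'C', 'C'],
                   'subset_close': [1, 1, 0, 1, 1, 1, 0]})

df2 = (df.groupby(['subset_product', 'subset_close'])
         .size()
         .reset_index(name='prod_count'))

# Keep the raw counts and store the per-product share in a separate column.
totals = df2.groupby('subset_product')['prod_count'].transform('sum')
df2['norm'] = df2['prod_count'].div(totals)

# Alternative sketch: normalized counts within each product in one step.
# The result is a Series indexed by (subset_product, subset_close).
shares = df.groupby('subset_product')['subset_close'].value_counts(normalize=True)

print(df2)
print(shares)
```

The second approach returns only the shares, so the first one is the better fit when the counts are needed alongside them.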
