pandas: get normalized values from groupby and size?
I know that we can get normalized values from value_counts() of a pandas Series, but when we do a groupby on a DataFrame, the only way to get counts is through size(). Is there any way to get normalized values with size()?
Example:
import pandas as pd

df = pd.DataFrame({'subset_product':['A','A','A','B','B','C','C'],
                   'subset_close':[1,1,0,1,1,1,0]})
df2 = df.groupby(['subset_product', 'subset_close']).size().reset_index(name='prod_count')
df.subset_product.value_counts()
A 3
B 2
C 2
df2
Looking to get:
subset_product subset_close prod_count norm
A 0 1 1/3
A 1 2 2/3
B 1 2 2/2
C 1 1 1/2
C 0 1 1/2
Besides manually calculating the normalized values as prod_count/total, is there any way to get normalized values?
I think it is not possible with only one groupby + size, because the groupby is by 2 columns, subset_product and subset_close, while the normalization needs the size by subset_product only.
Possible solutions are map or transform, each of which gives a Series of the same size as df2, combined with div:
df2 = df.groupby(['subset_product', 'subset_close']).size().reset_index(name='prod_count')
s = df.subset_product.value_counts()
df2['prod_count'] = df2['prod_count'].div(df2['subset_product'].map(s))
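The snippet above overwrites prod_count with the ratio, while the question asks for the count and the ratio side by side. A minimal sketch of that layout, assuming a new column named norm (taken from the question's desired output) and a helper variable totals that is only illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'subset_product': ['A', 'A', 'A', 'B', 'B', 'C', 'C'],
    'subset_close': [1, 1, 0, 1, 1, 1, 0],
})

# Count rows per (subset_product, subset_close) pair.
df2 = (df.groupby(['subset_product', 'subset_close'])
         .size()
         .reset_index(name='prod_count'))

# Total rows per subset_product, indexed by product label.
totals = df['subset_product'].value_counts()

# Keep prod_count as is and store the ratio in a separate column.
# 'norm' is the column name from the question, not a pandas convention.
df2['norm'] = df2['prod_count'] / df2['subset_product'].map(totals)

print(df2)
```

Keeping the raw count next to the ratio makes it easy to check that the norm values within each subset_product add up to 1.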
Or:
df2 = df.groupby(['subset_product', 'subset_close']).size().reset_index(name='prod_count')
a = df2.groupby('subset_product')['prod_count'].transform('sum')
df2['prod_count'] = df2['prod_count'].div(a)
print (df2)
subset_product subset_close prod_count
0 A 0 0.333333
1 A 1 0.666667
2 B 1 1.000000
3 C 0 0.500000
4 C 1 0.500000
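If only the ratios are needed and not the raw counts, another route that may work is to group by subset_product alone and call value_counts(normalize=True) on the subset_close column. This is a sketch of a different technique from the two above, not part of the original answer; the output column name norm is an assumption, and the row order within each product can differ from the size() result because value_counts sorts by frequency by default.

```python
import pandas as pd

df = pd.DataFrame({
    'subset_product': ['A', 'A', 'A', 'B', 'B', 'C', 'C'],
    'subset_close': [1, 1, 0, 1, 1, 1, 0],
})

# Group by the product only, then take the relative frequency of each
# subset_close value inside its product group.
norm = (df.groupby('subset_product')['subset_close']
          .value_counts(normalize=True)
          .reset_index(name='norm'))  # 'norm' is an assumed column name

print(norm)
```

This avoids the second pass over df2, at the cost of not having prod_count in the result.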