How to calculate count and percentage in groupby in Python
I have the following output after a groupby:
Publisher.groupby('Category')['Title'].count()
Category
Coding 5
Hacking 7
Java 1
JavaScript 5
LEGO 43
Linux 7
Networking 5
Others 123
Python 8
R 2
Ruby 4
Scripting 4
Statistics 2
Web 3
In the above output I also want the percentage, i.e. for the first row 5*100/219,
and so on. I am doing the following:
Publisher.groupby('Category')['Title'].agg({'Count':'count','Percentage':lambda x:x/x.sum()})
But it gives me an error. Please help.
I think you can use:
P = Publisher.groupby('Category')['Title'].count().reset_index()
P['Percentage'] = 100 * P['Title'] / P['Title'].sum()
Sample:
import pandas as pd

Publisher = pd.DataFrame({'Category':['a','a','s'],
                          'Title':[4,5,6]})
print(Publisher)
  Category  Title
0        a      4
1        a      5
2        s      6
P = Publisher.groupby('Category')['Title'].count().reset_index()
P['Percentage'] = 100 * P['Title'] / P['Title'].sum()
print(P)
  Category  Title  Percentage
0        a      2   66.666667
1        s      1   33.333333
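The same result can also be written with named aggregation, which gives the count column the name used in the question. As far as I know, recent pandas versions reject a renaming dict passed to `agg` on a single column, which is one likely cause of the error in the question. This is only a minimal sketch on the sample frame above; the column names `Count` and `Percentage` come from the question, the rest is illustrative:

```python
import pandas as pd

# Sample data from the answer above
Publisher = pd.DataFrame({'Category': ['a', 'a', 's'],
                          'Title': [4, 5, 6]})

# Named aggregation: the keyword becomes the output column name
P = Publisher.groupby('Category')['Title'].agg(Count='count').reset_index()

# Percentage is relative to the total over all groups, not within one group
P['Percentage'] = 100 * P['Count'] / P['Count'].sum()

print(P)
```

Note that the percentage is computed after the groupby, because inside `agg` each function only sees one group at a time and cannot know the overall total.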
df = pd.DataFrame({'Category':['a','a','s'],
                   'Title':[4,5,6]})
df = df.groupby('Category')['Title'].count().rename("percentage").transform(lambda x: x/x.sum())
df.reset_index()
# Output as a DataFrame (values are fractions of the total; multiply by 100 for a percentage)
  Category  percentage
0        a    0.666667
1        s    0.333333
# Please let me know if this doesn't solve your problem.
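If only the share of rows per category is needed, `value_counts(normalize=True)` may be a shorter route. This is a sketch under one assumption: `Title` has no missing values, since `value_counts` on `Category` counts rows while `count()` on `Title` skips missing entries. The variable names `pct` and `out` are just illustrative:

```python
import pandas as pd

df = pd.DataFrame({'Category': ['a', 'a', 's'],
                   'Title': [4, 5, 6]})

# Relative frequency of each Category, sorted by category label
pct = df['Category'].value_counts(normalize=True).sort_index()

# Give the values and the index explicit names, then turn into a DataFrame
out = pct.rename('percentage').rename_axis('Category').reset_index()

print(out)
```

As in the answer above, the values are fractions of the total, so multiply by 100 if a percentage is wanted.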