Counting categories within each year and dividing by the total count for that year
I am counting the number of negative and positive numbers within each year. Ultimately, I want the percentage of negative and positive values for each year.
I tried grouping by year and counting the categories, but the new column appears with no name.
df1 = df.groupby(['Year', 'Count of Negative/Positive Margins'])['Count of Negative/Positive Margins'].count()
df1.head()
Out[194]:
Year  Count of Negative/Positive Margins
2005  1                                     4001
      2                                     1373
2006  1                                     4046
      2                                     1304
2007  1                                     4156
Name: Count of Negative/Positive Margins, dtype: int64
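For context, the result above is a Series with a two-level index rather than a DataFrame, which is why the counts have no column header; the Series name is printed on the last line instead. A minimal sketch of giving the counts a name with reset_index(name=...) follows. The sample data and the column name "n" are made up for illustration.

```python
import pandas as pd

col = "Count of Negative/Positive Margins"

# Hypothetical sample data; the real df has far more rows.
df = pd.DataFrame({
    "Year": [2005, 2005, 2005, 2005, 2006, 2006, 2006],
    col:    [1, 1, 1, 2, 1, 2, 2],
})

# size() counts rows per (Year, category) pair and returns a Series;
# reset_index(name=...) turns it into a DataFrame with a named count column.
counts = df.groupby(["Year", col]).size().reset_index(name="n")
print(counts)
```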
This is my expected output:
2005  1  74%
      2  26%
.
.
.
Use SeriesGroupBy.value_counts, grouping only by the Year column and passing normalize=True, then multiply by 100, round with Series.round, convert to strings and append %:
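df = (df.groupby('Year')['Count of Negative/Positive Margins']
        .value_counts(normalize=True)   # share of each category within its year
        .mul(100)
        .round()
        .astype(int)                    # drop the trailing .0 so the result reads 74%, not 74.0%
        .astype(str)
        .add('%')
        .reset_index(name='percentage')
     )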
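A self-contained sketch of this chain on made-up data, in case it helps to see the result. The sample values are hypothetical and are not taken from the question. The .astype(int) step drops the trailing .0 that rounding a float leaves behind, so the strings read 75% rather than 75.0%.

```python
import pandas as pd

col = "Count of Negative/Positive Margins"

# Hypothetical sample data: 4 rows for 2005, 3 rows for 2006.
df = pd.DataFrame({
    "Year": [2005, 2005, 2005, 2005, 2006, 2006, 2006],
    col:    [1, 1, 1, 2, 1, 2, 2],
})

pct = (
    df.groupby("Year")[col]
      .value_counts(normalize=True)    # fraction of each category per year
      .mul(100)                        # fraction -> percent
      .round()                         # nearest whole percent (still float)
      .astype(int)                     # 75.0 -> 75
      .astype(str)
      .add("%")
      .reset_index(name="percentage")  # MultiIndex Series -> flat DataFrame
)
print(pct)
```

If the percentages are needed for further arithmetic, stopping the chain after .round() keeps them numeric, and the % sign can be added only when displaying them.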