I want to calculate the percentage, but all I am getting is the sum in a pandas data frame
I want to calculate the percentage, but all I am getting is the sum. Please help me get the percentage value in the cells, rather than the count, in a pandas data frame in Python.
Code:
ds_data = data[(data.JobTitle == 'Data Analyst') | (data.JobTitle == 'Data Engineer') | (data.JobTitle == 'Data Scientist')]
agg_func = {'Education': {
    'Masters': lambda x: sum(i == 'Masters' for i in x),
    'Bachelor': lambda x: sum(i == 'Bachelors (4 years)' for i in x),
    'None': lambda x: sum(i == 'None (no degree completed)' for i in x),
    'Doctorates': lambda x: sum(i == 'Doctorate/PhD' for i in x),
    'Associates': lambda x: sum(i == 'Associates (2 years)' for i in x)}}
function = ds_data.groupby(['JobTitle']).agg(agg_func).reset_index()
function.columns = function.columns.droplevel(0)
function
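As a side note, the nested dict passed to agg above relies on nested renaming, which pandas deprecated and recent versions reject with an error. A minimal sketch of the same count table using named aggregation instead, assuming pandas 0.25 or later; the small `data` frame is made-up sample data and the `degrees` mapping is a hypothetical helper:

```python
import pandas as pd

# Hypothetical sample rows standing in for the real survey frame `data`.
data = pd.DataFrame({
    "JobTitle": ["Data Analyst", "Data Analyst", "Data Engineer",
                 "Data Scientist", "Data Scientist", "Data Scientist", "Manager"],
    "Education": ["Masters", "Bachelors (4 years)", "Masters",
                  "Doctorate/PhD", "Masters", "None (no degree completed)", "Masters"],
})

# Same filter as the chained | comparisons, written with isin.
ds_data = data[data.JobTitle.isin(["Data Analyst", "Data Engineer", "Data Scientist"])]

# Hypothetical mapping: output column name -> Education value to count.
degrees = {
    "Masters": "Masters",
    "Bachelor": "Bachelors (4 years)",
    "None": "None (no degree completed)",
    "Doctorates": "Doctorate/PhD",
    "Associates": "Associates (2 years)",
}

# Named aggregation: each keyword becomes an output column.
# The default argument d=degree freezes the current degree inside each lambda.
counts = (
    ds_data.groupby("JobTitle")["Education"]
    .agg(**{name: (lambda x, d=degree: (x == d).sum())
            for name, degree in degrees.items()})
    .reset_index()
)
print(counts)
```

Because the result already has flat column names, the droplevel step is no longer needed.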
I've taken the liberty of defining a function to contain the math, since that is cleaner than copy/pasting the code.
To get the percentage, you need to divide the count by the total number of values in the group, i.e. the length of the Series passed to the lambda, and multiply by 100.
def calc_percentage(x, degree):
    # x holds the Education values of one JobTitle group
    return (sum(i == degree for i in x) / len(x)) * 100
agg_func = {
    'Education': {
        'Masters': lambda x: calc_percentage(x, 'Masters'),
        'Bachelor': lambda x: calc_percentage(x, 'Bachelors (4 years)'),
        'None': lambda x: calc_percentage(x, 'None (no degree completed)'),
        'Doctorates': lambda x: calc_percentage(x, 'Doctorate/PhD'),
        'Associates': lambda x: calc_percentage(x, 'Associates (2 years)')
    }
}
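Because this agg_func still uses nested dict renaming, it will not run on recent pandas versions. A sketch of the same idea with named aggregation, assuming pandas 0.25 or later; the sample rows and the `degrees` mapping are hypothetical, and calc_percentage is the corrected function from this answer:

```python
import pandas as pd

# Hypothetical sample rows, already filtered to the three data job titles.
ds_data = pd.DataFrame({
    "JobTitle": ["Data Analyst", "Data Analyst", "Data Engineer",
                 "Data Scientist", "Data Scientist", "Data Scientist"],
    "Education": ["Masters", "Bachelors (4 years)", "Masters",
                  "Doctorate/PhD", "Masters", "None (no degree completed)"],
})

def calc_percentage(x, degree):
    # x holds the Education values of one JobTitle group
    return (sum(i == degree for i in x) / len(x)) * 100

# Hypothetical mapping: output column name -> Education value.
degrees = {
    "Masters": "Masters",
    "Bachelor": "Bachelors (4 years)",
    "None": "None (no degree completed)",
    "Doctorates": "Doctorate/PhD",
    "Associates": "Associates (2 years)",
}

# One output column per degree; d=degree binds the value for each lambda.
percentages = (
    ds_data.groupby("JobTitle")["Education"]
    .agg(**{name: (lambda x, d=degree: calc_percentage(x, d))
            for name, degree in degrees.items()})
    .reset_index()
)
print(percentages.round(1))
```

Each row sums to 100, because every count is divided by the size of its own JobTitle group.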
If you keep the dict renaming (which is deprecated, and no longer accepted from pandas 1.0 on), you can compute the total number of rows and then divide by it in each lambda to get a percentage. Note that this gives each cell as a percentage of all rows in ds_data, not of the rows for that job title:
ds_data = data[(data.JobTitle == 'Data Analyst') | (data.JobTitle == 'Data Engineer')
               | (data.JobTitle == 'Data Scientist')]
ds_data_nrows = ds_data.shape[0]
agg_func = {'Education': {
    'Masters': lambda x: (sum(i == 'Masters' for i in x) / ds_data_nrows) * 100,
    'Bachelor': lambda x: (sum(i == 'Bachelors (4 years)' for i in x) / ds_data_nrows) * 100,
    'None': lambda x: (sum(i == 'None (no degree completed)' for i in x) / ds_data_nrows) * 100,
    'Doctorates': lambda x: (sum(i == 'Doctorate/PhD' for i in x) / ds_data_nrows) * 100,
    'Associates': lambda x: (sum(i == 'Associates (2 years)' for i in x) / ds_data_nrows) * 100}}
function = ds_data.groupby(['JobTitle']).agg(agg_func).reset_index()
function.columns = function.columns.droplevel(0)
function
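To compare the two kinds of percentage, pd.crosstab can produce both directly through its normalize argument. This is a swapped-in technique, not what either answer uses, and the sample rows are made up:

```python
import pandas as pd

# Hypothetical sample rows, already filtered to the three data job titles.
ds_data = pd.DataFrame({
    "JobTitle": ["Data Analyst", "Data Analyst", "Data Engineer",
                 "Data Scientist", "Data Scientist", "Data Scientist"],
    "Education": ["Masters", "Bachelors (4 years)", "Masters",
                  "Doctorate/PhD", "Masters", "None (no degree completed)"],
})

# Percentage of each job title's own rows (every row sums to 100),
# matching the divide-by-len(x) approach.
by_group = pd.crosstab(ds_data["JobTitle"], ds_data["Education"],
                       normalize="index") * 100

# Percentage of all rows in ds_data (the whole table sums to 100),
# matching the divide-by-ds_data.shape[0] approach.
of_total = pd.crosstab(ds_data["JobTitle"], ds_data["Education"],
                       normalize="all") * 100

print(by_group.round(1))
print(of_total.round(1))
```

crosstab only creates columns for Education values that actually occur, so a degree with no rows (here 'Associates (2 years)') is absent unless you reindex the columns.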