

I want to calculate the percentage, but all I am getting is the sum in a pandas DataFrame

I want to calculate the percentage, but all I am getting is the sum. Please help me get the percentage values in the cells, rather than the counts, in a pandas DataFrame.

Code:

ds_data = data[(data.JobTitle == 'Data Analyst') | (data.JobTitle == 'Data Engineer') | (data.JobTitle == 'Data Scientist')]
# Count how many rows in each JobTitle group have each education level.
# Note: nested-dict renaming in agg() was deprecated in pandas 0.20 and removed in 1.0.
agg_func = {'Education': {
    'Masters': lambda x: sum(i == 'Masters' for i in x),
    'Bachelor': lambda x: sum(i == 'Bachelors (4 years)' for i in x),
    'None': lambda x: sum(i == 'None (no degree completed)' for i in x),
    'Doctorates': lambda x: sum(i == 'Doctorate/PhD' for i in x),
    'Associates': lambda x: sum(i == 'Associates (2 years)' for i in x)}}
function = ds_data.groupby(['JobTitle']).agg(agg_func).reset_index()
function.columns = function.columns.droplevel(0)
function
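If you already have a counts table like the one this code produces, one way to turn counts into percentages is to divide each row by its own total. The sketch below uses a hypothetical `counts` table with made-up numbers; it assumes every `Education` value falls into one of the five listed categories, since otherwise the row total would undercount the group size.

```python
import pandas as pd

# Hypothetical counts table shaped like the question's output:
# one row per JobTitle, one column per education level.
counts = pd.DataFrame(
    {
        "JobTitle": ["Data Analyst", "Data Engineer", "Data Scientist"],
        "Masters": [3, 1, 4],
        "Bachelor": [5, 2, 2],
        "None": [1, 0, 0],
        "Doctorates": [0, 0, 3],
        "Associates": [1, 1, 1],
    }
).set_index("JobTitle")

# Divide each row by its own total, then scale to percent.
# axis=0 aligns the row-sum Series with the row index.
percentages = counts.div(counts.sum(axis=1), axis=0) * 100
print(percentages.round(1))
```

Each row of `percentages` then sums to 100.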

I've taken the liberty of defining a function to contain the math, since that is cleaner than copy-pasting the same code five times.

To get a percentage, divide the count by the total number of values in the group (the length of the Series the lambda receives) and multiply by 100.

def calc_percentage(x, degree):
    # Share of values in this group equal to `degree`, as a percentage
    return (sum(i == degree for i in x) / len(x)) * 100

agg_func = {
    'Education': {
        'Masters': lambda x : calc_percentage(x, 'Masters'),
        'Bachelor': lambda x : calc_percentage(x, 'Bachelors (4 years)'),
        'None': lambda x : calc_percentage(x, 'None (no degree completed)'),
        'Doctorates': lambda x : calc_percentage(x, 'Doctorate/PhD'),
        'Associates': lambda x : calc_percentage(x, 'Associates (2 years)')
    }
}
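In pandas 1.0 and later, passing a nested dict like `agg_func` to `agg()` raises `SpecificationError` ("nested renamer is not supported"). A minimal sketch of the same per-group percentages using named aggregation on the `Education` column, assuming a small hypothetical `ds_data` with the question's column names:

```python
import pandas as pd

def calc_percentage(x, degree):
    # Share of values in this group equal to `degree`, as a percentage
    return (sum(i == degree for i in x) / len(x)) * 100

# Hypothetical sample data; column names follow the question.
ds_data = pd.DataFrame({
    "JobTitle": ["Data Analyst"] * 4 + ["Data Scientist"] * 4,
    "Education": [
        "Masters", "Masters", "Bachelors (4 years)", "None (no degree completed)",
        "Doctorate/PhD", "Masters", "Doctorate/PhD", "Associates (2 years)",
    ],
})

# Output column name -> Education label to count
degrees = {
    "Masters": "Masters",
    "Bachelor": "Bachelors (4 years)",
    "None": "None (no degree completed)",
    "Doctorates": "Doctorate/PhD",
    "Associates": "Associates (2 years)",
}

# Named aggregation: one output column per keyword. The ** unpacking is
# needed because "None" cannot be written as a keyword argument, and the
# default argument d=degree binds each label when the lambda is created.
result = (
    ds_data.groupby("JobTitle")["Education"]
    .agg(**{name: (lambda x, d=degree: calc_percentage(x, d))
            for name, degree in degrees.items()})
    .reset_index()
)
print(result)
```

Unlike the nested-dict version, this produces flat column names directly, so no `droplevel` step is needed.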

If we use dict renaming (which is deprecated), we can compute the total number of rows and then use it in the lambda functions to get the percentage. Note that this expresses each count as a percentage of all rows in ds_data, not of the rows within each job title:

ds_data = data[(data.JobTitle == 'Data Analyst') | (data.JobTitle == 'Data Engineer')
               | (data.JobTitle == 'Data Scientist')]
# Total number of rows across all three job titles
ds_data_nrows = ds_data.shape[0]
agg_func = {'Education': {
    'Masters': lambda x: (sum(i == 'Masters' for i in x) / ds_data_nrows) * 100,
    'Bachelor': lambda x: (sum(i == 'Bachelors (4 years)' for i in x) / ds_data_nrows) * 100,
    'None': lambda x: (sum(i == 'None (no degree completed)' for i in x) / ds_data_nrows) * 100,
    'Doctorates': lambda x: (sum(i == 'Doctorate/PhD' for i in x) / ds_data_nrows) * 100,
    'Associates': lambda x: (sum(i == 'Associates (2 years)' for i in x) / ds_data_nrows) * 100}}
function = ds_data.groupby(['JobTitle']).agg(agg_func).reset_index()
function.columns = function.columns.droplevel(0)
function
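As an alternative to hand-written lambdas (a different technique from the ones above), `pd.crosstab` can compute both kinds of percentage directly: `normalize="index"` gives the share within each job title, and `normalize="all"` gives the share of all rows in the table. A minimal sketch, assuming a hypothetical `ds_data` with the question's column names; the `rename`/`reindex` step maps the raw labels to the question's column names and fills in any degree that never appears.

```python
import pandas as pd

# Hypothetical sample data; column names follow the question.
ds_data = pd.DataFrame({
    "JobTitle": ["Data Analyst"] * 4 + ["Data Scientist"] * 4,
    "Education": [
        "Masters", "Masters", "Bachelors (4 years)", "None (no degree completed)",
        "Doctorate/PhD", "Masters", "Doctorate/PhD", "Associates (2 years)",
    ],
})

labels = {
    "Masters": "Masters",
    "Bachelors (4 years)": "Bachelor",
    "None (no degree completed)": "None",
    "Doctorate/PhD": "Doctorates",
    "Associates (2 years)": "Associates",
}
columns = list(labels.values())

def pct_table(normalize):
    # Cross-tabulate JobTitle x Education as fractions, convert to percent,
    # then rename/reorder columns and add any missing degree as 0.
    table = pd.crosstab(ds_data["JobTitle"], ds_data["Education"], normalize=normalize) * 100
    return table.rename(columns=labels).reindex(columns=columns, fill_value=0)

within_title = pct_table("index")  # each row sums to 100 (like dividing by len(x))
of_total = pct_table("all")        # whole table sums to 100 (like dividing by ds_data_nrows)
print(within_title)
print(of_total)
```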
