Looking for a way to produce a table of statistics from columns in a data frame
I have a data set with categories/codes (e.g. male/female, state of service, code of service) and a column of paid claims.
I am looking for a way to create a table/pivot in Python that keeps only the top 10 codes of service by average paid claim (i.e. which 10 codes have the highest average paid claims?). I would also like to append the median, standard deviation and count, so that the output looks something like the example further down.
Table:
gender, code, state, paid claim
F, 1234, TX, $300
F, 2345, NJ, $120
F, 3456, NJ, $30
M, 1234, MN, $250
M, 4567, CA, $50
F, 1234, MA, $70
F, 8901, CA, $150
F, 23457, NY, $160
F, 4567, SD, $125
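For anyone who wants to reproduce this, the sample rows above can be built as a DataFrame roughly as follows. The column names are copied from the header row; storing the paid claims as strings with a leading "$" is an assumption about how the real data looks.

```python
import pandas as pd

# Sample rows from the table above.
# Assumption: "paid claim" is stored as text such as "$300".
df = pd.DataFrame({
    "gender": ["F", "F", "F", "M", "M", "F", "F", "F", "F"],
    "code": [1234, 2345, 3456, 1234, 4567, 1234, 8901, 23457, 4567],
    "state": ["TX", "NJ", "NJ", "MN", "CA", "MA", "CA", "NY", "SD"],
    "paid claim": ["$300", "$120", "$30", "$250", "$50",
                   "$70", "$150", "$160", "$125"],
})
print(df)
```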
Output I am trying to generate (top 10 codes by average paid claim):
code, average claim, median claim, count claim
1234, 206, xxx, 3
So, I did something like:
service_code_average=df.groupby('service_code', as_index=False)['paid claim'].mean().sort_values(by='paid claim')
I was not able to limit the result to the top 10, and I was struggling to append the median and count.
Here you can leverage the agg function, which lets you specify multiple aggregation functions in one go. You can do the following:
# strip the "$" and convert the paid-claim strings to integers
df['paid claim'] = df['paid claim'].str.extract(r'(\d+)', expand=False).astype(int)
# number of largest claims per code to average
top_n = 2  # set this to 10
# apply aggregation using named aggregation (pandas >= 0.25);
# the nested-dict form of agg raises SpecificationError since pandas 1.0
df1 = df.groupby('code')['paid claim'].agg(
    average=lambda x: x.nlargest(top_n).mean(),  # mean of the top_n largest claims within each code
    counts='count',
    median='median',
).reset_index()
print(df1)
    code  average  counts  median
0   1234    275.0       3   250.0
1   2345    120.0       1   120.0
2   3456     30.0       1    30.0
3   4567     87.5       2    87.5
4   8901    150.0       1   150.0
5  23457    160.0       1   160.0
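Note that the average column above is the mean of the top_n largest claims within each code, not a ranking of codes. If what is wanted is the 10 codes with the highest average paid claim, together with median, standard deviation and count, a minimal sketch could look like the following. It assumes the paid claims have already been converted to numbers, and the names summary and stdev are only illustrative.

```python
import pandas as pd

# Sample data from the question, with paid claims already numeric (assumption).
df = pd.DataFrame({
    "code": [1234, 2345, 3456, 1234, 4567, 1234, 8901, 23457, 4567],
    "paid claim": [300, 120, 30, 250, 50, 70, 150, 160, 125],
})

top_n = 10  # number of codes to keep

summary = (
    df.groupby("code")["paid claim"]
      # one row per code, with the mean, median, sample std and count
      .agg(average="mean", median="median", stdev="std", counts="count")
      # keep the top_n codes with the highest average, sorted descending
      .nlargest(top_n, "average")
      .reset_index()
)
print(summary)
```

With the sample data there are only six codes, so all six are returned, ordered from highest to lowest average. Codes with a single claim get NaN for stdev, because the sample standard deviation is undefined for one observation.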