
Group By Median, Percentile and Percent of Total

I have a dataframe that looks like this:

 ID Acuity TOTAL_ED_LOS
 1    2      423
 2    5      52
 3    5      535
 4    1      87
 ...

I would like to produce a table that looks like this:

 Acuity   Count   Median   Percentile_25   Percentile_75   % of total
   1       234     ...                                        31%
   2        65     ...                                         8%
   3        56     ...                                         7%
   4       345     ...                                        47%
   5        35     ...                                         5%

I already have code that gives me everything I need except for the % of total column:

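import numpy as np

# Build an aggregation function that returns the n-th percentile;
# its __name__ becomes the column label in the agg() output.
def percentile(n):
    def percentile_(x):
        return np.percentile(x, n)
    percentile_.__name__ = 'percentile_%s' % n
    return percentile_

# Count, median, 25th and 75th percentile of TOTAL_ED_LOS per Acuity level
df_grp = df_merged_v1.groupby(['Acuity'])
df_grp['TOTAL_ED_LOS'].agg(['count', 'median',
                            percentile(25), percentile(75)]).reset_index()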

Is there an efficient way to add the percent of total column? The link below contains code showing how to obtain the percent of total, but I'm unsure how to apply it to my code. I know that I could create two tables and then merge them, but I'm curious whether there is a cleaner way.

How to calculate count and percentage in groupby in Python

Here's one way to do it using some pandas built-in tools:

import numpy as np
import pandas as pd

# Set the random number seed and create a dummy dataframe with two columns
np.random.seed(123)
df = pd.DataFrame({'activity':np.random.choice([*'ABCDE'], 40), 
                   'TOTAL_ED_LDS':np.random.randint(50, 500, 40)})

# Reshape the dataframe to get one column per activity,
# then use the output from describe and transpose it
df_out = df.set_index([df.groupby('activity').cumcount(),'activity'])['TOTAL_ED_LDS']\
           .unstack().describe().T

# Calculate each group's count as a percent of the total count
df_out['% of Total'] = df_out['count'] / df_out['count'].sum() * 100.
df_out

Output:

          count        mean         std    min     25%    50%     75%    max  % of Total
activity                                                                                
A           8.0  213.125000  106.810162   93.0  159.50  200.0  231.75  421.0        20.0
B          10.0  308.200000  116.105125   68.0  240.75  324.5  376.25  461.0        25.0
C           6.0  277.666667  117.188168  114.0  193.25  311.5  352.50  409.0        15.0
D           7.0  370.285714  124.724649  120.0  337.50  407.0  456.00  478.0        17.5
E           9.0  297.000000  160.812002   51.0  233.00  294.0  415.00  488.0        22.5
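
The same percent-of-total step can also be applied directly to the agg-based code from the question, without reshaping or merging a second table. The following is only a minimal sketch: the small `df_merged_v1` built here is made-up stand-in data (the real dataframe is not shown in full), and the names `summary` and `% of total` are chosen for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's df_merged_v1 (values are made up)
df_merged_v1 = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5, 6, 7, 8],
    'Acuity': [2, 5, 5, 1, 2, 2, 1, 5],
    'TOTAL_ED_LOS': [423, 52, 535, 87, 120, 310, 95, 200],
})

# Percentile helper, as in the question
def percentile(n):
    def percentile_(x):
        return np.percentile(x, n)
    percentile_.__name__ = 'percentile_%s' % n
    return percentile_

# Aggregate once per Acuity level
summary = (df_merged_v1.groupby('Acuity')['TOTAL_ED_LOS']
           .agg(['count', 'median', percentile(25), percentile(75)])
           .reset_index())

# Each group's row count divided by the overall row count, as a percentage
summary['% of total'] = summary['count'] / summary['count'].sum() * 100
print(summary)
```

Because `count` is already a column of the aggregated result, the percentage is a single vectorized division on that result. Note that `count` skips missing values, so if `TOTAL_ED_LOS` contains NaNs the percentages describe the non-null rows only.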
