
Percentage on rows/columns in pivot table in pandas

Let's say I have a DataFrame df:

df.head()
                         M1     M2      M3  M4
Timestamp
2018-09-20 12:59:57   cat 1  obj_1  name_1   1
2018-09-20 12:58:53   cat 1  obj_2  name_2   1
2018-09-20 12:57:44  else 1  obj_3  name_1   1
2018-09-20 12:57:19  cat 11  obj_2  name_1   1
2018-09-20 12:56:17  cat 11  obj_2  name_1   1
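To make the question reproducible, here is a minimal sketch that rebuilds the five rows shown by df.head(). It assumes the full frame has the same columns and that Timestamp is a datetime index; the post only shows the printed head, so the dtypes are an assumption.

```python
import pandas as pd

# Rebuild the five rows printed by df.head(); values are copied from that
# output and the Timestamp index is parsed into a DatetimeIndex.
df = pd.DataFrame(
    {
        "M1": ["cat 1", "cat 1", "else 1", "cat 11", "cat 11"],
        "M2": ["obj_1", "obj_2", "obj_3", "obj_2", "obj_2"],
        "M3": ["name_1", "name_2", "name_1", "name_1", "name_1"],
        "M4": [1, 1, 1, 1, 1],
    },
    index=pd.DatetimeIndex(
        [
            "2018-09-20 12:59:57",
            "2018-09-20 12:58:53",
            "2018-09-20 12:57:44",
            "2018-09-20 12:57:19",
            "2018-09-20 12:56:17",
        ],
        name="Timestamp",
    ),
)
print(df)
```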

With this df, I'm preparing a set of pivot tables, one per column, showing both the percentage (%) of occurrences and the count (N):

df[['M1']].pivot_table(index=df.index.date, aggfunc=(
    ('%', lambda x: len(x) / df['M1'].count()), 
    ('N', 'size')))
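For comparison, the same per-day summary can be sketched with a plain groupby instead of pivot_table. This is an alternative formulation, not the asker's code; it keeps the question's denominator (the non-null M1 count of the whole frame) and uses only M1 plus the sample timestamps.

```python
import pandas as pd

# Only M1 and the timestamps are needed for the per-day summary.
df = pd.DataFrame(
    {"M1": ["cat 1", "cat 1", "else 1", "cat 11", "cat 11"]},
    index=pd.to_datetime([
        "2018-09-20 12:59:57", "2018-09-20 12:58:53", "2018-09-20 12:57:44",
        "2018-09-20 12:57:19", "2018-09-20 12:56:17",
    ]),
)

# N: rows per calendar day (like 'size' in the question).
n = df.groupby(df.index.date)["M1"].size()
# %: that day's rows divided by the non-null M1 count of the whole frame,
# the same denominator as the lambda in the question.
summary = pd.DataFrame({"%": n / df["M1"].count(), "N": n})
print(summary)
```

All five sample rows fall on the same day, so this prints a single row with N equal to 5 and % equal to 1.0.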

When it comes to preparing a pivot table on two series, I'd like to show the percentage of occurrences of M1 not relative to the whole DataFrame but relative to the M2 categories. So far I've tried setting the denominator to the M2 count, but that is the overall count, not the count of M1 within the specific M2 categories:

df[['M1', 'M2']].pivot_table(columns='M2', index='M1', aggfunc=(lambda x: len(x) / df['M2'].count()))
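The problem with this attempt is visible on the sample rows: df['M2'].count() is one number for the whole column, whereas the wanted denominator is one count per M2 category. A small sketch on the five sample rows (only M1 and M2 are reproduced):

```python
import pandas as pd

df = pd.DataFrame({
    "M1": ["cat 1", "cat 1", "else 1", "cat 11", "cat 11"],
    "M2": ["obj_1", "obj_2", "obj_3", "obj_2", "obj_2"],
})

# Denominator used in the attempt above: a single number for the whole column.
overall = df["M2"].count()
# Denominator actually wanted: the number of rows in each M2 category.
per_category = df["M2"].value_counts()
print(overall)
print(per_category)
```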

Any clues on how to get the percentage of each M1 value within each M2 category? Expected output:

M2       obj_1    obj_2    obj_3
M1
cat 1    value1   value*   value*
cat 2    value*   value*   value*
...      ...      ...      ...
cat 11   value*   value*   value*
else 1   value*   value*   value*

where value1 is the share of cat 1 among all occurrences of obj_1, and so on for the other cells.
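The expected table is a contingency table normalized within each column. As an alternative that is not part of the answer below, pandas.crosstab with normalize="columns" computes it directly; a sketch on the sample rows:

```python
import pandas as pd

df = pd.DataFrame({
    "M1": ["cat 1", "cat 1", "else 1", "cat 11", "cat 11"],
    "M2": ["obj_1", "obj_2", "obj_3", "obj_2", "obj_2"],
})

# normalize="columns" divides every cell by its column total, so each cell is
# the share of that M1 value among all rows of the given M2 category.
# Combinations that never occur show up as 0.0 rather than NaN.
shares = pd.crosstab(df["M1"], df["M2"], normalize="columns")
print(shares)
```

Each column sums to 1; multiply by 100 if you want percentages.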

You can use a groupby to count the rows in each M2 category and add that count to your DataFrame as a new column, as follows:

df['count_M2'] = df.groupby('M2')['M1'].transform('count')
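Because transform returns a result aligned with the original rows (rather than one row per group), the group size can be stored directly as a column. A sketch on the sample rows, reproducing only M1 and M2:

```python
import pandas as pd

df = pd.DataFrame({
    "M1": ["cat 1", "cat 1", "else 1", "cat 11", "cat 11"],
    "M2": ["obj_1", "obj_2", "obj_3", "obj_2", "obj_2"],
})

# transform("count") gives every row the size of its own M2 group,
# so the result lines up with df and can be assigned as a new column.
df["count_M2"] = df.groupby("M2")["M1"].transform("count")
print(df)
```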

Then run pivot_table to get the share of each M1 value within each M2 group (as a fraction; multiply by 100 for a percentage):

df.pivot_table(values=['count_M2'], index=['M1'], columns=['M2'], 
               aggfunc=lambda x: len(x) / x.iloc[0])
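Put together, the two steps of this answer run end to end as the sketch below. The final droplevel call is an optional tidy-up added here (not part of the answer) that removes the outer count_M2 column level. The lambda relies on count_M2 being identical for every row in a cell, which holds because all rows in a cell share the same M2 value.

```python
import pandas as pd

df = pd.DataFrame({
    "M1": ["cat 1", "cat 1", "else 1", "cat 11", "cat 11"],
    "M2": ["obj_1", "obj_2", "obj_3", "obj_2", "obj_2"],
})

# Step 1: size of each row's M2 group, broadcast back to every row.
df["count_M2"] = df.groupby("M2")["M1"].transform("count")

# Step 2: inside each (M1, M2) cell, len(x) is how many rows share that pair
# and x.iloc[0] is the M2 group size (the same for every row in the cell).
table = df.pivot_table(values=["count_M2"], index=["M1"], columns=["M2"],
                       aggfunc=lambda x: len(x) / x.iloc[0])

# Optional: drop the outer "count_M2" level so the columns are just M2 values.
table = table.droplevel(0, axis=1)
print(table)
```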

Details

df

                  Time      M1     M2      M3  M4  count_M2
0  2018-09-20 12:59:57   cat 1  obj_1  name_1   1         1
1  2018-09-20 12:58:53   cat 1  obj_2  name_2   1         3
2  2018-09-20 12:57:44  else 1  obj_3  name_1   1         1
3  2018-09-20 12:57:19  cat 11  obj_2  name_1   1         3
4  2018-09-20 12:56:17  cat 11  obj_2  name_1   1         3

df.pivot_table

       count_M2                
M2        obj_1     obj_2 obj_3
M1                             
cat 1       1.0  0.333333   NaN
cat 11      NaN  0.666667   NaN
else 1      NaN       NaN   1.0
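The table above holds fractions between 0 and 1. Another way to reach the same shares, sketched here as an alternative rather than as part of the answer, is to normalize value_counts inside each M2 group and then move M2 into the columns; multiplying by 100 turns the fractions into the percentages the question asks for.

```python
import pandas as pd

df = pd.DataFrame({
    "M1": ["cat 1", "cat 1", "else 1", "cat 11", "cat 11"],
    "M2": ["obj_1", "obj_2", "obj_3", "obj_2", "obj_2"],
})

# value_counts(normalize=True) within each M2 group gives the share of each
# M1 value in that group; unstack moves M2 from the index to the columns.
pct = (
    df.groupby("M2")["M1"]
    .value_counts(normalize=True)
    .unstack("M2")
    .mul(100)  # fraction -> percent
)
print(pct)
```

As with pivot_table, combinations that never occur are NaN; chain .fillna(0) if zeros are preferred.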

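Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.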

 