Percentage on rows/columns in pivot table in pandas
Let's say I have a dataframe df:
df.head()
                     M1      M2     M3      M4
Timestamp
2018-09-20 12:59:57  cat 1   obj_1  name_1  1
2018-09-20 12:58:53  cat 1   obj_2  name_2  1
2018-09-20 12:57:44  else 1  obj_3  name_1  1
2018-09-20 12:57:19  cat 11  obj_2  name_1  1
2018-09-20 12:56:17  cat 11  obj_2  name_1  1
With this df I'm preparing a set of pivot tables, one per column, showing both the percentage (%) of occurrences and their count (N):
df[['M1']].pivot_table(index=df.index.date, aggfunc=(
('%', lambda x: len(x) / df['M1'].count()),
('N', 'size')))
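For reference, the same per-day summary can be sketched with a plain groupby. This is only a minimal sketch, not the code from the question: the dataframe is rebuilt by hand from the five rows shown in df.head(), and the names counts and summary are made up for illustration.

```python
import pandas as pd

# Sample data rebuilt by hand from the df.head() output in the question.
df = pd.DataFrame(
    {
        "M1": ["cat 1", "cat 1", "else 1", "cat 11", "cat 11"],
        "M2": ["obj_1", "obj_2", "obj_3", "obj_2", "obj_2"],
        "M3": ["name_1", "name_2", "name_1", "name_1", "name_1"],
        "M4": [1, 1, 1, 1, 1],
    },
    index=pd.to_datetime(
        [
            "2018-09-20 12:59:57",
            "2018-09-20 12:58:53",
            "2018-09-20 12:57:44",
            "2018-09-20 12:57:19",
            "2018-09-20 12:56:17",
        ]
    ),
)
df.index.name = "Timestamp"

# N: number of rows per calendar day.
counts = df.groupby(df.index.date)["M1"].size()

# %: share of each day in the overall number of non-null M1 values.
summary = pd.DataFrame({"%": counts / df["M1"].count(), "N": counts})
print(summary)
```

With only one day in the sample, the summary has a single row covering all five records.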
When it comes to preparing a pivot table on two series, I'd like to display the percentage of occurrences of M1 not in the whole dataframe but relative to the M2 categories. So far I've tried setting the denominator to the M2 count, but that is the overall count and not the count of M1 within specific M2 categories:
df[['M1', 'M2']].pivot_table(columns='M2', index='M1', aggfunc=(lambda x: len(x) / df['M2'].count()))
Any clues how to get the percentage of M1 within each M2 category? Expected output:
M2      obj_1   obj_2   obj_3
M1
cat 1   value1  value*  value*
cat 2   value*  value*  value*
...     ...     ...     ...
cat 11  value*  value*  value*
else 1  value*  value*  value*
where value1 is the number of occurrences of cat 1 within all occurrences of obj_1, etc.
You can do a groupby to find the number of M2's in each category, and add it as a column to your dataframe as follows:
df['count_M2'] = df.groupby('M2')['M1'].transform('count')
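To make it clearer what transform('count') does here, the following minimal sketch rebuilds only the M1 and M2 columns from the sample rows in the question and compares the result with a value_counts lookup. The names per_category and same_via_map are made up for illustration, and the two results match only because M1 has no missing values in this sample.

```python
import pandas as pd

# M1 and M2 columns rebuilt by hand from the sample rows in the question.
df = pd.DataFrame(
    {
        "M1": ["cat 1", "cat 1", "else 1", "cat 11", "cat 11"],
        "M2": ["obj_1", "obj_2", "obj_3", "obj_2", "obj_2"],
    }
)

# transform('count') computes the non-null M1 count of each M2 group and
# broadcasts it back onto every row of that group.
df["count_M2"] = df.groupby("M2")["M1"].transform("count")

# Cross-check: look up each row's M2 value in the per-category row counts.
per_category = df["M2"].value_counts()
same_via_map = df["M2"].map(per_category)

print(df)
```

Every row now carries the size of its own M2 group, which is the denominator needed for a per-category percentage.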
Then run the pivot_table function to get the percentage of M1's in each M2 group:
df.pivot_table(values=['count_M2'], index=['M1'], columns=['M2'],
aggfunc=lambda x: len(x) / x.iloc[0])
Details
df
   Time                 M1      M2     M3      M4  count_M2
0  2018-09-20 12:59:57  cat 1   obj_1  name_1  1   1
1  2018-09-20 12:58:53  cat 1   obj_2  name_2  1   3
2  2018-09-20 12:57:44  else 1  obj_3  name_1  1   1
3  2018-09-20 12:57:19  cat 11  obj_2  name_1  1   3
4  2018-09-20 12:56:17  cat 11  obj_2  name_1  1   3
df.pivot_table
        count_M2
M2      obj_1  obj_2     obj_3
M1
cat 1   1.0    0.333333  NaN
cat 11  NaN    0.666667  NaN
else 1  NaN    NaN       1.0
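As an alternative that is not part of the answer above, pandas also provides pd.crosstab, whose normalize='columns' option divides each cell by its column total. The sketch below assumes the same five sample rows from the question, rebuilt by hand; the name pct is made up for illustration.

```python
import pandas as pd

# M1 and M2 columns rebuilt by hand from the sample rows in the question.
df = pd.DataFrame(
    {
        "M1": ["cat 1", "cat 1", "else 1", "cat 11", "cat 11"],
        "M2": ["obj_1", "obj_2", "obj_3", "obj_2", "obj_2"],
    }
)

# Count M1/M2 co-occurrences, then divide each column by its total so that
# every M2 column sums to 1.
pct = pd.crosstab(df["M1"], df["M2"], normalize="columns")
print(pct)
```

Unlike the pivot_table approach, combinations that never occur show up as 0.0 rather than NaN, and no helper column is needed.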