Total values in a pivot table in Python
My original dataframe is similar to the one below:
import pandas as pd

df = pd.DataFrame({'Variation': ['A']*5 + ['B']*3 + ['A']*4,
                   'id': [11]*4 + [12] + [15]*2 + [17] + [20]*4,
                   'steps': ['start','step1','step2','end','end','step1','step2','step1','start','step1','step2','end']})
I wanted to create a pivot table from this dataframe, for which I used the code below:
df1 = df.pivot_table(index=['Variation'], columns=['steps'],
                     values='id', aggfunc='count', fill_value=0)
However, I also want to see the total distinct count of ids for each Variation. Can someone please let me know how to achieve this? My expected output is:
| Variation | Total id | Total start | Total step1 | Total step2 | Total end |
|-----------|----------|-------------|-------------|-------------|-----------|
| A | 3 | 2 | 2 | 2 | 3 |
| B | 2 | 0 | 2 | 1 | 0 |
Use SeriesGroupBy.nunique:
df1 = df1.join(df.groupby('Variation')['id'].nunique().rename('Total id'))
print(df1)
           end  start  step1  step2  Total id
Variation
A            3      2      2      2         3
B            0      0      2      1         2
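To see why nunique is needed here rather than the count already used in the pivot table, the two aggregations can be compared on the id column directly. This is a minimal sketch on the question's sample data; the variable names grouped, row_counts and distinct_ids are chosen only for illustration.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'Variation': ['A']*5 + ['B']*3 + ['A']*4,
    'id': [11]*4 + [12] + [15]*2 + [17] + [20]*4,
    'steps': ['start', 'step1', 'step2', 'end', 'end', 'step1',
              'step2', 'step1', 'start', 'step1', 'step2', 'end'],
})

grouped = df.groupby('Variation')['id']

# count: number of non-null id rows per Variation (repeated ids included)
row_counts = grouped.count()

# nunique: number of distinct id values per Variation
distinct_ids = grouped.nunique()

print(pd.DataFrame({'rows': row_counts, 'distinct ids': distinct_ids}))
```

With the sample data, A has 9 rows but only 3 distinct ids (11, 12, 20), and B has 3 rows but 2 distinct ids (15, 17), which matches the Total id column in the expected output.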
If you need the new column right after Variation:
c = ['id'] + df['steps'].unique().tolist()
df1 = (df1.join(df.groupby('Variation')['id'].nunique())
.reindex(columns=c)
.add_prefix('Total ')
.reset_index()
.rename_axis(None, axis=1))
print(df1)
  Variation  Total id  Total start  Total step1  Total step2  Total end
0         A         3            2            2            2          3
1         B         2            0            2            1          0
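For reuse, the same steps can be wrapped into one function that starts from the raw dataframe. This is only a sketch: the helper name steps_summary is hypothetical, and it uses DataFrame.insert to place the distinct count first instead of the join plus reindex shown above, which should give the same table for this data.

```python
import pandas as pd

def steps_summary(df):
    """Count rows per step and distinct ids per Variation (hypothetical helper)."""
    # Keep the steps in their order of first appearance in the data
    step_order = df['steps'].unique().tolist()

    # Row counts of each step per Variation, as in the question
    counts = df.pivot_table(index='Variation', columns='steps',
                            values='id', aggfunc='count', fill_value=0)
    counts = counts.reindex(columns=step_order)

    # Distinct ids per Variation, inserted as the first column;
    # the Series is aligned on the Variation index
    counts.insert(0, 'id', df.groupby('Variation')['id'].nunique())

    return (counts.add_prefix('Total ')
                  .reset_index()
                  .rename_axis(None, axis=1))

# Sample data from the question
df = pd.DataFrame({
    'Variation': ['A']*5 + ['B']*3 + ['A']*4,
    'id': [11]*4 + [12] + [15]*2 + [17] + [20]*4,
    'steps': ['start', 'step1', 'step2', 'end', 'end', 'step1',
              'step2', 'step1', 'start', 'step1', 'step2', 'end'],
})

result = steps_summary(df)
print(result)
```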