Pandas pivot table subtotals with multi-index
I'm trying to create a simple pivot table with Excel-style subtotals, but I can't find a way to do it in Pandas. I've tried the solution Wes suggested in another subtotal-related question, but it doesn't give the expected results. Steps to reproduce:
Create the sample data:
import numpy as np
import pandas as pd

sample_data = {'customer': ['A', 'A', 'A', 'B', 'B', 'B', 'A', 'A', 'A', 'B', 'B', 'B'],
               'product': ['astro', 'ball', 'car', 'astro', 'ball', 'car', 'astro', 'ball', 'car', 'astro', 'ball', 'car'],
               'week': [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
               'qty': [10, 15, 20, 40, 20, 34, 300, 20, 304, 23, 45, 23]}
df = pd.DataFrame(sample_data)
Create the pivot table with margins (it only has the grand total, not a subtotal per customer (A, B)):
piv = df.pivot_table(index=['customer','product'],columns='week',values='qty',margins=True,aggfunc=np.sum)
week                1    2  All
customer product
A        astro     10  300  310
         ball      15   20   35
         car       20  304  324
B        astro     40   23   63
         ball      20   45   65
         car       34   23   57
All               139  715  854
Then I tried the method Wes McKinney mentioned in another thread, using the stack function:
piv2 = df.pivot_table(index='customer',columns=['week','product'],values='qty',margins=True,aggfunc=np.sum)
piv2.stack('product')
The result has the format I want, but the "All" rows don't contain the sums:
week                 1      2    All
customer product
A                  NaN    NaN  669.0
         astro    10.0  300.0    NaN
         ball     15.0   20.0    NaN
         car      20.0  304.0    NaN
B                  NaN    NaN  185.0
         astro    40.0   23.0    NaN
         ball     20.0   45.0    NaN
         car      34.0   23.0    NaN
All                NaN    NaN  854.0
         astro    50.0  323.0    NaN
         ball     35.0   65.0    NaN
         car      54.0  327.0    NaN
How can I make it work as it would in Excel, with all the subtotals and totals filled in? What am I missing? [Image: Excel sample]
Just to note, I am able to make it work with a for loop that filters by customer on each iteration and concatenates the pieces afterwards, but I hope there is a more direct solution. Thank you.
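For context, the loop-based workaround described above might look roughly like the sketch below. This is an assumption about what such a loop could be, not the asker's actual code; the names `pieces` and `looped` are made up for illustration, and the grand-total row is left out for brevity.

```python
import pandas as pd

sample_data = {
    'customer': ['A', 'A', 'A', 'B', 'B', 'B', 'A', 'A', 'A', 'B', 'B', 'B'],
    'product': ['astro', 'ball', 'car', 'astro', 'ball', 'car',
                'astro', 'ball', 'car', 'astro', 'ball', 'car'],
    'week': [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
    'qty': [10, 15, 20, 40, 20, 34, 300, 20, 304, 23, 45, 23],
}
df = pd.DataFrame(sample_data)

pieces = []  # hypothetical name: one pivot per customer
for customer, group in df.groupby('customer'):
    # Pivot a single customer; margins=True appends that customer's "All" row,
    # which serves as the subtotal.
    part = group.pivot_table(index='product', columns='week', values='qty',
                             margins=True, aggfunc='sum')
    # Add the customer as an outer index level so the pieces can be stacked.
    part.index = pd.MultiIndex.from_product([[customer], part.index],
                                            names=['customer', 'product'])
    pieces.append(part)

looped = pd.concat(pieces)
print(looped)
```

Each customer's block ends with its own "All" row, which is the Excel-style subtotal; a grand total would still have to be appended separately.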
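You can do it in one step, but you have to choose the subtotal and total labels carefully, because sort_index() orders the rows alphabetically (a lowercase 'total' sorts after the product names, and 'Total' sorts after customers A and B):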
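piv = df.pivot_table(index=['customer', 'product'],
                     columns='week',
                     values='qty',
                     margins=True,
                     margins_name='Total',
                     aggfunc=np.sum)

# Per-customer subtotals, labelled 'total' and appended to the pivot.
# groupby(level=0).sum() replaces .sum(level=0), which was removed in pandas 2.0.
(pd.concat([piv,
            piv.query('customer != "Total"')
               .groupby(level=0).sum()
               .assign(product='total')
               .set_index('product', append=True)])
   .sort_index())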
Output:
week                1    2  Total
customer product
A        astro     10  300    310
         ball      15   20     35
         car       20  304    324
         total     45  624    669
B        astro     40   23     63
         ball      20   45     65
         car       34   23     57
         total     94   91    185
Total             139  715    854
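To illustrate the sorting caveat, here is a small sketch with a hand-built index (not taken from the answer) that includes both a capitalized and a lowercase subtotal label. String comparison places uppercase letters before lowercase ones, so a label such as 'All' would end up above the product rows, while 'total' lands below them.

```python
import pandas as pd

# Hypothetical index mixing product rows with two candidate subtotal labels.
idx = pd.MultiIndex.from_tuples(
    [('A', 'astro'), ('A', 'ball'), ('A', 'car'),
     ('A', 'All'),      # capitalized subtotal label
     ('A', 'total'),    # lowercase subtotal label
     ('Total', '')],    # grand-total row, as produced with margins_name='Total'
    names=['customer', 'product'])

# Sort a dummy Series by its index and read back the resulting row order.
ordered = pd.Series(range(len(idx)), index=idx).sort_index().index.tolist()
print(ordered)
```

The same reasoning applies to the outer level: 'Total' stays at the bottom only as long as every customer name sorts before it, so a customer called 'Z' would need a different grand-total label.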
@Scott Boston's answer is perfect and elegant. For reference, if you instead pivot on the customers alone and pd.concat() that table with the product-level pivot, you get the following result.
piv = df.pivot_table(index=['customer','product'],columns='week',values='qty',margins=True,aggfunc=np.sum)
piv3 = df.pivot_table(index=['customer'],columns='week',values='qty',margins=True,aggfunc=np.sum)
piv4 = pd.concat([piv, piv3], axis=0)
piv4
week            1    2  All
(A, astro)     10  300  310
(A, ball)      15   20   35
(A, car)       20  304  324
(B, astro)     40   23   63
(B, ball)      20   45   65
(B, car)       34   23   57
(All, )       139  715  854
A              45  624  669
B              94   91  185
All           139  715  854
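The index above is flattened into tuples because the two tables have a different number of index levels. One possible way to keep a proper MultiIndex, sketched here as an assumption rather than as part of the answer, is to give the customer-level pivot a second level before concatenating. The label 'subtotal' and the names `sub` and `piv5` are made up for illustration, and `margins_name='Total'` is used so the grand total sorts last.

```python
import pandas as pd

sample_data = {
    'customer': ['A', 'A', 'A', 'B', 'B', 'B', 'A', 'A', 'A', 'B', 'B', 'B'],
    'product': ['astro', 'ball', 'car', 'astro', 'ball', 'car',
                'astro', 'ball', 'car', 'astro', 'ball', 'car'],
    'week': [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
    'qty': [10, 15, 20, 40, 20, 34, 300, 20, 304, 23, 45, 23],
}
df = pd.DataFrame(sample_data)

piv = df.pivot_table(index=['customer', 'product'], columns='week', values='qty',
                     margins=True, margins_name='Total', aggfunc='sum')
piv3 = df.pivot_table(index='customer', columns='week', values='qty',
                      margins=True, margins_name='Total', aggfunc='sum')

# Drop the grand total from the customer-level table (piv already has it).
sub = piv3.drop(index='Total')
# Add a second index level so both tables share the (customer, product) shape.
sub.index = pd.MultiIndex.from_arrays(
    [sub.index, ['subtotal'] * len(sub)], names=['customer', 'product'])

piv5 = pd.concat([piv, sub]).sort_index()
print(piv5)
```

With matching index levels, sort_index() interleaves each customer's subtotal row after that customer's products.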