
Pandas pivot table cannot aggregate in 2 levels

I am exploring the Titanic dataset from seaborn and want to build a pivot table using this code:

import numpy as np
import pandas as pd
import seaborn as sns
titanic = sns.load_dataset('titanic')

age = pd.cut(titanic['age'], [0, 18, 80])
# fare = pd.cut(titanic['fare'], [0, 250, 500]) #1 - this does not work
fare = pd.qcut(titanic['fare'], 3) #2 this works as intended

titanic.pivot_table('survived', ['sex', age], ['class', fare])

The problem with #1 is that it does not split the fares for the second and third classes into buckets, only those for the first class.

Results:

[image: results]

Does anyone know why this happens?

Thank you, much appreciated!

Run this:

pd.crosstab(titanic['class'], pd.cut(titanic['fare'],[0,10,50,74,100,200,300,500,1000]))

Output:

fare    (0, 10]  (10, 50]  (50, 74]  (74, 100]  (100, 200]  (200, 300]  (500, 1000]
class                                                                              
First         1        71        42         44          33          17            3
Second        0       171         7          0           0           0            0
Third       320       153        14          0           0           0            0

Note: The highest fare in the "Second" and "Third" classes is less than 75.

So, in your first example, all of the Second and Third class fares fall into the single (0, 250] bucket, and only First class has fares above 250.
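A quick way to confirm this is to check the highest fare per class and count how many fixed-width buckets each class actually lands in. This is only a minimal sketch on a tiny made-up frame: the values in `df` and the `fare_bin` column name are assumptions for illustration, not the real dataset. With the real data you would run the same steps on `titanic`.

```python
import pandas as pd

# Hypothetical miniature stand-in for the Titanic data; the values are made up.
df = pd.DataFrame({
    "class": ["First", "First", "First", "Second", "Second", "Third", "Third"],
    "fare": [30.0, 120.0, 260.0, 13.0, 73.5, 7.25, 69.55],
})

# Highest fare per class: in this toy frame only First goes above 250.
max_fare = df.groupby("class")["fare"].max()
print(max_fare)

# Label each fare with its fixed-width bucket, as in attempt #1.
# Converting to str keeps the group-by below simple.
df["fare_bin"] = pd.cut(df["fare"], [0, 250, 500]).astype(str)

# Number of distinct buckets that actually occur within each class.
buckets_per_class = df.groupby("class")["fare_bin"].nunique()
print(buckets_per_class)
```

If a class only ever occupies one bucket, the pivot table has nothing to show for it in the other bucket, which would match what you see with #1.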
