Column totals mixed up in pandas crosstab

Question

[Image: Jupyter notebook screenshot]

The code below builds a pd.crosstab from the Titanic dataset that ships with Seaborn. The column sums in the output table appear in the wrong order.

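import pandas as pd
import seaborn as sns

titanic = sns.load_dataset('titanic')

# Bin ages into two groups: (0, 15] -> "kid", (15, 100] -> "adult".
# Named `bins` rather than `bin` to avoid shadowing the Python builtin.
bins = [0, 15, 100]
titanic["adult"] = pd.cut(titanic.age, bins, labels=["kid", "adult"])

# Cell shares of the grand total, with row and column totals ("All").
pd.crosstab(titanic.survived, titanic.adult, normalize=True, margins=True)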

I expected the last row, which should hold the column sums, to read 0.116246 / 0.883754 / 1.000000, but it reads 0.883754 / 0.116246 / 1.000000.

Answer 1

The swapped totals are due to NaN values in the original age column, which carry over into the binned adult column you created. Add dropna=False to your pd.crosstab() call, which returns the right result:

pd.crosstab(titanic.survived, titanic.adult, dropna=False, normalize=True, margins=True)

adult          kid     adult       All
survived
0         0.047619  0.546218  0.616162
1         0.068627  0.337535  0.383838
All       0.116246  0.883754  1.000000
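If you want to cross-check the last row without relying on how crosstab builds its margins, you can count the missing values and sum the columns yourself. The sketch below uses a small made-up DataFrame (not the Titanic data) so that it runs without downloading anything; the column names mirror the question, and the values are assumptions chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 8 passengers, 2 of them with a missing age.
df = pd.DataFrame({
    "survived": [0, 1, 1, 0, 0, 1, 0, 1],
    "age": [5.0, 10.0, 30.0, 40.0, np.nan, 50.0, np.nan, 22.0],
})

# Same binning as in the question; NaN ages stay NaN after pd.cut.
df["adult"] = pd.cut(df["age"], [0, 15, 100], labels=["kid", "adult"])
n_missing = int(df["adult"].isna().sum())

# Drop the rows with no age group, build the table without margins,
# then sum each column by hand to get the expected "All" row.
known = df.dropna(subset=["adult"])
table = pd.crosstab(known["survived"], known["adult"], normalize=True)
col_totals = table.sum(axis=0)

print("rows with missing age group:", n_missing)
print(col_totals)
```

On the real dataset, replacing df with titanic (and skipping the toy DataFrame) should give column totals you can compare against the All row that crosstab prints.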

Answer 1
0 Accepted 2019-04-08 04:15:21
