Column totals appear in the wrong order in pandas crosstab
The code below builds a pd.crosstab from Seaborn's Titanic dataset. The column sums in the output table appear to be in the wrong order.
import pandas as pd
import seaborn as sns
titanic = sns.load_dataset('titanic')
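# Bin ages into (0, 15] -> "kid" and (15, 100] -> "adult"
bins = [0, 15, 100]
titanic["adult"] = pd.cut(titanic.age, bins, labels=["kid", "adult"])
# Joint proportions of survived vs. age group, with "All" row/column totals
pd.crosstab(titanic.survived, titanic.adult, normalize=True, margins=True)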
I expected the last row, where the column sums are placed, to read 0.116246 / 0.883754 / 1.000000, but it gives 0.883754 / 0.116246 / 1.000000.
The flipping of the totals is simply due to the NaN values in the original age column, which carry over into the binned adult column you created (pd.cut returns NaN for a missing age). Just add dropna=False to your pd.crosstab() command, which returns the expected result:
pd.crosstab(titanic.survived, titanic.adult, dropna=False, normalize=True, margins=True)
adult          kid     adult       All
survived
0         0.047619  0.546218  0.616162
1         0.068627  0.337535  0.383838
All       0.116246  0.883754  1.000000
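One caveat on the table above: a row's All value does not equal the sum of its kid and adult cells (for survived = 0, 0.047619 + 0.546218 = 0.593837, not 0.616162). The value 0.616162 matches 549/891, the share of non-survivors among all 891 passengers, so the All column appears to be normalized over every passenger, including those with an unknown age, while the kid/adult cells are shares of only the passengers whose age is known. If you want every cell and both margins to share one denominator, one option is to drop the rows with a missing age group before calling pd.crosstab, still passing dropna=False as suggested above. The sketch below shows this on a small hypothetical DataFrame; the names df and known and all the toy values are made up to stand in for the Titanic columns, and it has not been checked against every pandas version.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for titanic.survived and titanic.age;
# two passengers have an unknown (NaN) age.
df = pd.DataFrame({
    "survived": [0, 1, 1, 0, 0, 1, 0, 0, 1],
    "age": [22.0, 38.0, 4.0, np.nan, 35.0, 10.0, 2.0, 54.0, np.nan],
})

# Same binning as in the question: (0, 15] -> "kid", (15, 100] -> "adult".
df["adult"] = pd.cut(df["age"], [0, 15, 100], labels=["kid", "adult"])

# pd.cut maps a missing age to a missing age group.
n_missing = int(df["adult"].isna().sum())
print("rows with unknown age group:", n_missing)

# Keep only passengers whose age group is known, so that every cell and
# both margins are fractions of the same set of rows.
known = df.dropna(subset=["adult"])

table = pd.crosstab(
    known["survived"],
    known["adult"],
    dropna=False,
    normalize=True,
    margins=True,
)
print(table)

# Sanity checks: each margin should equal the sum of the cells it summarizes.
core = table.loc[[0, 1], ["kid", "adult"]]
print(np.allclose(table.loc["All", ["kid", "adult"]].to_numpy(dtype=float),
                  core.sum(axis=0).to_numpy(dtype=float)))
print(np.allclose(table.loc[[0, 1], "All"].to_numpy(dtype=float),
                  core.sum(axis=1).to_numpy(dtype=float)))
```

Dropping the rows up front changes what the percentages mean: they describe only passengers with a recorded age, which is a choice you would want to state explicitly when reporting the table.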