The columns are disarrayed in pandas crosstab

Question

[Image: Jupyter notebook screenshot]

The code below builds a pd.crosstab from the Titanic dataset that ships with Seaborn. The column sums in the output table appear in the wrong order.

import pandas as pd
import seaborn as sns

titanic = sns.load_dataset('titanic')

# Named "bins" rather than "bin" to avoid shadowing the built-in bin()
bins = [0, 15, 100]
titanic["adult"] = pd.cut(titanic.age, bins, labels=["kid", "adult"])
pd.crosstab(titanic.survived, titanic.adult, normalize=True, margins=True)

I expected 0.116246 / 0.883754 / 1.000000, but the last row, where the column sums should be, shows 0.883754 / 0.116246 / 1.000000.

Answer 1

The totals are swapped because the original age column contains NaN values, which carry over into the binned adult column you created. Adding dropna=False to your pd.crosstab() call returns the right result:

pd.crosstab(titanic.survived, titanic.adult, dropna=False, normalize=True, margins=True)

adult          kid     adult       All
survived
0         0.047619  0.546218  0.616162
1         0.068627  0.337535  0.383838
All       0.116246  0.883754  1.000000
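
To see how the missing ages end up in the binned column, here is a minimal sketch on a small made-up DataFrame (the toy values are an assumption for illustration, not Titanic data). It only shows that pd.cut leaves NaN where the age is NaN, and that value_counts(normalize=True) gives the kid/adult shares over the rows with a known age, which is a handy cross-check for the "All" row of the crosstab. It does not try to reproduce the crosstab behaviour itself.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the Titanic columns; two ages are missing.
df = pd.DataFrame({
    "survived": [0, 1, 1, 0, 0, 1, 0, 1],
    "age": [5.0, 10.0, 30.0, 40.0, np.nan, 22.0, np.nan, 60.0],
})

# Same binning as in the question: (0, 15] -> "kid", (15, 100] -> "adult".
df["adult"] = pd.cut(df["age"], [0, 15, 100], labels=["kid", "adult"])

# A missing age falls into no bin, so it stays missing after pd.cut.
n_missing = int(df["adult"].isna().sum())

# value_counts skips NaN by default, so these shares are computed
# over the rows with a known age only.
shares = df["adult"].value_counts(normalize=True)

print(n_missing)
print(shares)
```

On the real dataset, running titanic["adult"].value_counts(normalize=True) the same way should give the kid and adult shares you expect to see in the last row.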

solution1
0 ACCEPTED 2019-04-08 04:15:21