
Column totals appear swapped in pandas crosstab


The code below builds a pd.crosstab from the Titanic dataset that ships with Seaborn. The column totals in the output table appear in the wrong order.

import pandas as pd
import seaborn as sns

titanic = sns.load_dataset('titanic')

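# Bin ages into (0, 15] -> "kid" and (15, 100] -> "adult"; missing ages stay NaN
bins = [0, 15, 100]  # renamed from `bin` to avoid shadowing the Python built-in
titanic["adult"] = pd.cut(titanic.age, bins, labels=["kid", "adult"])
pd.crosstab(titanic.survived, titanic.adult, normalize=True, margins=True)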

I expected the last row, which holds the column totals, to read 0.116246 / 0.883754 / 1.000000, but it reads 0.883754 / 0.116246 / 1.000000 instead.

The totals are swapped because the original age column contains NaN values, which carry over into the binned adult column you created. Adding dropna=False to the pd.crosstab() call returns the expected column totals:

pd.crosstab(titanic.survived, titanic.adult, dropna=False, normalize=True, margins=True)

adult       kid     adult       All
0      0.047619  0.546218  0.616162
1      0.068627  0.337535  0.383838
All    0.116246  0.883754  1.000000
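In the output above, the All column does not equal the sum of the kid and adult cells in each row (0.047619 + 0.546218 = 0.593837, not 0.616162), which suggests the row totals may still be counting passengers whose age is missing. One way to cross-check the margins is to drop the NaN rows yourself and compute the totals by hand. The following is only a minimal sketch of that idea: the small DataFrame is made-up stand-in data, not the Titanic dataset, and the variable names known, counts and shares are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data with two missing ages (not the real Titanic rows)
df = pd.DataFrame({
    "survived": [0, 0, 1, 1, 0, 1, 0, 1],
    "age": [5.0, 30.0, 8.0, 40.0, np.nan, 22.0, 60.0, np.nan],
})
df["adult"] = pd.cut(df["age"], [0, 15, 100], labels=["kid", "adult"])

# Drop rows whose binned value is NaN and use plain strings instead of a
# categorical, so the result does not depend on how crosstab treats NaN
known = df.dropna(subset=["adult"])
counts = pd.crosstab(known["survived"], known["adult"].astype(str))
counts = counts.reindex(columns=["kid", "adult"], fill_value=0)

# Normalize over the grand total, then add both margins manually
shares = counts / counts.to_numpy().sum()
shares.loc["All"] = shares.sum(axis=0)   # column totals as the last row
shares["All"] = shares.sum(axis=1)       # row totals as the last column

print(shares)
```

Because both margins here are derived from the same filtered counts, each All entry is by construction the sum of the cells it summarizes, which makes this a convenient reference to compare against the pd.crosstab output.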
