
Correlation matrix does not show all columns python

I am trying to solve the "House Prices" challenge from Kaggle and I'm stuck on my correlation matrix because it simply doesn't show all the columns I want. Initially, it was obviously because of the large number of columns, so I did this:

import matplotlib.pyplot as plt
import seaborn as sns

# df_data is the House Prices DataFrame loaded earlier
df = df_data[['SalePrice', 'MSSubClass', 'MSZoning', 'LotFrontage', 'LotArea', 'Street', 'Alley', 'LotShape', 'LandContour', 'Utilities']].copy()

corrmax = df.corr()

f, ax = plt.subplots(figsize=(16, 12))
sns.heatmap(corrmax, annot=True)

And then, the result is a heatmap with only SalePrice, MSSubClass, LotFrontage and LotArea for some reason. Can anyone please help me?

If you analyze the House Prices dataset, you will see that it contains dozens of categorical variables, such as 'MSZoning' and 'Alley'. corr() only computes correlations between the numerical (non-categorical) columns, so the categorical columns are left out of the matrix. That is why your heatmap shows only SalePrice, MSSubClass, LotFrontage and LotArea.

corrmax = df.corr()
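To check which columns will survive, you can inspect the dtypes before calling corr(). The snippet below is only a minimal sketch: the `toy` DataFrame and its values are made up for illustration and merely mimic a few of the column names from the question.

```python
import pandas as pd

# Hypothetical toy data (values invented for illustration only).
toy = pd.DataFrame({
    "SalePrice": [208500, 181500, 223500, 140000],
    "LotArea": [8450, 9600, 11250, 9550],
    "MSZoning": ["RL", "RL", "RM", "RL"],        # text (categorical) column
    "Street": ["Pave", "Pave", "Pave", "Grvl"],  # text (categorical) column
})

# Columns with a numeric dtype are the only ones a correlation can use.
numeric_cols = toy.select_dtypes(include="number").columns.tolist()
dropped_cols = [c for c in toy.columns if c not in numeric_cols]

# Selecting the numeric columns explicitly makes the behaviour
# independent of the pandas version.
corr = toy[numeric_cols].corr()

print("numeric:", numeric_cols)
print("left out:", dropped_cols)
print(corr)
```

Selecting the numeric columns yourself also avoids surprises on recent pandas versions, where corr() may raise an error on text columns instead of silently skipping them.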

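If you want to measure the relationship between categorical and non-categorical variables, you can use the Spearman correlation matrix. Note that Spearman is a rank correlation, so the categorical columns first have to be encoded as ordered numbers, and the result is meaningful for ordinal categories rather than purely nominal ones.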
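As a rough sketch of the Spearman approach, one option is to map an ordinal column to integer codes and then pass method="spearman" to corr(). Everything here is an assumption made for illustration: the `toy` data is invented, and the category order chosen for 'LotShape' (from regular to most irregular) is my own reading of the column, not something stated in the question.

```python
import pandas as pd

# Hypothetical toy data (values invented for illustration only).
toy = pd.DataFrame({
    "SalePrice": [208500, 181500, 223500, 140000, 250000, 143000],
    "LotShape": ["Reg", "IR1", "Reg", "IR2", "IR1", "IR3"],
})

# Assumed ordering of the categories; adjust it to whatever order
# makes sense for the variable you are encoding.
shape_order = ["Reg", "IR1", "IR2", "IR3"]

# Encode the ordered categories as integers 0..3.
toy["LotShape_code"] = pd.Categorical(
    toy["LotShape"], categories=shape_order, ordered=True
).codes

# Spearman works on ranks, so it only needs the order of the codes.
spearman = toy[["SalePrice", "LotShape_code"]].corr(method="spearman")
print(spearman)
```

For variables with no natural order (for example 'MSZoning'), integer codes would impose an arbitrary ranking, so a rank correlation on them is hard to interpret.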

You will find some help in the links below:

An overview of correlation measures between categorical and continuous variables

Correlation between a nominal (IV) and a continuous (DV) variable
