Collinear features and their effect on linear models, Task 1: Logistic Regression
%matplotlib inline
import warnings
warnings.filterwarnings("ignore")
import pandas as pd
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV
import seaborn as sns
import matplotlib.pyplot as plt
data = pd.read_csv('task_d.csv')
data.head()
Output:
           x          y          z        x*x        2*y  2*z+3*x*x          w  target
0  -0.581066   0.841837  -1.012978  -0.604025   0.841837  -0.665927  -0.536277       0
1  -0.894309  -0.207835  -1.012978  -0.883052  -0.207835  -0.917054  -0.522364       0
2  -1.207552   0.212034  -1.082312  -1.150918   0.212034  -1.166507   0.205738       0
3  -1.364174   0.002099  -0.943643  -1.280666   0.002099  -1.266540  -0.665720       0
4  -0.737687   1.051772  -1.012978  -0.744934   1.051772  -0.792746  -0.735054       0
X = data.drop(['target'], axis=1).values
Y = data['target'].values
Doing a perturbation test to check for the presence of collinearity (Task 1: Logistic Regression)
data.corr()['target']
Output:
x 0.728290
y -0.690684
z 0.969990
x*x 0.719570
2*y -0.690684
2*z+3*x*x 0.764729
w 0.641750
target 1.000000
Name: target, dtype: float64
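A perturbation test, as it is commonly described, fits a linear model, adds a small amount of noise to the features, refits, and compares the two weight vectors. If some coefficients change by a large percentage under a tiny perturbation, the features behind them are likely collinear. The sketch below is one way to do this under stated assumptions: it uses scikit-learn's LogisticRegression rather than the SGDClassifier imported above (whose logistic-loss name differs between scikit-learn versions), assumes a noise standard deviation of 0.01, and introduces a hypothetical helper called perturbation_test. Note that the default L2 regularization of LogisticRegression can damp the effect.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def perturbation_test(X, y, noise_scale=0.01, seed=0):
    """Hypothetical helper: fit a logistic regression on X and on a slightly
    noisy copy of X, and report how much each coefficient moved."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)

    # 1. Fit on the original features and keep the weight vector.
    model = LogisticRegression(max_iter=1000)
    model.fit(X, y)
    w_before = model.coef_.ravel().copy()

    # 2. Add small Gaussian noise (assumed std: noise_scale) and refit.
    #    fit() starts from scratch because warm_start is False by default.
    X_noisy = X + rng.normal(loc=0.0, scale=noise_scale, size=X.shape)
    model.fit(X_noisy, y)
    w_after = model.coef_.ravel().copy()

    # 3. Relative change per feature, in percent; the tiny epsilon guards
    #    against dividing by a weight that is exactly zero.
    pct_change = np.abs(w_after - w_before) / (np.abs(w_before) + 1e-12) * 100
    return w_before, w_after, pct_change


# Usage on the arrays built earlier (X is already a NumPy array there):
# w_before, w_after, pct = perturbation_test(X, Y)
# for name, p in zip(data.drop(['target'], axis=1).columns, pct):
#     print(f"{name:>10}: {p:.1f}% change")
```

Because the test only needs the feature matrix and labels, the X and Y arrays from the original code can be passed in directly; here the .values conversion is harmless.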
corr = X.corr()
ax = sns.heatmap(corr,vmin=-1, vmax=1, center=0,cmap=sns.diverging_palette(20, 220, n=200),square=True)
ax.set_xticklabels(ax.get_xticklabels(),rotation=45,horizontalalignment='right');
Output:
AttributeError Traceback (most recent call last)
<ipython-input-42-749cdea8ad1a> in <module>
1 ##correlation matrix using seaborn heatmap##https://towardsdatascience.com/better-heatmaps
----> 2 corr = X.corr()
3 ax = sns.heatmap(corr,vmin=-1, vmax=1, center=0,cmap=sns.diverging_palette(20, 220, n=200),square=True)
4 ax.set_xticklabels(ax.get_xticklabels(),rotation=45,horizontalalignment='right');
AttributeError: 'numpy.ndarray' object has no attribute 'corr'
How can I fix this?
Why did you use .values when creating X? That returns a NumPy array, and NumPy arrays have no .corr() method.
If you remove .values, X will remain a pandas DataFrame, which does have a .corr() method. Then your code will run as intended:
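A small self-contained check of this point, using a made-up DataFrame in place of task_d.csv (the values are invented for illustration). It also shows two ways to get the same correlation matrix if you want X to stay a NumPy array for the model: np.corrcoef with rowvar=False, or wrapping the array back into a DataFrame.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for task_d.csv: a few numeric features plus 'target'.
demo = pd.DataFrame({
    "x": [0.1, 0.4, 0.2, 0.9, 0.7],
    "y": [1.0, 0.5, 0.8, 0.1, 0.3],
    "w": [0.3, 0.2, 0.6, 0.4, 0.9],
    "target": [0, 0, 1, 1, 1],
})

X_df = demo.drop(["target"], axis=1)  # still a DataFrame: has .corr()
X_arr = X_df.values                   # plain numpy.ndarray: no .corr()

print(type(X_df).__name__, hasattr(X_df, "corr"))    # DataFrame True
print(type(X_arr).__name__, hasattr(X_arr, "corr"))  # ndarray False

# Option A: correlate the DataFrame (column names become heatmap labels).
corr_df = X_df.corr()

# Option B: keep the array and use NumPy; rowvar=False treats columns as variables.
corr_np = np.corrcoef(X_arr, rowvar=False)

# Option C: wrap the array back into a DataFrame with the original column names.
corr_wrapped = pd.DataFrame(X_arr, columns=X_df.columns).corr()
```

Option A or C is the better fit for the seaborn heatmap, since the column names end up as tick labels; Option B yields an unlabeled matrix.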
X = data.drop(['target'], axis=1)  # no .values, so X stays a DataFrame
corr = X.corr()
ax = sns.heatmap(corr, vmin=-1, vmax=1, center=0,
                 cmap=sns.diverging_palette(20, 220, n=200), square=True)
ax.set_xticklabels(ax.get_xticklabels(), rotation=45, horizontalalignment='right');
Alternatively, instead of X, call .corr() directly on the full DataFrame. Note that this also includes the target column in the heatmap:
corr = data.corr()
ax = sns.heatmap(corr, vmin=-1, vmax=1, center=0,
                 cmap=sns.diverging_palette(20, 220, n=200), square=True)
ax.set_xticklabels(ax.get_xticklabels(), rotation=45, horizontalalignment='right');
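The heatmap has to be read by eye. As a complement, here is a minimal sketch that lists feature pairs whose absolute Pearson correlation reaches a threshold; the helper name high_corr_pairs and the 0.9 threshold are hypothetical choices, not part of the original task. In the correlations with target shown earlier, y and 2*y have identical values, which is exactly the kind of pair this would surface.

```python
import numpy as np
import pandas as pd


def high_corr_pairs(df, threshold=0.9):
    """Hypothetical helper: return (col_a, col_b, r) for every column pair
    whose absolute Pearson correlation is at least `threshold`,
    strongest first."""
    corr = df.corr()
    cols = list(corr.columns)
    pairs = []
    # Walk the upper triangle only, so each pair appears once and the
    # diagonal (every column with itself, r = 1) is skipped.
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            r = float(corr.iloc[i, j])
            if abs(r) >= threshold:
                pairs.append((cols[i], cols[j], r))
    return sorted(pairs, key=lambda p: abs(p[2]), reverse=True)


# Usage on the task data (features only, target dropped):
# for a, b, r in high_corr_pairs(data.drop(['target'], axis=1)):
#     print(f"{a} ~ {b}: r = {r:.3f}")
```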