Why am I getting 100% accuracy for my logistic regression model?
Import the libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import sklearn
from sklearn import preprocessing
import seaborn as sns
%matplotlib inline
Reading the data
df = pd.read_csv('./EngineeredData_2.csv')
df = df.dropna()
Split the data into x and y:
X = df.drop(['Week', 'Div', 'Date', 'HomeTeam', 'AwayTeam', 'HTHG', 'HTAG', 'HTR',
             'FTAG', 'FTHG', 'HGKPP', 'AGKPP', 'FTR'], axis=1)
Transforming y into integers:
L = preprocessing.LabelEncoder()
matchresults = L.fit_transform(list(df['FTR']))
y = list(matchresults)
Split the data into train and test:
from sklearn.model_selection import train_test_split
X_tng, X_tst, y_tng, y_tst = train_test_split(X, y, test_size=50, shuffle=False)
X_tng.head()
Import the class
from sklearn.linear_model import LogisticRegression
Instantiate the model
logreg = LogisticRegression()
Fit the model with the data
logreg.fit(X_tng, y_tng)
Predict the test data
y_pred = logreg.predict(X_tst)
acc = logreg.score(X_tst, y_tst)
print(acc)
Does it make sense for the accuracy to be 100%?
The problem is that you unintentionally dropped all of your features and only retained your target value in X. So you are attempting to explain the target value with the target value itself, which of course will give you 100% accuracy. You defined your feature columns as:
X = df.drop(['Week', 'Div', 'Date', 'HomeTeam', 'AwayTeam', 'HTHG', 'HTAG', 'HTR',
             'FTAG', 'FTHG', 'HGKPP', 'AGKPP', 'FTR'], axis=1)
But you should have defined them as:
X = df.drop('FTR', axis=1)
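To check whether the target (or a copy of it) has ended up among the features, it can help to list the columns left in X and test whether any single column determines FTR exactly. The following is only a minimal sketch under stated assumptions: the toy DataFrame, the columns HomeShots and FTR_code, and the helper find_leaky_columns are hypothetical and are not taken from EngineeredData_2.csv.

```python
import pandas as pd


def find_leaky_columns(df, target):
    """Return feature columns whose values determine the target exactly.

    Hypothetical helper for illustration. A column is flagged when every
    distinct value in it maps to exactly one target class. Note that a
    column that is unique per row (an ID, for example) is flagged too.
    """
    leaky = []
    for col in df.columns.drop(target):
        # Number of distinct target classes seen for each value of col.
        classes_per_value = df.groupby(col)[target].nunique()
        if classes_per_value.max() == 1:
            leaky.append(col)
    return leaky


# Made-up data: FTR_code is an encoded copy of the target, HomeShots is not.
df = pd.DataFrame({
    "HomeShots": [10, 3, 10, 12, 3, 8],
    "FTR_code": [2, 0, 1, 2, 0, 1],
    "FTR": ["H", "A", "D", "H", "A", "D"],
})

X = df.drop("FTR", axis=1)
print(list(X.columns))                 # columns the model would be trained on
print(find_leaky_columns(df, "FTR"))   # columns that give the target away
```

If a column is flagged this way, a perfect score says more about the feature matrix than about the model.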