
Linear Regression Inaccuracy in Python

I was following along with a tutorial on linear regression and machine learning in Python and decided to take it a little further by counting how many predictions I get right and how many I get wrong. I found that a lot of my predictions were wrong (I rounded them, so that predictions with many decimal places could still be marked correct). Does anyone know why this is happening? Thanks a lot!

My code is here:

import numpy as np
import pandas as pd
from sklearn import linear_model
from sklearn.model_selection import train_test_split

# Student performance data; the file uses ';' as its separator
data = pd.read_csv('student-mat.csv', sep=';')

data = data[['G1', 'G2', 'G3', 'failures', 'absences', 'studytime', 'freetime', 'goout']]
predict = 'G3'

# Features are every column except the target
att = np.array(data.drop(columns=[predict]))
lab = np.array(data[predict])

att_train, att_test, lab_train, lab_test = train_test_split(att, lab, test_size=0.1)

linear = linear_model.LinearRegression()
linear.fit(att_train, lab_train)

# For a regressor, score() returns R^2, not a classification accuracy
acc = linear.score(att_test, lab_test)
print('Accuracy of the test: ' + str(acc) + '\n')

predictions = linear.predict(att_test)
print()

right_counter = 0
wrong_counter = 0

# Check every prediction, including the last one
for b in range(len(predictions)):
    print(predictions[b], att_test[b], lab_test[b])

    # Counts as right only if the prediction rounds to the exact grade
    if round(predictions[b]) == lab_test[b]:
        print("you're right")
        right_counter += 1
    else:
        print("you're wrong")
        wrong_counter += 1

print(f'Record: {right_counter} - {wrong_counter}')

I would suggest learning about error metrics for linear regression models; RMSE (root mean squared error) is a good place to start. It will give you some intuition for why your approach doesn't work: linear regression predicts a continuous value, so it rarely lands exactly on an integer grade. In short, you probably need a margin of error much larger than the ±0.5 that rounding allows.
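As a rough illustration of what RMSE measures, here is a minimal sketch using only NumPy. The two arrays are made-up stand-ins for `lab_test` and `linear.predict(att_test)`, not values from the student dataset, and the helper names `rmse` and `mae` are my own.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error: sqrt(mean((actual - predicted)^2))."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

def mae(actual, predicted):
    """Mean absolute error: mean(|actual - predicted|)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(actual - predicted)))

# Hypothetical grades and predictions, for illustration only
actual = np.array([10, 12, 8, 15, 0])
predicted = np.array([11.2, 11.6, 9.4, 13.9, 3.0])

print("RMSE:", rmse(actual, predicted))
print("MAE: ", mae(actual, predicted))
```

If the RMSE on your own test split comes out well above 0.5, most rounded predictions will miss the exact grade, because rounding only forgives errors smaller than 0.5.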

Try rewriting your condition as

if predictions[b] * 0.8 <= lab_test[b] <= predictions[b] * 1.2:

to give yourself a 20 percent margin of error in either direction, and you'll see your 'accuracy' go up.
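To see the effect of the margin, the sketch below counts hits under exact rounding and under a 20 percent band, side by side. It assumes made-up arrays in place of `lab_test` and `predictions`, and the function names are hypothetical. It also takes the absolute value of the prediction when building the band, a small deviation from the condition above, so that a negative prediction still gets a valid range.

```python
import numpy as np

def count_exact(actual, predicted):
    """Return (right, wrong) where right means the rounded prediction equals the actual value."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    hits = np.rint(predicted) == actual
    return int(hits.sum()), int((~hits).sum())

def count_within_margin(actual, predicted, margin=0.2):
    """Return (right, wrong) where right means the actual value is within +/- margin of the prediction."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    # abs() keeps the band valid even when a prediction is negative
    tolerance = margin * np.abs(predicted)
    hits = np.abs(actual - predicted) <= tolerance
    return int(hits.sum()), int((~hits).sum())

# Hypothetical grades and predictions, for illustration only
actual = np.array([10, 12, 8, 15, 0])
predicted = np.array([11.2, 11.6, 9.4, 13.9, 3.0])

print("Exact rounding (right, wrong):", count_exact(actual, predicted))
print("20% margin     (right, wrong):", count_within_margin(actual, predicted))
```

Note that a relative margin is strict for small grades: an actual grade of 0 can only be matched by a prediction of exactly 0, so a fixed tolerance (for example, plus or minus 2 points) may be a fairer yardstick on this kind of data.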

To understand why such a large margin of error is required, learn about RMSE and other error metrics, and how to minimize them.
