Problems with linear regression model

I have this DataFrame, which I created using data from Basketball Reference, and I computed the mean of each characteristic.

[Image: DataFrame data]

No matter which column I use to train my linear model, my R² score is near 0 and the predictions are awful.
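For context, R² compares the model's squared error with that of always predicting the mean of the test targets: R² = 1 - SS_res / SS_tot. A score near 0 therefore means the feature explains almost none of the variance, and a negative score means the model does worse than that constant baseline. A minimal sketch with made-up numbers (not the poster's data):

```python
import numpy as np
from sklearn.metrics import r2_score

# Made-up target values, for illustration only
y_true = np.array([1.0, -2.0, 3.5, 0.5, -1.0])

# Predicting the mean of y_true for every sample gives R² = 0
baseline = np.full_like(y_true, y_true.mean())
print(r2_score(y_true, baseline))

# The same score computed by hand: 1 - SS_res / SS_tot
def r2_by_hand(y, y_pred):
    ss_res = np.sum((y - y_pred) ** 2)        # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)      # total sum of squares around the mean
    return 1.0 - ss_res / ss_tot

# A constant prediction that is not the mean scores below 0
worse = np.full_like(y_true, 2.0)
print(r2_score(y_true, worse), r2_by_hand(y_true, worse))
```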

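import math

import pandas as pd
from sklearn import linear_model
from sklearn.metrics import r2_score
from sklearn.preprocessing import StandardScaler


# df is the DataFrame built from the Basketball Reference data (construction not shown)
percent = math.floor(len(df) * 0.80)  # row index where the 80/20 split happens

# Double brackets keep X 2-D (n_samples, 1); StandardScaler and
# LinearRegression reject a 1-D feature array
X = df[['Mean MP']].to_numpy()
Y = df['BPM'].to_numpy()  # the target may stay 1-D, so no reshape is needed
X = StandardScaler().fit_transform(X)

# First 80% of rows for training, last 20% for testing (rows are not shuffled)
X_train = X[:percent]
Y_train = Y[:percent]
X_test = X[percent:]
Y_test = Y[percent:]

model = linear_model.LinearRegression()
model.fit(X_train, Y_train)

a = r2_score(Y_test, model.predict(X_test))
# result: a = -0.07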
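One thing in the script above can make the score depend heavily on row order: the 80/20 split takes rows as stored, so if the DataFrame is sorted (for example by player name or team), the test rows may not resemble the training rows. A minimal sketch that shuffles the split and also reports a 5-fold cross-validated R²; the synthetic DataFrame with columns 'Mean MP' and 'BPM' is a hypothetical stand-in for the poster's data, so the numbers it prints mean nothing about basketball:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the poster's DataFrame (hypothetical values)
rng = np.random.default_rng(0)
df = pd.DataFrame({"Mean MP": rng.uniform(5, 38, size=200)})
df["BPM"] = 0.2 * df["Mean MP"] - 4 + rng.normal(0, 2, size=200)

X = df[["Mean MP"]]  # 2-D: one column, many rows
y = df["BPM"]

# Shuffle before splitting so the test set is not just the last 20% of rows
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=True, random_state=42
)

# Scaler and model in one pipeline: the scaler is fit on the training rows only
model = make_pipeline(StandardScaler(), LinearRegression())
model.fit(X_train, y_train)
test_r2 = model.score(X_test, y_test)  # score() returns R² for regressors

# 5-fold cross-validation gives an estimate that depends less on one split
cv = KFold(n_splits=5, shuffle=True, random_state=42)
cv_r2 = cross_val_score(model, X, y, cv=cv, scoring="r2")
print(f"test R²: {test_r2:.2f}, CV R²: {cv_r2.mean():.2f} +/- {cv_r2.std():.2f}")
```

Because ordinary least squares with an intercept is unaffected by rescaling a feature, the StandardScaler does not change the predictions of plain LinearRegression; it is kept here only to mirror the original script.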

I think the problem is in how I create or use my DataFrame, but I don't know how to fix it. I hope you can help me. Thank you.

Maybe this is not the right problem for machine learning. Are you sure there is some relationship between one or more of the statistics in your DataFrame and the BPM score? Perhaps try a multiclass classification algorithm such as a decision tree, using all the features, after converting BPM into categorical scores; for example, a BPM between -2 and 0 would mean a bench player, and one between 4 and 6 would mean all-star consideration. I know nothing about basketball, and I used this link to understand things.
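The suggestions above can be sketched as follows: first check how strongly each statistic correlates with BPM (Pearson correlation only measures linear association), then bin BPM into categories with pd.cut and fit a DecisionTreeClassifier on all the features. Everything specific here is an assumption: the columns "Mean PTS" and "Mean AST", the synthetic data, the bin edges and labels (only loosely following the examples above), and max_depth=3 are all hypothetical choices for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data; "Mean PTS" and "Mean AST" are hypothetical columns
rng = np.random.default_rng(1)
n = 300
df = pd.DataFrame({
    "Mean MP": rng.uniform(5, 38, n),
    "Mean PTS": rng.uniform(0, 30, n),
    "Mean AST": rng.uniform(0, 10, n),
})
df["BPM"] = (0.1 * df["Mean MP"] + 0.15 * df["Mean PTS"]
             + 0.3 * df["Mean AST"] - 5 + rng.normal(0, 1.5, n))

# Step 1: is any statistic related to BPM at all? (Pearson correlation)
features = ["Mean MP", "Mean PTS", "Mean AST"]
print(df[features].corrwith(df["BPM"]).sort_values(ascending=False))

# Step 2: bin BPM into categories; edges and labels are illustrative only.
# pd.cut uses right-closed intervals, e.g. (-2, 0] -> "bench"
bins = [-np.inf, -2, 0, 4, np.inf]
labels = ["below_bench", "bench", "rotation", "all_star"]
df["BPM class"] = pd.cut(df["BPM"], bins=bins, labels=labels).astype(str)

# Step 3: multiclass decision tree on all features
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["BPM class"], test_size=0.2, random_state=0
)
clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))  # score() is accuracy here
```

If the correlations in step 1 are all close to 0 on the real data, no choice of model is likely to predict BPM well from those columns.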