
Making my logistic regression testing accuracy closer to my training accuracy with Python

I have a basketball stats data set with 656 factors. I am using a logistic regression classifier to predict winners and losers (team 1 wins or team 2 wins), with features formed by subtracting team 1's stats from team 2's stats. Other than normalization, how can I bring my test-set accuracy closer to my training accuracy, or improve accuracy in general?

I saw normalization as a possible solution, but since I am working with differences of stats, most of the values are already in the same range.
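Even when differenced features share a rough range, standardizing can still help lbfgs converge and makes regularization act evenly across columns. A minimal sketch of scaling done correctly (fit on the training split only, via a pipeline) — the data here is a synthetic stand-in for the real 656-column array, which is not available:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the stat-difference matrix (hypothetical shapes)
X, Y = make_classification(n_samples=500, n_features=50, n_informative=10,
                           random_state=0)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, random_state=0)

# The pipeline fits the scaler on X_train only, then applies the same
# training-set mean/std when scoring X_test -- avoiding leakage.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, Y_train)
print(model.score(X_test, Y_test))
```

Putting the scaler inside the pipeline (rather than scaling the whole array up front) is what keeps the test set untouched during fitting.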

Code (imports added for completeness):

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X = final_data_array[:, :656]
Y = final_data_array[:, 656]

X_train, X_test, Y_train, Y_test = train_test_split(X, Y)

logistic = LogisticRegression(solver='lbfgs', max_iter=4000000,
                              multi_class='multinomial').fit(X_train, Y_train)

print(logistic.score(X_test, Y_test))
print(logistic.score(X_train, Y_train))

Output (test accuracy, then training accuracy):

0.7818791946308725
0.9069506726457399
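A ~13-point gap between training and test accuracy with 656 features suggests overfitting, and the most direct lever in logistic regression is the regularization strength C. A sketch using scikit-learn's LogisticRegressionCV to pick C by cross-validation (again on synthetic stand-in data, not the original array):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV
from sklearn.model_selection import train_test_split

# Synthetic stand-in: many features, few of them informative
X, Y = make_classification(n_samples=500, n_features=100, n_informative=15,
                           random_state=0)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, random_state=0)

# Search a grid of 10 inverse regularization strengths with 5-fold CV;
# a smaller chosen C means stronger shrinkage of the coefficients.
clf = LogisticRegressionCV(Cs=10, cv=5, max_iter=5000).fit(X_train, Y_train)
print(clf.C_)
print(clf.score(X_train, Y_train), clf.score(X_test, Y_test))
```

Stronger regularization typically lowers training accuracy a little while narrowing the train/test gap; an L1 penalty (penalty='l1' with solver='liblinear' or 'saga') can additionally zero out uninformative stat columns.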

You may try some feature engineering on the dataset; beyond that, normalize the dataset and check the accuracy. I also recommend trying other classification algorithms such as XGBClassifier, a random forest classifier, etc.
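A quick way to act on the answer's suggestion is to benchmark several classifiers under the same cross-validation split. XGBClassifier lives in the separate xgboost package, so this sketch compares logistic regression against scikit-learn's own RandomForestClassifier on synthetic stand-in data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the stat-difference matrix
X, Y = make_classification(n_samples=500, n_features=50, n_informative=10,
                           random_state=0)

# Same 5-fold CV for each model keeps the comparison fair
for model in (LogisticRegression(max_iter=1000),
              RandomForestClassifier(n_estimators=200, random_state=0)):
    scores = cross_val_score(model, X, Y, cv=5)
    print(type(model).__name__, scores.mean())
```

Comparing mean CV scores (rather than a single train/test split) gives a more stable read on which model family actually generalizes better on this data.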
