
How to increase accuracy and precision for my logistic regression model?

Original post 2021-03-04 10:32:48 8 2 python/ machine-learning/ logistic-regression

My dataset is the Cleveland heart disease database, with 300 rows and 14 attributes, used to predict whether a person has heart disease or not. The aim is to build a classification model with logistic regression. I preprocessed the data, ran the model with x_train, Y_train, X_test, Y_test, and got an average accuracy of 82%.

To improve the accuracy, I removed features that are highly correlated with each other (as they would give the same information).

Then I did RFE (recursive feature elimination),

followed by PCA (principal component analysis) for dimensionality reduction.

Still, the accuracy did not get any better.

Why is that?

Also, why does my model show a different accuracy each time? Is it because a different x_train, Y_train, X_test, Y_test is taken each time?

Should I change my model for better accuracy? Is 80% average accuracy good or bad?

2 answers

Should I change my model for better accuracy?

At least you could try to. The choice of the right model depends heavily on the concrete use case. Trying out other approaches is never a bad idea :)

Another idea would be to project the data onto the two principal components with the highest variance via PCA. Then you could plot this in 2D space to get a better feeling for whether your data is linearly separable.
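A minimal sketch of that 2D projection, assuming scikit-learn and NumPy are installed. The data here is a synthetic stand-in generated with make_classification (an assumption, not the Cleveland data), and plot_projection is a hypothetical helper name introduced only for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data (assumption): 300 rows, 13 features, binary label.
X, y = make_classification(
    n_samples=300, n_features=13, n_informative=5, random_state=0
)

# PCA is sensitive to feature scale, so standardize the features first.
X_scaled = StandardScaler().fit_transform(X)

# Keep the two principal components that explain the most variance.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)

# Fraction of the total variance captured by each of the two components.
explained = pca.explained_variance_ratio_
print("explained variance ratio:", explained)


def plot_projection(points, labels):
    """Scatter the 2D projection, one colour per class (needs matplotlib)."""
    import matplotlib.pyplot as plt

    for cls in np.unique(labels):
        mask = labels == cls
        plt.scatter(points[mask, 0], points[mask, 1], label=f"class {cls}", alpha=0.7)
    plt.xlabel("PC 1")
    plt.ylabel("PC 2")
    plt.legend()
    plt.show()
```

Calling plot_projection(X_2d, y) draws the scatter plot. If the two classes overlap heavily in this view, that is only a hint: the first two components may capture a small share of the variance, so the data could still be separable in the full feature space.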

Also, why does my model show a different accuracy each time?

I am assuming you are using the train_test_split method of scikit-learn to split your data? By default, this method shuffles your data randomly. You could set the random_state parameter to a fixed value to obtain reproducible results.
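A small sketch of the effect, assuming scikit-learn is installed. The data is again a synthetic stand-in (an assumption), and holdout_accuracy is a hypothetical helper. The cross-validation step at the end goes beyond what the answer says; it is one common way to get a steadier estimate than a single split.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in for the real data (assumption): 300 rows, 13 features.
X, y = make_classification(
    n_samples=300, n_features=13, n_informative=5, random_state=0
)


def holdout_accuracy(seed):
    """Train on one random split and return accuracy on the held-out part."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y
    )
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)
    return model.score(X_test, y_test)


# Same seed -> same split -> same accuracy.
same_a = holdout_accuracy(42)
same_b = holdout_accuracy(42)

# A different seed gives a different split, so the accuracy may differ.
other = holdout_accuracy(7)
print("seed 42:", same_a, same_b, "| seed 7:", other)

# 5-fold cross-validation averages over several splits instead of relying on one.
cv_scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print("cv mean:", cv_scores.mean(), "cv std:", cv_scores.std())
```

With only about 300 rows, a 20% test set holds roughly 60 samples, so a few predictions flipping between splits can move the accuracy by several percentage points.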

See ( https://github.com/dnishimoto/python-deep-learning/blob/master/Credit%20Card%20Defaults%20-%20hyperparameter.ipynb ). To improve accuracy, you can do hyperparameter tuning, dimensionality reduction, and scaling. Hyperparameter tuning means finding the best parameters, whereas dimensionality reduction removes features that don't contribute to accuracy, which reduces noise. Scaling or normalizing puts the features on a comparable scale, so that no feature dominates just because of its units.

Look at grid search (GridSearchCV in scikit-learn) to find the best parameters.
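A minimal sketch of scaling combined with a grid search, assuming scikit-learn is installed. The synthetic data, the step names "scale" and "clf", and the grid of C values are all illustrative assumptions, not taken from the linked notebook.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data (assumption): 300 rows, 13 features.
X, y = make_classification(
    n_samples=300, n_features=13, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Putting the scaler inside the pipeline means it is refit on each training
# fold only, so no information from the validation fold leaks into scaling.
pipeline = Pipeline(
    [
        ("scale", StandardScaler()),
        ("clf", LogisticRegression(max_iter=1000)),
    ]
)

# Illustrative grid (assumption): C is the inverse regularization strength.
param_grid = {"clf__C": [0.01, 0.1, 1.0, 10.0, 100.0]}

search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

print("best parameters:", search.best_params_)
print("best cross-validated accuracy:", search.best_score_)
print("accuracy on held-out test set:", search.score(X_test, y_test))
```

The test set is kept out of the search, so the final score is measured on data that played no part in choosing C.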