How to increase accuracy and precision for my logistic regression model?
My dataset is the Cleveland heart disease database, with 300 rows and 14 attributes, and the task is to predict whether a person has heart disease or not. The aim is to build a classification model using logistic regression. I preprocessed the data, ran the model with X_train, Y_train, X_test, Y_test, and got an average accuracy of 82%.
So, to improve the accuracy, I removed features that are highly correlated with each other (as they would give the same information).
I also did RFE (recursive feature elimination),
followed by PCA (principal component analysis) for dimensionality reduction.
Still, the accuracy did not get any better.
Why is that?
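The preprocessing steps described above can be sketched with scikit-learn. This is a minimal sketch, not the asker's code: it uses synthetic data from make_classification (300 rows, 13 features) as a stand-in for the Cleveland data, adds a hypothetical near-duplicate column f0_copy so the correlation filter has something to drop, and the 0.9 threshold, 6 RFE features, and 4 PCA components are arbitrary choices. Comparing both pipelines with 5-fold cross-validation, rather than a single train/test split, makes it easier to see whether the reduction changes the accuracy at all.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def drop_correlated(df: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    """Drop the later feature of every pair whose |Pearson r| exceeds threshold."""
    corr = df.corr().abs()
    # Keep only the upper triangle so each pair is checked once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)


# ASSUMPTION: synthetic stand-in for the Cleveland data (300 rows, 13 features).
X, y = make_classification(n_samples=300, n_features=13, n_informative=5,
                           n_redundant=2, random_state=0)
df = pd.DataFrame(X, columns=[f"f{i}" for i in range(X.shape[1])])
# Hypothetical near-duplicate column so the correlation filter has work to do.
rng = np.random.default_rng(0)
df["f0_copy"] = df["f0"] + rng.normal(scale=0.01, size=len(df))

df_reduced = drop_correlated(df, threshold=0.9)

baseline = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])
reduced = Pipeline([
    ("scale", StandardScaler()),
    # Recursive feature elimination down to 6 features (arbitrary choice).
    ("rfe", RFE(LogisticRegression(max_iter=1000), n_features_to_select=6)),
    # PCA on the surviving features, keeping 4 components (arbitrary choice).
    ("pca", PCA(n_components=4)),
    ("clf", LogisticRegression(max_iter=1000)),
])

# 5-fold cross-validation averages over folds instead of one lucky/unlucky split.
base_scores = cross_val_score(baseline, df_reduced, y, cv=5, scoring="accuracy")
red_scores = cross_val_score(reduced, df_reduced, y, cv=5, scoring="accuracy")
print(f"baseline: {base_scores.mean():.3f}  RFE+PCA: {red_scores.mean():.3f}")
```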
Also, why does my model show a different accuracy each time? Is it because X_train, Y_train, X_test, Y_test are different each time?
Should I change my model for better accuracy? Is an 80% average accuracy good or bad?
Should I change my model for better accuracy?
At least you could try. The right choice of model depends heavily on the concrete use case. Trying out other approaches is never a bad idea :)
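One way to try other approaches is to score several models on the same cross-validation folds. The sketch below again uses synthetic make_classification data as a hypothetical stand-in, and the alternative models and their settings are illustrative picks, not recommendations. The majority-class DummyClassifier gives a floor to compare against, which is also a simple way to judge whether an accuracy like 80% is good for a given dataset.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# ASSUMPTION: synthetic stand-in for the Cleveland data.
X, y = make_classification(n_samples=300, n_features=13, n_informative=5,
                           random_state=0)

candidates = {
    # Always predicts the most frequent class: the accuracy floor.
    "majority-class baseline": DummyClassifier(strategy="most_frequent"),
    "logistic regression": make_pipeline(StandardScaler(),
                                         LogisticRegression(max_iter=1000)),
    "RBF SVM": make_pipeline(StandardScaler(), SVC()),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

# Same shuffled, stratified folds for every model so the comparison is fair.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
results = {}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    results[name] = (scores.mean(), scores.std())
    print(f"{name:25s} {scores.mean():.3f} +/- {scores.std():.3f}")
```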
Another idea would be to use PCA to project the data onto the two principal components with the highest variance. You could then plot this in 2D space to get a better feeling for whether your data is linearly separable.
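A minimal sketch of the 2D PCA idea, assuming standardized features and again using synthetic make_classification data as a hypothetical stand-in; project_2d and plot_2d are illustrative helper names. Matplotlib is imported only inside plot_2d, so the projection itself runs without it.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def project_2d(X):
    """Standardize X and project it onto its first two principal components."""
    X_scaled = StandardScaler().fit_transform(X)
    pca = PCA(n_components=2)
    Z = pca.fit_transform(X_scaled)
    return Z, pca.explained_variance_ratio_


def plot_2d(Z, y):
    """Scatter the 2D projection, one color per class (requires matplotlib)."""
    import matplotlib.pyplot as plt  # local import: project_2d works without it

    for label in np.unique(y):
        mask = y == label
        plt.scatter(Z[mask, 0], Z[mask, 1], label=f"class {label}", alpha=0.6)
    plt.xlabel("PC 1")
    plt.ylabel("PC 2")
    plt.legend()
    plt.show()


# ASSUMPTION: synthetic stand-in for the Cleveland data.
X, y = make_classification(n_samples=300, n_features=13, n_informative=5,
                           random_state=0)
Z, ratio = project_2d(X)
print("share of variance explained by PC1, PC2:", ratio)
```

Call plot_2d(Z, y) to draw the scatter plot. If the first two components explain only a small share of the variance (see ratio), the plot can hide separation that exists in the full feature space.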
Also, why does my model show a different accuracy each time?
I am assuming you are using scikit-learn's train_test_split method to split your data? By default, this method shuffles your data randomly. You could set the random_state parameter to a fixed value to obtain reproducible results.
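A minimal sketch of the random_state point, assuming test_size=0.2 and stratify=y (both are assumptions, not taken from the question) and synthetic make_classification data as a stand-in; split_and_score is a hypothetical helper name.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def split_and_score(X, y, random_state=None):
    """Fit logistic regression on one train/test split and return test accuracy."""
    # train_test_split shuffles by default; random_state fixes that shuffle.
    X_train, X_test, Y_train, Y_test = train_test_split(
        X, y, test_size=0.2, random_state=random_state, stratify=y
    )
    model = LogisticRegression(max_iter=1000).fit(X_train, Y_train)
    return model.score(X_test, Y_test)


# ASSUMPTION: synthetic stand-in for the Cleveland data.
X, y = make_classification(n_samples=300, n_features=13, n_informative=5,
                           random_state=0)

# Different seeds give different splits, and usually different accuracies.
print([round(split_and_score(X, y, random_state=s), 3) for s in range(5)])

# The same seed gives the same split, so the accuracy is identical on every run.
print(split_and_score(X, y, random_state=42) == split_and_score(X, y, random_state=42))
```

With 300 rows and test_size=0.2, the test set has 60 rows, so each misclassified row moves the accuracy by about 1.7 percentage points (1/60); some variation between seeds is therefore expected. Averaging over several splits, for example with cross_val_score, gives a more stable estimate than any single fixed seed.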
See ( https://github.com/dnishimoto/python-deep-learning/blob/master/Credit%20Card%20Defaults%20-%20hyperparameter.ipynb ). To improve accuracy, you can do hyperparameter tuning, dimensionality reduction, and scaling. Hyperparameter tuning means searching for the best parameters, whereas dimensionality reduction removes features that don't contribute to accuracy, which reduces noise. Scaling or normalizing puts the features on comparable ranges, which matters for scale-sensitive steps such as regularized logistic regression and PCA.
Look at GridSearch (scikit-learn's GridSearchCV) to find the best parameters.
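A minimal sketch combining the three ideas from this answer (scaling, PCA for dimensionality reduction, and a grid search over hyperparameters) with GridSearchCV. The synthetic data and the grid values for pca__n_components and clf__C are hypothetical choices for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# ASSUMPTION: synthetic stand-in for the Cleveland data.
X, y = make_classification(n_samples=300, n_features=13, n_informative=5,
                           random_state=0)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

pipe = Pipeline([
    ("scale", StandardScaler()),                 # scaling
    ("pca", PCA()),                              # dimensionality reduction
    ("clf", LogisticRegression(max_iter=1000)),  # the classifier being tuned
])

# Parameters of pipeline steps are addressed as <step name>__<parameter>.
param_grid = {
    "pca__n_components": [3, 5, 8, 13],  # hypothetical grid
    "clf__C": [0.01, 0.1, 1.0, 10.0],    # inverse regularization strength
}

search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, Y_train)

print("best params:", search.best_params_)
print("best CV accuracy:", round(search.best_score_, 3))
# The refit best pipeline is scored on data the search never saw.
print("held-out test accuracy:", round(search.score(X_test, Y_test), 3))
```

Keeping the scaler and PCA inside the pipeline means they are refit on each training fold, so the cross-validation score is not inflated by information from the validation folds.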