
Strange error with scikit-learn Logistic Regression in Python?

The operations below concern Logistic Regression in Python's scikit-learn.

Here is the most important part of the code:

predictions = logistic_regression.predict(X_test)
prediction=logistic_regression.predict_proba(X_test)[:,:]
prediction=pd.DataFrame(data=predictions, 
                         columns=['Prob of Bad credit (0)','Prob of Good credit (1)'])
prediction.head(10)

Yesterday this code produced the result I expected (the table title was different, but the result was the same):

[Image: table of predicted probabilities]

But today, and I have absolutely no idea why, when I ran this code again I got an error:

ValueError: Shape of passed values is (300, 1), indices imply (300, 2)

How is it possible that it worked yesterday but not today? What can I do? A screenshot of the full error is below:

[Image: screenshot of the full traceback]

A sample of predictions looks like this:

print(predictions)

[1 1 1 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]

I do not want 1 or 0 in the table; I would like the probability of 1 or 0 as a percentage, as in the example in the screenshot.

See the same table at the end of the prediction section in the source below, which uses the same code and works: https://www.kaggle.com/neisha/heart-disease-prediction-using-logistic-regression

I think the error occurs because predictions (the output of predict) holds just one value per row, the class label, so it has a single column, while you pass two column names:

prediction=pd.DataFrame(data=predictions, 
                         columns=['Prob of Bad credit (0)','Prob of Good credit (1)'])
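
The mismatch can be reproduced without a model at all. The following is only a minimal sketch: the arrays labels and probabilities are made-up stand-ins (assumptions, not your data) for what predict and predict_proba return on 300 test rows, and the exact wording of the error message may differ between pandas versions.

```python
import numpy as np
import pandas as pd

columns = ['Prob of Bad credit (0)', 'Prob of Good credit (1)']

# Stand-in for predict(): one class label per row, shape (300,)
labels = np.ones(300, dtype=int)

# Stand-in for predict_proba(): one probability per class per row, shape (300, 2)
p_good = np.full(300, 0.7)
probabilities = np.column_stack([1 - p_good, p_good])

# One column of data cannot be matched to two column names
try:
    pd.DataFrame(data=labels, columns=columns)
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc

print(labels.shape, probabilities.shape)
print("labels ->", mismatch_error)

# Two columns of data match two column names
ok = pd.DataFrame(data=probabilities, columns=columns)
print(ok.shape)
```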

Based on the Kaggle code you linked:

y_pred_prob=logreg.predict_proba(x_test)[:,:]
y_pred_prob_df=pd.DataFrame(data=y_pred_prob, columns=['Prob of no heart disease (0)','Prob of Heart Disease (1)'])
y_pred_prob_df.head()

I think you should change your code to:

prediction_df = pd.DataFrame(data=prediction,  
                         columns=['Prob of Bad credit (0)','Prob of Good credit (1)'])

Be careful: it should be prediction (the predict_proba output), not predictions (the predict output).
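
For completeness, here is a rough end-to-end sketch of the corrected flow. Your data is not shown in the question, so this uses a synthetic dataset from make_classification as a stand-in (an assumption), with 300 test rows to match the shape in your error. The variable names follow your code; treating class 0 as bad credit and class 1 as good credit is also an assumption taken from your column names.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the credit data (assumption: binary target, 0 = bad, 1 = good)
X, y = make_classification(n_samples=1000, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=300, random_state=0
)

logistic_regression = LogisticRegression()
logistic_regression.fit(X_train, y_train)

# Class labels: one value per test row
predictions = logistic_regression.predict(X_test)
# Class probabilities: one column per class, ordered as in classes_
prediction = logistic_regression.predict_proba(X_test)

prediction_df = pd.DataFrame(
    data=prediction,
    columns=['Prob of Bad credit (0)', 'Prob of Good credit (1)'],
)

print(logistic_regression.classes_)
print(predictions.shape, prediction.shape)
print(prediction_df.head(10))
```

If you want percentages rather than fractions, multiplying the DataFrame by 100 (prediction_df * 100) should be enough.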
