
Random Forest Regression does not give 0 or 1

I'm currently using RandomForestRegressor for the Titanic competition (Kaggle).

%%timeit
model = RandomForestRegressor(n_estimators=200, oob_score=False,n_jobs=1,random_state=42)
model.fit(X,y)
#y_oob = model.oob_prediction_
#print("c-stat:", roc_auc_score(y,model.oob_prediction_))

prediction_regression = model.predict(X_test)
# dataframe with predictions
kaggle = pd.DataFrame({'PassengerId': passengerId, 'Survived': prediction_regression})
# save to csv
kaggle.to_csv('./csvToday/prediction_regression.csv', index=False)

But it does not return 0 or 1; it returns decimal values:

892: 0.3163
893: 0.07
and so on.

How can I make RandomForestRegressor return 0 or 1?

Regression is the machine learning task of predicting a continuous quantity such as an amount or a price (stock market prediction, home price prediction, etc.). As far as I remember, the goal of the Titanic competition is to predict whether a passenger survived, which is a binary classification problem. A RandomForestRegressor averages the predictions of its trees, which is why you get decimals between 0 and 1 when the targets are 0 and 1. For a classification problem you should use RandomForestClassifier (docs).
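To make the difference concrete, here is a minimal sketch that fits both estimators on the same 0/1 targets. The data is synthetic (generated with `make_classification`) and only stands in for the Titanic features; the names `X_demo`, `y_demo`, `reg` and `clf` are made up for this illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# Synthetic stand-in for the Titanic features (an assumption, not the real data).
X_demo, y_demo = make_classification(n_samples=300, n_features=6, random_state=42)

reg = RandomForestRegressor(n_estimators=50, random_state=42).fit(X_demo, y_demo)
clf = RandomForestClassifier(n_estimators=50, random_state=42).fit(X_demo, y_demo)

reg_pred = reg.predict(X_demo)  # averages over trees, so fractional values in [0, 1]
clf_pred = clf.predict(X_demo)  # class labels taken from y_demo, so only 0 or 1

print("regressor: ", reg_pred[:5])
print("classifier:", clf_pred[:5])
```

The regressor's output can be read as a score for "survived", while the classifier returns the label directly, which is the form the submission file needs.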

So your code would look like:

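import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# X_train, y_train, X_test and passengerId are the data prepared in the question
# (X_train and y_train correspond to X and y there).
model = RandomForestClassifier(
    # some parameters, e.g. n_estimators=200, random_state=42 as in the question
)

# Fit on the training data; predict() returns class labels (0 or 1).
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Build the submission dataframe and save it to CSV.
submit_df = pd.DataFrame({'PassengerId': passengerId, 'Survived': y_pred})
submit_df.to_csv('./csvToday/submission.csv', index=False)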
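If you would rather keep the regressor, its decimal output can be turned into 0/1 labels by thresholding. The sketch below is only an illustration: the 0.5 cutoff is an assumed value, and apart from the two predictions quoted in the question (0.3163 and 0.07) the numbers and passenger IDs are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical regressor output: the first two values are quoted in the question,
# the last two are invented for illustration.
passenger_id = np.array([892, 893, 894, 895])
prediction_regression = np.array([0.3163, 0.07, 0.82, 0.55])

# 0.5 is an assumed cutoff, not something the original post specifies.
threshold = 0.5
survived = (prediction_regression >= threshold).astype(int)

kaggle = pd.DataFrame({'PassengerId': passenger_id, 'Survived': survived})
print(kaggle)
```

Switching to RandomForestClassifier is usually the simpler route, since `predict` then returns the labels without a manual cutoff.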

This kernel can provide you with some more insights.
