I have a set of football data in a database that I am trying to predict values for.
import MySQLdb
import pandas as pd
from sklearn.feature_selection import RFE
from sqlalchemy import create_engine
import mysql.connector
from matplotlib import pyplot
mysql_cn= MySQLdb.connect(host='database.rds.amazonaws.com',port=3306,user='username', passwd='password', db='dev')
games = pd.read_sql('SELECT game_id, game_date_id, home_team_id, away_team_id, referee_id, FTR, away_team_travel FROM dev.tmp_all_output_id WHERE game_id < 6700;', con=mysql_cn)
predict_games = pd.read_sql('SELECT game_id, game_date_id, home_team_id, away_team_id, referee_id, -10 AS FTR, away_team_travel FROM dev.tmp_all_output_id WHERE game_id > 6700;', con=mysql_cn)
feature_names = ['game_id', 'game_date_id', 'home_team_id', 'away_team_id', 'referee_id', 'away_team_travel']
X = games[feature_names]
y = games['FTR']
# #Create Training and Test Sets and Apply Scaling
from sklearn.model_selection import train_test_split
validation_size = 0.20
seed = 7
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=validation_size, random_state=0)
from sklearn.ensemble import AdaBoostClassifier
ada = AdaBoostClassifier()
ada.fit(X_train, y_train)
predictions = ada.predict(X_test)
print('Accuracy of AdaBoostClassifier on training set: {:.2f}'.format(ada.score(X_train, y_train)))
print('Accuracy of AdaBoostClassifier on test set: {:.2f}'.format(ada.score(X_test, y_test)))
#cnx = create_engine('mysql+mysqlconnector://username:password@database.rds.amazonaws.com:3306/dev', echo=False)
#testResults.to_sql(name='tmp_all_output_prediction', con=cnx, if_exists = 'replace', index=False)
mysql_cn.close()
Once I have loaded my data set into a data frame and run train_test_split and a fit on it, how do I predict values for an unseen data set and return the game_ids and the predicted values (FTR)?
As you can see in the code, I have a table (tmp_all_output_id) where I select known result values into 'games' and select unknown (or unplayed) results into 'predict_games'. I also set FTR (full time result) for 'predict_games' = -10 as at this point the result of these games is not yet known.
But how do I use the training that I have done to predict FTR for the data frame 'predict_games'?
I tried to use this code to predict, but it always came back with 0 (draw) for FTR, which is certainly not correct.
testResults = predict_games[['game_id']]
testResults.is_copy = None
testResults['FTR'] = raw_prediction
I have added the following code:
unseen_prediction = predict_games[feature_names]
new_predictions = ada.predict(unseen_prediction)
print(new_predictions)
However, every predicted value is returned as -1 (away win), which is not correct.
Your ada variable is now a trained classifier instance. In order to use it to classify new data, you construct an X with the data in a format corresponding to 'game_id', 'game_date_id', 'home_team_id', 'away_team_id', 'referee_id', 'away_team_travel'. Then you run ada.predict(X) and you're done!
The issue is that you're currently only passing in the game_id.
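As a minimal sketch of that last step, the snippet below fits the classifier, predicts on the unseen frame using the same feature columns, and pairs each game_id with its predicted FTR. The make_games helper and the random data are assumptions made only so the example runs without the database; in the question's code, games and predict_games would come from pd.read_sql instead, and the predictions here are meaningless because the labels are random.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import AdaBoostClassifier

feature_names = ['game_id', 'game_date_id', 'home_team_id',
                 'away_team_id', 'referee_id', 'away_team_travel']


def make_games(first_id, n, rng):
    """Hypothetical stand-in for the SQL query: random rows with the same columns."""
    return pd.DataFrame({
        'game_id': np.arange(first_id, first_id + n),
        'game_date_id': rng.integers(1, 400, n),
        'home_team_id': rng.integers(1, 21, n),
        'away_team_id': rng.integers(1, 21, n),
        'referee_id': rng.integers(1, 30, n),
        'away_team_travel': rng.integers(0, 500, n),
    })


rng = np.random.default_rng(0)

# Played games with a known result (random labels, for illustration only).
games = make_games(1, 200, rng)
games['FTR'] = rng.choice([-1, 0, 1], size=len(games))

# Unplayed games: same feature columns, no meaningful FTR yet.
predict_games = make_games(201, 10, rng)

ada = AdaBoostClassifier(random_state=0)
ada.fit(games[feature_names], games['FTR'])

# Pass the same feature columns, in the same order, that the model was fitted on.
new_predictions = ada.predict(predict_games[feature_names])

# Pair each game_id with its prediction; .copy() avoids modifying a slice.
testResults = predict_games[['game_id']].copy()
testResults['FTR'] = new_predictions
print(testResults)
```

A frame built this way has one row per unseen game, so it could then be written back with the to_sql call that is commented out in the question.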