How do I predict future results with scikit-learn and pandas in Python using RandomForestRegressor?
Hello, I came across this tutorial on how to use Python with some libraries to predict future NCAAB games using the sportsreference library. I will post the code as well as the article. This seems to work well, but I think it is only testing on games in the past. How would I use it to predict future games of specific teams? For example, what will the score be between Team A and Team B on a given date?
The problem I see is that some of the data used can only be known after the game is finished. This known data is what the program uses to predict the score.
First experiment: I tried filling in only the data that I knew about a game before it happened and filling in the remaining data with zeros using fillna(0). Here is what the CSV would look like:
date_team,away_assist_percentage,away_assists,away_block_percentage,away_blocks,away_defensive_rating,away_defensive_rebound_percentage,away_defensive_rebounds,away_effective_field_goal_percentage,away_field_goal_attempts,away_field_goal_percentage,away_field_goals,away_free_throw_attempt_rate,away_free_throw_attempts,away_free_throw_percentage,away_free_throws,away_losses,away_minutes_played,away_offensive_rating,away_offensive_rebound_percentage,away_offensive_rebounds,away_personal_fouls,away_points,away_steal_percentage,away_steals,away_three_point_attempt_rate,away_three_point_field_goal_attempts,away_three_point_field_goal_percentage,away_three_point_field_goals,away_total_rebound_percentage,away_total_rebounds,away_true_shooting_percentage,away_turnover_percentage,away_turnovers,away_two_point_field_goal_attempts,away_two_point_field_goal_percentage,away_two_point_field_goals,away_win_percentage,away_wins,home_assist_percentage,home_assists,home_block_percentage,home_blocks,home_defensive_rating,home_defensive_rebound_percentage,home_defensive_rebounds,home_effective_field_goal_percentage,home_field_goal_attempts,home_field_goal_percentage,home_field_goals,home_free_throw_attempt_rate,home_free_throw_attempts,home_free_throw_percentage,home_free_throws,home_losses,home_minutes_played,home_offensive_rating,home_offensive_rebound_percentage,home_offensive_rebounds,home_personal_fouls,home_points,home_steal_percentage,home_steals,home_three_point_attempt_rate,home_three_point_field_goal_attempts,home_three_point_field_goal_percentage,home_three_point_field_goals,home_total_rebound_percentage,home_total_rebounds,home_true_shooting_percentage,home_turnover_percentage,home_turnovers,home_two_point_field_goal_attempts,home_two_point_field_goal_percentage,home_two_point_field_goals,home_win_percentage,home_wins,pace
0,0,0,0,0,0,0,0,0,59,0,0,0,0,0,8,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.7,7,0,0,0,0,0,0,0,0,0,0,42,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,.1,1,0
The final line of code is changed to: print(model.predict(final_trim).astype(int), y_test)
"final_trim" is the new CSV that is being predicted.
The results were not accurate at all. What am I missing?
Here is the original code:
import pandas as pd
from sportsreference.ncaab.teams import Teams
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Target columns and non-numeric identifiers that are not used as features.
FIELDS_TO_DROP = ['away_points', 'home_points', 'date', 'location',
                  'losing_abbr', 'losing_name', 'winner', 'winning_abbr',
                  'winning_name', 'home_ranking', 'away_ranking']

# Collect the extended box scores of every team into one DataFrame.
dataset = pd.DataFrame()
teams = Teams()
for team in teams:
    dataset = pd.concat([dataset, team.schedule.dataframe_extended])

# Features: every remaining column (axis=1 drops columns, not rows).
X = dataset.drop(FIELDS_TO_DROP, axis=1).dropna().drop_duplicates()
# Targets: the final score of each game.
y = dataset[['home_points', 'away_points']].values
X_train, X_test, y_train, y_test = train_test_split(X, y)

parameters = {'bootstrap': False,
              'min_samples_leaf': 3,
              'n_estimators': 50,
              'min_samples_split': 10,
              'max_features': 'sqrt',
              'max_depth': 6}
model = RandomForestRegressor(**parameters)
model.fit(X_train, y_train)
# Print the predicted scores next to the actual scores of the held-out games.
print(model.predict(X_test).astype(int), y_test)
And here is the post I got it from: https://towardsdatascience.com/predict-college-basketball-scores-in-30-lines-of-python-148f6bd71894
Thank you!
Think of it this way: if you want to test the goodness of fit of your model, you must know the result in advance, so that you can measure the distance between your model's output and the real outcome and tune the model to improve its overall performance.
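As a rough sketch of that idea, the held-out rows from train_test_split can be scored with a metric such as mean absolute error. The synthetic data and the column names feature_1 and feature_2 are made up for illustration and are not taken from the tutorial.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption): two made-up features, two score targets.
rng = np.random.default_rng(0)
X = pd.DataFrame({"feature_1": rng.normal(size=200),
                  "feature_2": rng.normal(size=200)})
y = pd.DataFrame({"home_points": 70 + 5 * X["feature_1"] + rng.normal(size=200),
                  "away_points": 65 + 5 * X["feature_2"] + rng.normal(size=200)})

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(X_train, y_train)

# Compare the predictions with the known outcomes of the held-out rows.
predicted = model.predict(X_test)
mae = mean_absolute_error(y_test, predicted)
print(f"Mean absolute error on held-out games: {mae:.2f} points")
```

This only says how well the model does on games whose result is already known; it does not by itself produce a forecast for an unplayed game.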
Once you have trained your model, if you want to predict future values, then (without knowing much about what you are working on) you should feed the model the same features it was trained with, but with the values for the game you want to predict. Here is a very basic example using two variables to predict the scores of two teams (A and B):
import pandas as pd
data = {'Temperature':[10,20,30,25],'Humidity':[40,50,80,65],'Score_A':[1,2,3,2],'Score_B':[6,3,1,2]}
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
df = pd.DataFrame(data)
print(df)
X = df[['Temperature','Humidity']]
Y = df[['Score_A','Score_B']]
X_train, X_test, y_train, y_test = train_test_split(X, Y,random_state=42)
model = RandomForestRegressor(random_state=42)
model.fit(X_train, y_train)
Here I've trained my model, so if I want to make a future prediction, I need to pass the same features I used in training (temperature and humidity), but with the values I want to predict on. Let's say our friend the meteorologist says that the temperature and humidity for their next match will be 35 and 70 respectively. So I need to use .predict() with those values:
print(model.predict([[35,70]]))
Which returns an output of:
[[2.6 1.4]]
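The list-of-lists call works because the two values are given in the same order as the training columns. A slightly safer variant, sketched here on the same toy data but fitted on all four rows for brevity, builds the new row as a DataFrame and reorders its columns to match the training features before predicting. The names feature_names and upcoming are introduced only for this example.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Same toy data as above; the model is fitted on all four rows here.
df = pd.DataFrame({"Temperature": [10, 20, 30, 25],
                   "Humidity": [40, 50, 80, 65],
                   "Score_A": [1, 2, 3, 2],
                   "Score_B": [6, 3, 1, 2]})
feature_names = ["Temperature", "Humidity"]
model = RandomForestRegressor(random_state=42)
model.fit(df[feature_names], df[["Score_A", "Score_B"]])

# Conditions for the upcoming match, deliberately given in a different column order.
upcoming = pd.DataFrame({"Humidity": [70], "Temperature": [35]})

# Reorder to the training column order; a column missing from the new data
# would appear as NaN instead of silently shifting the other values.
upcoming = upcoming.reindex(columns=feature_names)
prediction = model.predict(upcoming)
print(prediction)
```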
If you wish to make it fancier:
prediction = model.predict([[35,70]])
print("Team A will score: ",prediction[0][0])
print("Team B will score: ",prediction[0][1])
Returning:
Team A will score: 2.6
Team B will score: 1.4
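For the NCAAB data in the question, the same rule means the model should be trained on features that are already known before the game starts, rather than on that game's own box score padded with zeros. One possible approach, sketched here on a tiny hand-made table with hypothetical column names (team, date, points, opp_points), is to replace each game's stats with the team's average over its earlier games, so that the row for a future game can be filled in the same way.

```python
import pandas as pd

# Hand-made example data (an assumption): one row per team per game.
games = pd.DataFrame({
    "team": ["A", "A", "A", "B", "B", "B"],
    "date": pd.to_datetime(["2020-01-01", "2020-01-05", "2020-01-09",
                            "2020-01-02", "2020-01-06", "2020-01-10"]),
    "points": [70, 80, 90, 60, 62, 64],
    "opp_points": [65, 85, 70, 70, 60, 50],
})
games = games.sort_values(["team", "date"]).reset_index(drop=True)


def pregame_average(series):
    # shift(1) excludes the current game, so only earlier games are averaged.
    return series.shift(1).expanding().mean()


stat_columns = ["points", "opp_points"]
for col in stat_columns:
    games[f"avg_{col}_before"] = games.groupby("team")[col].transform(pregame_average)

# A team's first game has no history, so its pre-game features are NaN.
train_rows = games.dropna()

# Features for each team's next, unplayed game: the average over all games so far,
# named the same way as the training features.
next_game_features = (games.groupby("team")[stat_columns].mean()
                      .add_prefix("avg_").add_suffix("_before"))

print(train_rows)
print(next_game_features)
```

A model fitted on the avg_..._before columns of train_rows could then be given a row from next_game_features for a game that has not been played yet. Whether averages like these predict scores well is a separate question that the held-out test set can help answer.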