How do I predict future results with scikit-learn and pandas in Python using RandomForestRegressor?

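Hello, I came across this tutorial on how to use Python and a few libraries to predict future NCAAB games with the sportsreference library. I will post the code as well as the article. It seems to work well, but I think it is only testing against past games. How would I use it to predict a future game between specific teams? For example, what will the score be for Team A vs. Team B on a given date?

The problem I see is that some of the data used is only known after the game has been played. The program uses this already-known data to predict the score.

First experiment: I tried filling in only the data I know before the game happens and used fillna(0) to fill the remaining columns with zeros. This is what the csv looks like: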

date_team,away_assist_percentage,away_assists,away_block_percentage,away_blocks,away_defensive_rating,away_defensive_rebound_percentage,away_defensive_rebounds,away_effective_field_goal_percentage,away_field_goal_attempts,away_field_goal_percentage,away_field_goals,away_free_throw_attempt_rate,away_free_throw_attempts,away_free_throw_percentage,away_free_throws,away_losses,away_minutes_played,away_offensive_rating,away_offensive_rebound_percentage,away_offensive_rebounds,away_personal_fouls,AWAY_POINTS,away_steal_percentage,away_steals,away_three_point_attempt_rate,away_three_point_field_goal_attempts,away_three_point_field_goal_percentage,away_three_point_field_goals,away_total_rebound_percentage,away_total_rebounds,away_true_shooting_percentage,away_turnover_percentage,away_turnovers,away_two_point_field_goal_attempts,away_two_point_field_goal_percentage,away_two_point_field_goals,away_win_percentage,away_wins,home_assist_percentage,home_assists,home_block_percentage,home_blocks,home_defensive_rating,home_defensive_rebound_percentage,home_defensive_rebounds,home_effective_field_goal_percentage,home_field_goal_attempts,home_field_goal_percentage,home_field_goals,home_free_throw_attempt_rate,home_free_throw_attempts,home_free_throw_percentage,home_free_throws,home_losses,home_minutes_played,home_offensive_rating,home_offensive_rebound_percentage,home_offensive_rebounds,home_personal_fouls,HOME_POINTS,home_steal_percentage,home_steals,home_three_point_attempt_rate,home_three_point_field_goal_attempts,home_three_point_field_goal_percentage,home_three_point_field_goals,home_total_rebound_percentage,home_total_rebounds,home_true_shooting_percentage,home_turnover_percentage,home_turnovers,home_two_point_field_goal_attempts,home_two_point_field_goal_percentage,home_two_point_field_goals,home_win_percentage,home_wins,pace
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.7,7,0,0,0,0,0,0,0,0,0,0,42,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,.1,1,0

The last line of the code was changed to:

print(model.predict(final_trim).astype(int), y_test)

"final_trim" is the new csv being predicted.

The results are not accurate at all. What am I missing?

Here is the original code:

import pandas as pd
from sportsreference.ncaab.teams import Teams
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

FIELDS_TO_DROP = ['away_points', 'home_points', 'date', 'location',
                  'losing_abbr', 'losing_name', 'winner', 'winning_abbr',
                  'winning_name', 'home_ranking', 'away_ranking']

dataset = pd.DataFrame()
teams = Teams()
for team in teams:
    dataset = pd.concat([dataset, team.schedule.dataframe_extended])
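# Pass axis by keyword (the positional form was removed in pandas 2.0) and
# take y from the same rows kept in X so both have the same number of samples.
dataset = dataset.reset_index(drop=True)
X = dataset.drop(FIELDS_TO_DROP, axis=1).dropna().drop_duplicates()
y = dataset.loc[X.index, ['home_points', 'away_points']].values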
X_train, X_test, y_train, y_test = train_test_split(X, y)
parameters = {'bootstrap': False,
              'min_samples_leaf': 3,
              'n_estimators': 50,
              'min_samples_split': 10,
              'max_features': 'sqrt',
              'max_depth': 6}
model = RandomForestRegressor(**parameters)
model.fit(X_train, y_train)
print(model.predict(X_test).astype(int), y_test)

This is the post I got it from: https://towardsdatascience.com/predict-college-basketball-scores-in-30-lines-of-python-148f6bd71894

Thanks!

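Think of it this way: if you want to test your model's goodness of fit, you have to know the results in advance, so that you can measure the distance between the model's output and the actual results and make the adjustments needed to improve the model's overall performance.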
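As a rough illustration of "measuring the distance" between predictions and known results, the sketch below scores a model on held-out rows with scikit-learn's mean_absolute_error. The data is synthetic and the column names (feature_1, feature_2, home_points, away_points) are assumptions made up for this example, not the tutorial's real columns.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic data, for illustration only: two made-up features, two targets.
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "feature_1": rng.uniform(0, 1, 200),
    "feature_2": rng.uniform(0, 1, 200),
})
y = pd.DataFrame({
    "home_points": 60 + 20 * X["feature_1"] + rng.normal(0, 2, 200),
    "away_points": 60 + 20 * X["feature_2"] + rng.normal(0, 2, 200),
})

# Hold out rows whose true scores are known but hidden from training.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = RandomForestRegressor(n_estimators=50, random_state=42)
model.fit(X_train, y_train)

# Compare predictions with the known results, one error value per target.
predicted = model.predict(X_test)
mae = mean_absolute_error(y_test, predicted, multioutput="raw_values")
for name, err in zip(y.columns, mae):
    print(f"{name}: mean absolute error = {err:.2f} points")
```

A lower error on the held-out rows suggests the model generalizes better; this check is only possible for games whose results are already known.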

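Once you have trained your model, if you want to predict future values, then (without knowing exactly what you are doing) you should give the model the same features it was trained on, but filled with the values you want a prediction for. Here is a very basic example that uses two variables to predict the scores of two teams (A and B):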

import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
data = {'Temperature':[10,20,30,25],'Humidity':[40,50,80,65],'Score_A':[1,2,3,2],'Score_B':[6,3,1,2]}
df = pd.DataFrame(data)
print(df)
X = df[['Temperature','Humidity']]
Y = df[['Score_A','Score_B']]
X_train, X_test, y_train, y_test = train_test_split(X, Y,random_state=42)
model = RandomForestRegressor(random_state=42)
model.fit(X_train, y_train)

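At this point I have trained my model, so if I want to make a future prediction I need to pass the same features I used in training (temperature and humidity), but with the values I want a prediction for. Suppose our meteorologist friend says the temperature and humidity for the next game will be 35 and 70, respectively. I then need to call .predict() with those values: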

print(model.predict([[35,70]]))

It returns the following output:

[[2.6 1.4]]

If you want it to look nicer:

prediction = model.predict([[35,70]])
print("Team A will score: ",prediction[0][0])
print("Team B will score: ",prediction[0][1])

Which returns:

Team A will score:  2.6
Team B will score:  1.4
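For the basketball case, the same idea means training only on features that are available before tip-off, since the question points out that many of the tutorial's columns are known only after the game. One possible approach (an assumption on my part, not something the tutorial does) is to replace each post-game stat with the team's average over its earlier games. A minimal sketch with a hypothetical game log; the column names and numbers are made up:

```python
import pandas as pd

# Hypothetical per-team game log; names and values are for illustration only.
games = pd.DataFrame({
    "team": ["A", "A", "A", "B", "B", "B"],
    "date": pd.to_datetime(["2020-01-01", "2020-01-05", "2020-01-09",
                            "2020-01-02", "2020-01-06", "2020-01-10"]),
    "points": [60, 70, 80, 55, 65, 75],
    "assists": [10, 12, 14, 8, 9, 13],
})
stat_cols = ["points", "assists"]

# Order each team's games chronologically.
games = games.sort_values(["team", "date"]).reset_index(drop=True)

# shift(1) drops the current game, so every row only averages earlier games.
pregame = games.groupby("team")[stat_cols].transform(
    lambda s: s.shift(1).expanding().mean()
)
pregame.columns = ["avg_" + c + "_before" for c in stat_cols]

features = pd.concat([games[["team", "date"]], pregame], axis=1)
print(features)
```

Each team's first game has no history, so its averages are NaN and that row would need to be dropped or handled separately. A model trained on columns like these could then be given the same columns for an upcoming game, computed from the games played so far, instead of zeros.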
