How do I predict future results with scikit-learn and pandas in Python using the RandomForestRegressor method?

Hello, I came across this tutorial on how to use Python with some libraries to predict future NCAAB games using the sportsreference library. I will post the code as well as the article. This seems to work well, but I think it is only tested on games in the past. How would I use it to predict future games between specific teams? For example, what will the score be between Team A and Team B on a given date?

The problem I see is that some of the data used can only be known after the game is finished. This post-game data is what the program uses to predict the score.

First experiment: I tried filling in only the data that I knew about a game before it happened and filling in the remaining data with zeros using fillna(0). Here is what the CSV would look like:

date_team,away_assist_percentage,away_assists,away_block_percentage,away_blocks,away_defensive_rating,away_defensive_rebound_percentage,away_defensive_rebounds,away_effective_field_goal_percentage,away_field_goal_attempts,away_field_goal_percentage,away_field_goals,away_free_throw_attempt_rate,away_free_throw_attempts,away_free_throw_percentage,away_free_throws,away_losses,away_minutes_played,away_offensive_rating,away_offensive_rebound_percentage,away_offensive_rebounds,away_personal_fouls,away_points,away_steal_percentage,away_steals,away_three_point_attempt_rate,away_three_point_field_goal_attempts,away_three_point_field_goal_percentage,away_three_point_field_goals,away_total_rebound_percentage,away_total_rebounds,away_true_shooting_percentage,away_turnover_percentage,away_turnovers,away_two_point_field_goal_attempts,away_two_point_field_goal_percentage,away_two_point_field_goals,away_win_percentage,away_wins,home_assist_percentage,home_assists,home_block_percentage,home_blocks,home_defensive_rating,home_defensive_rebound_percentage,home_defensive_rebounds,home_effective_field_goal_percentage,home_field_goal_attempts,home_field_goal_percentage,home_field_goals,home_free_throw_attempt_rate,home_free_throw_attempts,home_free_throw_percentage,home_free_throws,home_losses,home_minutes_played,home_offensive_rating,home_offensive_rebound_percentage,home_offensive_rebounds,home_personal_fouls,home_points,home_steal_percentage,home_steals,home_three_point_attempt_rate,home_three_point_field_goal_attempts,home_three_point_field_goal_percentage,home_three_point_field_goals,home_total_rebound_percentage,home_total_rebounds,home_true_shooting_percentage,home_turnover_percentage,home_turnovers,home_two_point_field_goal_attempts,home_two_point_field_goal_percentage,home_two_point_field_goals,home_win_percentage,home_wins,pace
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.7,7,0,0,0,0,0,0,0,0,0,0,42,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,.1,1,0

The final line of code is changed to:

print(model.predict(final_trim).astype(int), y_test)

"final_trim" is the new CSV that is being predicted.

The results were not accurate at all. What am I missing?

Here is the original code:

import pandas as pd
from sportsreference.ncaab.teams import Teams
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

FIELDS_TO_DROP = ['away_points', 'home_points', 'date', 'location',
                  'losing_abbr', 'losing_name', 'winner', 'winning_abbr',
                  'winning_name', 'home_ranking', 'away_ranking']

dataset = pd.DataFrame()
teams = Teams()
for team in teams:
    dataset = pd.concat([dataset, team.schedule.dataframe_extended])
# Drop the target and identifier columns; recent pandas versions require axis as a keyword
X = dataset.drop(FIELDS_TO_DROP, axis=1).dropna().drop_duplicates()
y = dataset[['home_points', 'away_points']].values
X_train, X_test, y_train, y_test = train_test_split(X, y)
parameters = {'bootstrap': False,
              'min_samples_leaf': 3,
              'n_estimators': 50,
              'min_samples_split': 10,
              'max_features': 'sqrt',
              'max_depth': 6}
model = RandomForestRegressor(**parameters)
model.fit(X_train, y_train)
print(model.predict(X_test).astype(int), y_test)

And here is the post I got it from: https://towardsdatascience.com/predict-college-basketball-scores-in-30-lines-of-python-148f6bd71894

Thank you!

Think of it this way: if you want to test the goodness of fit of your model, you must know the result in advance, so that you can measure the distance between your model's output and the real outcome and then tune the model to improve its overall performance.
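As a rough illustration of measuring that distance, the sketch below computes the mean absolute error on held-out rows. The data is synthetic and made up for this example (it is not the NCAAB data), and the two target columns only stand in for a home and an away score.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic data for illustration only: 200 "games", 3 features, 2 targets.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = np.column_stack([
    70 + 5 * X[:, 0] + rng.normal(size=200),  # stand-in for the home score
    65 + 5 * X[:, 1] + rng.normal(size=200),  # stand-in for the away score
])

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(X_train, y_train)

# Mean absolute error per target column, computed on rows the model has not seen.
mae = mean_absolute_error(y_test, model.predict(X_test), multioutput="raw_values")
print("MAE (home, away):", mae)
```

The lower these numbers, the closer the predictions are to the known outcomes. This kind of check is only possible for games that have already been played.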

Once you have trained your model, if you want to predict future values then (without knowing much about what you are working on) you should feed the model the same features it was trained with, but filled with the data you want to make your prediction on. Here is a very basic example using two variables to predict the scores of two teams (A and B):

import pandas as pd 
data = {'Temperature':[10,20,30,25],'Humidity':[40,50,80,65],'Score_A':[1,2,3,2],'Score_B':[6,3,1,2]}
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
df = pd.DataFrame(data)
print(df)
X = df[['Temperature','Humidity']]
Y = df[['Score_A','Score_B']]
X_train, X_test, y_train, y_test = train_test_split(X, Y,random_state=42)
model = RandomForestRegressor(random_state=42)
model.fit(X_train, y_train)

Here I've trained my model, so if I want to make a future prediction, I need to pass the same features I used in training (temperature and humidity), but with the values I want to make my prediction on. Let's say our friend the meteorologist says that the temperature and humidity for their next match will be 35 and 70 respectively. So I need to use .predict() with those values:

print(model.predict([[35,70]]))

Which returns an output of:

[[2.6 1.4]]

If you wish to make it fancier:

prediction = model.predict([[35,70]])
print("Team A will score: ",prediction[0][0])
print("Team B will score: ",prediction[0][1])

Returning:

Team A will score:  2.6
Team B will score:  1.4
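For the basketball data in the question, the same rule applies: the row passed to .predict() needs the same columns the model was trained on, and those columns should hold values that are known before the game. One possible approach, sketched here under assumptions, is to replace each post-game statistic with the team's average over its earlier games. The column names (team, date, points, assists) and the numbers are hypothetical and are not taken from the sportsreference data.

```python
import pandas as pd

# Hypothetical per-team game log; the columns and values are made up for illustration.
games = pd.DataFrame({
    "team":    ["A", "A", "A", "B", "B", "B"],
    "date":    pd.to_datetime(["2020-01-01", "2020-01-05", "2020-01-09",
                               "2020-01-02", "2020-01-06", "2020-01-10"]),
    "points":  [70, 80, 90, 60, 62, 64],
    "assists": [10, 12, 14, 8, 9, 10],
})

games = games.sort_values(["team", "date"])
stats = ["points", "assists"]

# Training features: for each team, the average of all EARLIER games only.
# shift(1) excludes the current game, so no post-game information leaks in.
pre_game = (
    games.groupby("team")[stats]
         .transform(lambda s: s.shift(1).expanding().mean())
         .add_prefix("avg_prev_")
)
features = pd.concat([games[["team", "date"]], pre_game], axis=1)
print(features)

# Features for an upcoming game: each team's average over all games played so far.
upcoming = games.groupby("team")[stats].mean().add_prefix("avg_prev_")
print(upcoming)
```

A model trained on columns like these can be given a row for a game that has not happened yet, because every value in that row is available in advance. The first game of each team has no history, so its averages are NaN and would need to be dropped or handled before training.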
