
How do I predict future results with scikit-learn and pandas in Python using RandomForestRegressor?

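Hi, I came across this tutorial on using Python and a few libraries, including the sportsreference library, to predict future NCAAB games. I'll post the code along with the article. It seems to work well, but I think it is only being tested on past games. How would I use it to predict a specific team's future games? For example, what will Team A and Team B score on a given date?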

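The problem I see is that some of the data it uses is only known after a game has finished, and the program uses that known data to predict the score.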
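One common way to avoid statistics that only exist after the final whistle is to describe each game by what was known beforehand, for example a team's average points over its previous games. The sketch below does this with pandas groupby, shift and expanding().mean() on a small made-up game log; the column names (team, date, points) and all numbers are hypothetical and are not the sportsreference schema.

```python
import pandas as pd

# Hypothetical per-team game log: one row per team per game already played.
games = pd.DataFrame({
    "team": ["A", "A", "A", "B", "B", "B"],
    "date": pd.to_datetime(["2020-01-01", "2020-01-08", "2020-01-15",
                            "2020-01-02", "2020-01-09", "2020-01-16"]),
    "points": [70, 80, 90, 60, 64, 68],
})

# Sort chronologically within each team so shift() looks at earlier games.
games = games.sort_values(["team", "date"]).reset_index(drop=True)

# Average points over all *previous* games: shift(1) excludes the current
# game, so the feature only uses information available before tip-off.
games["avg_points_before"] = (
    games.groupby("team")["points"]
         .transform(lambda s: s.shift(1).expanding().mean())
)

print(games)
```

Each team's first game gets NaN because nothing is known about it yet; for a game that has not been played, the same kind of feature is simply the average over all games played so far.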

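First experiment: I tried filling in only the data I know before the game happens, and used fillna(0) to fill the rest with zeros. This is what the CSV looks like: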

date_team,away_assist_percentage,away_assists,away_block_percentage,away_blocks,away_defensive_rating,away_defensive_rebound_percentage,away_defensive_rebounds,away_effective_field_goal_percentage,away_field_goal_attempts,away_field_goal_percentage,away_field_goals,away_free_throw_attempt_rate,away_free_throw_attempts,away_free_throw_percentage,away_free_throws,away_losses,away_minutes_played,away_offensive_rating,away_offensive_rebound_percentage,away_offensive_rebounds,away_personal_fouls,AWAY_POINTS,away_steal_percentage,away_steals,away_three_point_attempt_rate,away_three_point_field_goal_attempts,away_three_point_field_goal_percentage,away_three_point_field_goals,away_total_rebound_percentage,away_total_rebounds,away_true_shooting_percentage,away_turnover_percentage,away_turnovers,away_two_point_field_goal_attempts,away_two_point_field_goal_percentage,away_two_point_field_goals,away_win_percentage,away_wins,home_assist_percentage,home_assists,home_block_percentage,home_blocks,home_defensive_rating,home_defensive_rebound_percentage,home_defensive_rebounds,home_effective_field_goal_percentage,home_field_goal_attempts,home_field_goal_percentage,home_field_goals,home_free_throw_attempt_rate,home_free_throw_attempts,home_free_throw_percentage,home_free_throws,home_losses,home_minutes_played,home_offensive_rating,home_offensive_rebound_percentage,home_offensive_rebounds,home_personal_fouls,HOME_POINTS,home_steal_percentage,home_steals,home_three_point_attempt_rate,home_three_point_field_goal_attempts,home_three_point_field_goal_percentage,home_three_point_field_goals,home_total_rebound_percentage,home_total_rebounds,home_true_shooting_percentage,home_turnover_percentage,home_turnovers,home_two_point_field_goal_attempts,home_two_point_field_goal_percentage,home_two_point_field_goals,home_win_percentage,home_wins,pace
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.7,7,0,0,0,0,0,0,0,0,0,0,42,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,.1,1,0

The last line of the code was changed to:

print(model.predict(final_trim).astype(int), y_test)

"final_trim" is the new CSV being predicted.
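When predicting on a separate file such as final_trim, the new rows should contain exactly the feature columns the model was fitted on, in the same order, and nothing else: no target columns like AWAY_POINTS/HOME_POINTS and no identifiers like date_team. A minimal sketch, assuming the training features are available as X_train; here both frames are small made-up stand-ins and their column names are hypothetical.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Made-up stand-ins for the real training features and [home, away] targets.
X_train = pd.DataFrame({"away_wins": [7, 3, 5, 9], "home_wins": [1, 6, 4, 2]})
y_train = pd.DataFrame({"home_points": [60, 75, 70, 58],
                        "away_points": [80, 62, 68, 85]})
model = RandomForestRegressor(n_estimators=10, random_state=0)
model.fit(X_train, y_train)

# Hypothetical prediction file: extra columns and a different column order.
final_trim = pd.DataFrame({"date_team": ["2020-03-01-A"], "HOME_POINTS": [0],
                           "home_wins": [1], "away_wins": [7]})

# Keep only the training features, in training order; any feature missing
# from the new file would be filled with 0 (the same choice as fillna(0)).
final_trim = final_trim.reindex(columns=X_train.columns, fill_value=0)

print(list(final_trim.columns))
print(model.predict(final_trim))
```

Recent scikit-learn versions refuse to predict on a DataFrame whose column names differ from those seen during fit, so aligning the columns also catches mismatches early. Filling a missing feature with 0 is only reasonable when 0 is a realistic value for it; for statistics such as minutes played or field-goal percentage it produces rows unlike any game the model was trained on.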

The results are not accurate at all. What am I missing?

Here is the original code:

import pandas as pd
from sportsreference.ncaab.teams import Teams
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

FIELDS_TO_DROP = ['away_points', 'home_points', 'date', 'location',
                  'losing_abbr', 'losing_name', 'winner', 'winning_abbr',
                  'winning_name', 'home_ranking', 'away_ranking']

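# Collect the extended box-score data of every game on every team's schedule
dataset = pd.DataFrame()
teams = Teams()
for team in teams:
    dataset = pd.concat([dataset, team.schedule.dataframe_extended])
# Give every row a unique label so X and y stay aligned after filtering
dataset = dataset.reset_index(drop=True)
X = dataset.drop(columns=FIELDS_TO_DROP).dropna().drop_duplicates()
# Take the targets from the same rows that survived dropna()/drop_duplicates()
y = dataset.loc[X.index, ['home_points', 'away_points']].values
X_train, X_test, y_train, y_test = train_test_split(X, y)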
parameters = {'bootstrap': False,
              'min_samples_leaf': 3,
              'n_estimators': 50,
              'min_samples_split': 10,
              'max_features': 'sqrt',
              'max_depth': 6}
model = RandomForestRegressor(**parameters)
model.fit(X_train, y_train)
print(model.predict(X_test).astype(int), y_test)
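train_test_split shuffles the rows, so games from late in the season can end up in the training set while earlier games are used for testing. To mimic forecasting, one option is to train only on games before a cutoff date and test on the games after it. A sketch on made-up data; the date column, the cutoff and the single feature are illustrative assumptions (the code above drops date from the features, so it would have to be kept aside for the split).

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Made-up season: one row per game with a date, one feature and two scores.
games = pd.DataFrame({
    "date": pd.date_range("2020-11-25", periods=100, freq="D"),
    "feature": rng.normal(size=100),
})
games["home_points"] = 70 + 5 * games["feature"] + rng.normal(size=100)
games["away_points"] = 68 - 5 * games["feature"] + rng.normal(size=100)

# Chronological split: everything before the cutoff is "the past".
cutoff = pd.Timestamp("2021-02-01")
train = games[games["date"] < cutoff]
test = games[games["date"] >= cutoff]

features = ["feature"]
targets = ["home_points", "away_points"]
model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(train[features], train[targets])

# One [home, away] prediction per game after the cutoff.
predictions = model.predict(test[features])
print(len(train), len(test), predictions.shape)
```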

This is the post I got it from: https://towardsdatascience.com/predict-college-basketball-scores-in-30-lines-of-python-148f6bd71894

Thanks!

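Think of it this way: to test your model's goodness of fit, you have to know the outcomes in advance, so that you can measure the distance between the model's output and the actual results and make the adjustments needed to improve its overall performance.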
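That distance between the model's output and the actual results can be measured with a metric such as the mean absolute error, computed on games the model did not see during training. A minimal sketch; the scores are made-up numbers, and mean_absolute_error is just one of several reasonable metrics.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# Made-up actual and predicted [home, away] scores for three held-out games.
y_test = np.array([[70, 65], [80, 72], [61, 66]])
y_pred = np.array([[72, 63], [76, 75], [61, 70]])

# Average absolute miss per output (home, away), then the overall average.
per_team = mean_absolute_error(y_test, y_pred, multioutput="raw_values")
overall = mean_absolute_error(y_test, y_pred)
print(per_team, overall)
```

Here the home scores are off by 2 points on average and the away scores by 3, for an overall error of 2.5. Comparing this number against a trivial baseline, such as always predicting the average score, shows whether the model adds anything.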

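Once your model is trained, if you want to predict future values then (without knowing the details of what you are doing) you should give it the same features it was trained on, but filled with the data you want to predict for. Here is a very basic example that uses two variables to predict the scores of two teams (A and B):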

import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Two toy features (Temperature, Humidity) and two targets (Score_A, Score_B)
data = {'Temperature': [10, 20, 30, 25], 'Humidity': [40, 50, 80, 65],
        'Score_A': [1, 2, 3, 2], 'Score_B': [6, 3, 1, 2]}
df = pd.DataFrame(data)
print(df)

X = df[['Temperature', 'Humidity']]  # features
Y = df[['Score_A', 'Score_B']]       # one target column per team
X_train, X_test, y_train, y_test = train_test_split(X, Y, random_state=42)
model = RandomForestRegressor(random_state=42)
model.fit(X_train, y_train)

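At this point the model is trained, so to make future predictions I need to pass the same features I used in training (temperature and humidity), but with the values I want predictions for. Suppose our meteorologist friend says the temperature and humidity for the next game will be 35 and 70, respectively. Then I call .predict() with those values: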

print(model.predict([[35, 70]]))

It returns the following output:

[[2.6 1.4]]

To make the output more readable:

prediction = model.predict([[35,70]])
print("Team A will score: ",prediction[0][0])
print("Team B will score: ",prediction[0][1])

Which returns:

Team A will score:  2.6
Team B will score:  1.4
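The same approach extends to several upcoming games at once: pass one row per game, using the column names from training, and predict returns one [Score_A, Score_B] pair per row. A sketch that rebuilds the toy model above; the second forecast (15 and 45) is a made-up input, and the predicted values depend on the fitted trees, so they are not listed here.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Same toy data and model as in the example above.
data = {'Temperature': [10, 20, 30, 25], 'Humidity': [40, 50, 80, 65],
        'Score_A': [1, 2, 3, 2], 'Score_B': [6, 3, 1, 2]}
df = pd.DataFrame(data)
X = df[['Temperature', 'Humidity']]
Y = df[['Score_A', 'Score_B']]
X_train, X_test, y_train, y_test = train_test_split(X, Y, random_state=42)
model = RandomForestRegressor(random_state=42)
model.fit(X_train, y_train)

# Hypothetical forecasts for two upcoming games, with the training column names.
upcoming = pd.DataFrame({'Temperature': [35, 15], 'Humidity': [70, 45]})

# One row of predicted scores per upcoming game, labelled like the targets.
predictions = pd.DataFrame(model.predict(upcoming), columns=Y.columns)
print(predictions)
```

Passing a DataFrame instead of a plain list also avoids scikit-learn's warning that the input has no valid feature names when the model was fitted on a DataFrame.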


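Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.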

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM