How to predict correctly in sklearn RandomForestRegressor?

I'm working on a big data project for school. My dataset looks like this: https://github.com/gindeleo/climate/blob/master/GlobalTemperatures.csv

I'm trying to predict the next values of "LandAverageTemperature".

First, I imported the CSV into pandas as a DataFrame named "df1".

After getting errors on my first attempts with sklearn, I converted the "dt" column from string to datetime64, then added a column named "year" that holds only the year of each date (this is probably wrong):

df1["year"] = pd.DatetimeIndex(df1['dt']).year

After all of that, I prepared my data for regression and called RandomForestRegressor:

landAvg = df1[["LandAverageTemperature"]]
year = df1[["year"]]

from sklearn.ensemble import RandomForestRegressor

rf_reg=RandomForestRegressor(n_estimators=10,random_state=0)
rf_reg.fit(year,landAvg.values.ravel())
print("Random forest:",rf_reg.predict(landAvg))

I ran the code and got this result:

Random forest: [9.26558115 9.26558115 9.26558115 ... 9.26558115 9.26558115 9.26558115]

I'm not getting any errors, but I don't think the results are correct: as you can see, every predicted value is the same. Besides, I don't know how to get predictions for the next 10 years; this code gives me only one distinct result. Can you help me improve my code and get the right results? Thanks in advance for your help.

It's not enough to use only the year to predict temperature. You need to use the month as well. Here is a working example for starters:

import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Load only the two columns we need and parse 'dt' as datetime
df = pd.read_csv(
    'https://raw.githubusercontent.com/gindeleo/climate/master/GlobalTemperatures.csv',
    usecols=['dt', 'LandAverageTemperature'],
    parse_dates=['dt'],
)
# Drop the months that have no recorded temperature
df = df.dropna()

# Features: calendar year and month; target: land average temperature
df["year"] = df['dt'].dt.year
df["month"] = df['dt'].dt.month
X = df[["month", "year"]]
y = df["LandAverageTemperature"]

rf_reg = RandomForestRegressor(n_estimators=10, random_state=0)
rf_reg.fit(X, y)

# Predict on the same rows the model was trained on (in-sample fit, not a forecast)
y_pred = rf_reg.predict(X)
df_result = pd.DataFrame({'year': X['year'], 'month': X['month'], 'true': y, 'pred': y_pred})
print('True values and predictions')
print(df_result)
print('Feature importances', list(zip(X.columns, rf_reg.feature_importances_)))

And here is the output:

True values and predictions
      year  month    true     pred
0     1750      1   3.034   2.2944
1     1750      2   3.083   2.4222
2     1750      3   5.626   5.6434
3     1750      4   8.490   8.3419
4     1750      5  11.573  11.7569
...    ...    ...     ...      ...
3187  2015      8  14.755  14.8004
3188  2015      9  12.999  13.0392
3189  2015     10  10.801  10.7068
3190  2015     11   7.433   7.1173
3191  2015     12   5.518   5.1634

[3180 rows x 4 columns]
Feature importances [('month', 0.9543059863177156), ('year', 0.045694013682284394)]
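
As for why the code in the question prints one repeated number: the model is fitted on the years but `predict` is given `landAvg`, the temperatures. Those values are all far below 1750, so every tree routes every row to the same leaf. Here is a minimal sketch of that effect. It uses a synthetic seasonal series instead of the real CSV (an assumption, so that it runs offline) and plain NumPy arrays instead of DataFrames:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for the dataset (an assumption, not the real CSV):
# one row per month from 1750 to 2015, a seasonal cycle plus a mild upward trend.
years = np.repeat(np.arange(1750, 2016), 12)
months = np.tile(np.arange(1, 13), 2016 - 1750)
temps = 8.5 + 6.0 * np.sin(2 * np.pi * (months - 4) / 12) + 0.004 * (years - 1750)

X_year = years.reshape(-1, 1)  # single feature: the year, as in the question
rf_reg = RandomForestRegressor(n_estimators=10, random_state=0)
rf_reg.fit(X_year, temps)

# Mistake from the question: the target values are passed where years are expected.
# Every "year" is then below 1750, so each tree sends every row to the same leaf.
wrong_pred = rf_reg.predict(temps.reshape(-1, 1))

# Intended call: predict from the same kind of feature the model was fitted on.
right_pred = rf_reg.predict(X_year)

print("distinct values, wrong call:", np.unique(wrong_pred).size)
print("distinct values, right call:", np.unique(right_pred).size)
```

The wrong call yields exactly one distinct value, which matches the repeated 9.26558115 in the question. The right call yields many, although with the year as the only feature it can still only return one value per year.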

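The question also asks about the next 10 years, which the example above does not cover. One way to do it is to build a DataFrame of future (month, year) rows with the same columns used in `fit` and pass it to `predict`. The sketch below assumes a synthetic series in place of the CSV, and the name `future` is only illustrative. Keep in mind that tree-based models cannot extrapolate: any year beyond the training range falls into the same leaves as the last years seen, so the forecast repeats one seasonal pattern.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for the CSV (an assumption): monthly values for 1750-2015.
years = np.repeat(np.arange(1750, 2016), 12)
months = np.tile(np.arange(1, 13), 2016 - 1750)
df = pd.DataFrame({"month": months, "year": years})
df["LandAverageTemperature"] = (
    8.5 + 6.0 * np.sin(2 * np.pi * (df["month"] - 4) / 12) + 0.004 * (df["year"] - 1750)
)

X = df[["month", "year"]]
y = df["LandAverageTemperature"]
rf_reg = RandomForestRegressor(n_estimators=10, random_state=0)
rf_reg.fit(X, y)

# One row per month for the ten years after the data ends (2016-2025),
# with the same column names and order that were used in fit.
future = pd.DataFrame({
    "month": np.tile(np.arange(1, 13), 10),
    "year": np.repeat(np.arange(2016, 2026), 12),
})
future["pred"] = rf_reg.predict(future[["month", "year"]])

# Years as rows, months as columns. Every row comes out identical because
# no tree has a split threshold above the last training year.
print(future.pivot(index="year", columns="month", values="pred").round(2))
```

If the goal is to capture a long-term trend in the forecast, a model that can extrapolate (for example a linear model on the year plus seasonal terms) may be a better fit than a random forest; that would be a different technique from the one in this answer.
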