How do I predict correctly with sklearn RandomForestRegressor?

Question

I'm working on a big data project for school. My dataset looks like this: https://github.com/gindeleo/climate/blob/master/GlobalTemperatures.csv

I'm trying to predict the next values of "LandAverageTemperature".

First, I imported the CSV into pandas as a DataFrame named "df1".

After getting errors on my first tries in sklearn, I converted the "dt" column from string to datetime64, then added a column named "year" that holds only the year of each date (it's probably wrong):

df1["year"] = pd.DatetimeIndex(df1['dt']).year
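For reference, a minimal sketch of the conversion described above. It assumes "dt" holds strings in "YYYY-MM-DD" form and uses three toy rows (the values come from the output further down this page). It suggests the `pd.DatetimeIndex(...).year` line is not wrong: the `.dt` accessor yields the same years, and the same accessor also gives the month.

```python
import pandas as pd

# Toy rows; the "YYYY-MM-DD" string format of "dt" is an assumption about the CSV
df1 = pd.DataFrame({
    "dt": ["1750-01-01", "1750-02-01", "1750-03-01"],
    "LandAverageTemperature": [3.034, 3.083, 5.626],
})

# Convert the string column to datetime64 once...
df1["dt"] = pd.to_datetime(df1["dt"])

# ...then pull calendar parts out with the .dt accessor
df1["year"] = df1["dt"].dt.year
df1["month"] = df1["dt"].dt.month

# The DatetimeIndex approach from the question yields the same years
same = list(pd.DatetimeIndex(df1["dt"]).year) == df1["year"].tolist()
print(df1[["year", "month"]])
print(same)
```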

After all of that, I prepared my data for regression and called RandomForestRegressor:

landAvg = df1[["LandAverageTemperature"]]
year = df1[["year"]]

from sklearn.ensemble import RandomForestRegressor

rf_reg=RandomForestRegressor(n_estimators=10,random_state=0)
rf_reg.fit(year,landAvg.values.ravel())
print("Random forest:",rf_reg.predict(landAvg))

I ran the code and got this result:

Random forest: [9.26558115 9.26558115 9.26558115 ... 9.26558115 9.26558115 9.26558115]

I'm not getting any errors, but I don't think the results are correct: as you can see, they are all the same. Besides, I don't know how to get predictions for the next 10 years; with this code I only get one result. Can you help me improve my code and get the right results? Thanks in advance for your help.
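A likely reason every prediction is identical: the model is fit on "year" but `rf_reg.predict(landAvg)` feeds it temperatures in that slot. Each split threshold a tree learns lies between two training years, so every temperature (all far below 1750) is routed to the same leaf in every tree. The sketch below reproduces this on synthetic data (an assumption standing in for the CSV, not the real values) and uses plain NumPy arrays; with DataFrames, newer scikit-learn releases may instead reject the original call because the column passed to `predict` ("LandAverageTemperature") differs from the one used in `fit` ("year").

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for the real data (assumption, not the CSV):
# monthly rows for 1750-2015 with a seasonal cycle and a slight warming trend
rng = np.random.default_rng(0)
years = np.repeat(np.arange(1750, 2016), 12)
months = np.tile(np.arange(1, 13), 2016 - 1750)
temps = (8.5 - 6 * np.cos(2 * np.pi * (months - 1) / 12)
         + 0.005 * (years - 1750) + rng.normal(0, 0.5, years.size))

# Fit on year only, as in the question
rf_reg = RandomForestRegressor(n_estimators=10, random_state=0)
rf_reg.fit(years.reshape(-1, 1), temps)

# The mistake reproduced: temperatures passed where the model expects years.
# All of them are below the smallest training year, so each tree sends them
# down the same path and the forest returns a single value.
wrong = rf_reg.predict(temps.reshape(-1, 1))
print(np.unique(wrong).size)

# Intended usage: predict from the same kind of input the model was fit on
right = rf_reg.predict(years.reshape(-1, 1))
print(np.unique(right).size)
```

Even the corrected call gives all 12 months of a given year the same prediction, since year is the only feature; that is the gap the answer below addresses by adding the month.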

Answer 1

It's not enough to use only the year to predict temperature; you need the month as well. Here is a working example to start with:

import pandas as pd
from sklearn.ensemble import RandomForestRegressor
# Load only the date and target columns, parsing "dt" as datetime
df = pd.read_csv('https://raw.githubusercontent.com/gindeleo/climate/master/GlobalTemperatures.csv', usecols=['dt','LandAverageTemperature'], parse_dates=['dt'])
df = df.dropna()  # drop months without a temperature value
# Split the date into numeric features the model can use
df["year"] = df['dt'].dt.year
df["month"] = df['dt'].dt.month
X = df[["month", "year"]]
y = df["LandAverageTemperature"]
rf_reg = RandomForestRegressor(n_estimators=10, random_state=0)
rf_reg.fit(X, y)
# Note: these are in-sample predictions on the same rows used for training
y_pred = rf_reg.predict(X)
df_result = pd.DataFrame({'year': X['year'], 'month': X['month'], 'true': y, 'pred': y_pred})
print('True values and predictions')
print(df_result)
print('Feature importances', list(zip(X.columns, rf_reg.feature_importances_)))

And here is the output:

True values and predictions
      year  month    true     pred
0     1750      1   3.034   2.2944
1     1750      2   3.083   2.4222
2     1750      3   5.626   5.6434
3     1750      4   8.490   8.3419
4     1750      5  11.573  11.7569
...    ...    ...     ...      ...
3187  2015      8  14.755  14.8004
3188  2015      9  12.999  13.0392
3189  2015     10  10.801  10.7068
3190  2015     11   7.433   7.1173
3191  2015     12   5.518   5.1634

[3180 rows x 4 columns]
Feature importances [('month', 0.9543059863177156), ('year', 0.045694013682284394)]
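For the "next 10 years" part of the question, one approach is to build a frame with the same feature columns ("month", "year") for the future dates and pass it to `predict`. The sketch below again uses synthetic data in place of the CSV (an assumption). It also shows a limitation worth knowing: a random forest cannot extrapolate. Every year split lies at or below 2014.5, so any year after 2015 follows the same path as 2015, and each future year receives exactly the 2015 monthly predictions.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic monthly data for 1750-2015 standing in for the CSV (assumption)
rng = np.random.default_rng(0)
train = pd.DataFrame({
    "year": np.repeat(np.arange(1750, 2016), 12),
    "month": np.tile(np.arange(1, 13), 2016 - 1750),
})
train["LandAverageTemperature"] = (
    8.5 - 6 * np.cos(2 * np.pi * (train["month"] - 1) / 12)
    + 0.005 * (train["year"] - 1750)
    + rng.normal(0, 0.5, len(train))
)

X = train[["month", "year"]]
y = train["LandAverageTemperature"]
rf_reg = RandomForestRegressor(n_estimators=10, random_state=0)
rf_reg.fit(X, y)

# One row per month for 2016-2025, with the same columns in the same
# order as the training features
future = pd.DataFrame(
    [(m, yr) for yr in range(2016, 2026) for m in range(1, 13)],
    columns=["month", "year"],
)
future["pred"] = rf_reg.predict(future[["month", "year"]])

# Rows: years, columns: months; every row repeats the same 12 values
print(future.pivot(index="year", columns="month", values="pred").round(2))
```

If the long-term trend matters for the forecast, a model that can extrapolate along "year" (for example a linear term on year combined with month indicators) is one option to consider alongside or instead of the forest.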

Answer 1 (accepted, 2019-12-24 16:49:49)