Why does Random Forest Regression predict the exact same value?

I am attempting to use Scikit-Learn's Random Forest regressor to predict Nominal GDP from Real GDP.

I read the data from a website and clean it up a bit, then build a DataFrame holding my forecasts of Real GDP for the next three years.

I have the following code:

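import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Read the first table on the page and promote its first row to the header
gdp = pd.read_html('https://www.thebalance.com/us-gdp-by-year-3305543')[0]
gdp.columns = gdp.iloc[0]
gdp = gdp[1:].copy()  # copy so the column assignments below do not act on a slice

gdp['Year'] = gdp['Year'].astype(int)

# regex=False treats ',' and '$' as literal characters
gdp['Nominal GDP (trillions)'] = gdp['Nominal GDP (trillions)'].str.replace(',', '.', regex=False).str.replace('$', '', regex=False).astype(float)
gdp['Real GDP (trillions)'] = gdp['Real GDP (trillions)'].str.replace(',', '.', regex=False).str.replace('$', '', regex=False).astype(float)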

X = pd.DataFrame(gdp['Real GDP (trillions)'].copy())
y = pd.DataFrame(gdp['Nominal GDP (trillions)'].copy())


X_pred = pd.DataFrame(data = [18.313, 18.960, 19.643], columns = ['Real GDP (trillions)'])

reg = RandomForestRegressor(n_estimators = 300)
reg.fit(X, y.values.ravel())

y_pred = reg.predict(X_pred)

It returns the following predictions:

1 | 2 | 3
---|---|---
19.72172 | 21.05464667 | 21.05464667

Why are the second and third predictions identical? The same thing happens even if I change the X_pred values to something like [18.313, 18.960, 39.643].

In your training data, there's only one value > 18.960:

X[X.values>18.960]

    Real GDP (trillions)
91  19.092

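A tree can only place a split threshold between values it saw during training, so it is highly unlikely that any tree ends up with a threshold that separates 18.960 from 19.643, or for that matter 18.960 from 39.643. Unlike linear regression, a random forest cannot extrapolate beyond its training data: each prediction is an average of training targets that fell into the same leaves.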
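
To see this behaviour in isolation, here is a minimal sketch on synthetic data. The data, seeds, and variable names are made up for illustration and are not the GDP table from the question. A forest and a linear model are fit on inputs between 1 and 19, then both are asked to predict for inputs above that range.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

# Synthetic, roughly linear data (hypothetical, not the GDP table)
rng = np.random.default_rng(0)
X_train = np.linspace(1.0, 19.0, 60).reshape(-1, 1)
y_train = 1.1 * X_train.ravel() + rng.normal(0.0, 0.1, size=60)

forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)
linear = LinearRegression().fit(X_train, y_train)

# All three inputs lie above the largest training value (19.0)
X_new = np.array([[19.5], [25.0], [40.0]])
forest_pred = forest.predict(X_new)
linear_pred = linear.predict(X_new)

print(forest_pred)  # the same value three times
print(linear_pred)  # keeps increasing with the input
```

The forest's output is flat because every input above the training range follows the same path through every tree, while the linear model simply extends its fitted line.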

We can check the thresholds for each tree:

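import numpy as np
# Unique split thresholds across all trees (np.unique returns them sorted; leaf nodes are stored as -2)
thres = np.unique([j for i in reg.estimators_ for j in i.tree_.threshold])
thres[-10:]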

array([17.80000019, 17.9375    , 18.00199986, 18.05999947, 18.20950031,
       18.26199913, 18.41149998, 18.41599941, 18.61799908, 18.88999939])

The largest threshold (about 18.89) is below both 18.960 and 19.643, so no split can separate the two values you are trying to predict. They take the same branch at every split, end up in the same leaf of every tree, and therefore get the same prediction.
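
One way to confirm that two inputs share a leaf in every tree is the forest's apply method, which returns the leaf index each sample reaches in each tree. The sketch below uses synthetic data again (hypothetical values and names, not the GDP table), so treat it as an illustration of the check rather than a reproduction of the result above.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic training data with inputs between 0 and 19 (hypothetical)
rng = np.random.default_rng(1)
X_train = rng.uniform(0.0, 19.0, size=(80, 1))
y_train = 1.1 * X_train.ravel()

forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_train, y_train)

# Largest split threshold used anywhere in the forest
max_threshold = max(tree.tree_.threshold.max() for tree in forest.estimators_)

# Two inputs that both lie above every threshold
X_new = np.array([[19.5], [39.643]])
leaves = forest.apply(X_new)  # shape (n_samples, n_estimators)
same_leaf_everywhere = bool((leaves[0] == leaves[1]).all())
preds = forest.predict(X_new)

print(max_threshold)         # below 19.5
print(same_leaf_everywhere)  # True
print(preds)                 # two identical values
```

If the rows of leaves differ in at least one column, some tree does separate the two inputs and their predictions can differ.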
