MLP with scikit-learn: Artificial Neural Network application for forecasting

I have traffic data and I want to predict the number of vehicles for the next hour, giving the model these inputs: this hour's number of vehicles and this hour's average speed. Here is my code:

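import pandas as pd, numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split

dataset=pd.read_csv('/content/final - Sayfa5.csv',delimiter=',')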
X = np.array(dataset.iloc[:,1:4])
Y = dataset.iloc[:, 4].to_numpy().reshape(-1, 1)  # target column as an (n, 1) array

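#scaling with MinMaxScaler (fit before transforming; Y gets its own scaler because it has a different number of columns)
scaler = MinMaxScaler()
X = scaler.fit_transform(X)
y_scaler = MinMaxScaler()
Y = y_scaler.fit_transform(Y)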

X_train , X_test, Y_train, Y_test = train_test_split(X,Y,test_size=0.3)
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error 
mlp = MLPRegressor(activation='logistic')
mlp.fit(X_train, Y_train.ravel())  # train before predicting; ravel() turns the (n, 1) target into shape (n,)
predictions = mlp.predict(X_test)
predictions1 = mlp.predict(X_train)
print("mse_test :" ,mean_squared_error(Y_test,predictions), "mse_train :",mean_squared_error(Y_train,predictions1))

I got good MSE values: mse_test: 0.005467816018933008, mse_train: 0.005072774796622158

But I am confused about two points:

  1. Should I scale the y values? I have read many blog posts saying that one should not scale y, only X_train and X_test. But when I don't, I get very bad MSE scores such as 49, 50, 100 or even more.

  2. How can I get predictions for the future as unscaled values? For example, I wrote:

    Xnew=[[ 80 , 40 , 47],
    [ 80 , 30,  81],
    [ 80 , 33, 115]]
    Xnew = scaler.transform(Xnew)
    print("prediction for that input is" , mlp.predict(Xnew))

But I got scaled values such as: prediction for that input is [0.08533431 0.1402755 0.19497315]

It should have been something like [81, 115, 102].

Congrats on using sklearn's MLPRegressor; an introduction to neural networks is always a good thing.

Scaling your input data is critical for neural networks. Consider reviewing Chapter 11 of Ethem Alpaydin's Introduction to Machine Learning. This is also covered in great detail in the Efficient BackProp paper. To put it plainly, scaling the inputs keeps all features on comparable ranges, so no feature dominates the weight updates just because of its magnitude, and gradient-based training converges more reliably.

In plain English, MinMax scaling converts each feature into values between 0 and 1 (inclusive). A good Stats Exchange post on this describes the differences between scaling methods. MinMax scaling preserves the shape of your data's distribution, which also means it is sensitive to outliers. More robust methods (described in that post) exist in sklearn, such as RobustScaler, which subtracts the median and divides by the interquartile range.
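To make the outlier sensitivity concrete, here is a small sketch on a single made-up feature (the values are hypothetical, chosen only to include one large outlier). It compares sklearn's MinMaxScaler and RobustScaler with their default settings:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, RobustScaler

# One hypothetical feature with a single large outlier (1000)
feature = np.array([[1.0], [2.0], [3.0], [4.0], [1000.0]])

minmax_scaled = MinMaxScaler().fit_transform(feature)
robust_scaled = RobustScaler().fit_transform(feature)

# MinMax maps the outlier to 1.0 and squeezes the ordinary values close to 0
print("MinMax:", minmax_scaled.ravel())

# RobustScaler subtracts the median (3) and divides by the interquartile range (4 - 2 = 2),
# so the ordinary values keep a usable spread while the outlier stays far away
print("Robust:", robust_scaled.ravel())
```

With MinMax scaling the four ordinary values end up within about 0.003 of each other, which gives a network very little signal to work with; RobustScaler keeps them spread out.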

Take, for example, a very basic dataset like this:

| Feature 1 | Feature 2 | Feature 3 | Feature 4 | Feature 5 | Target |
|     1     |     17    |     22    |     3     |     3     |   53   |
|     2     |     18    |     24    |     5     |     4     |   54   |
|     1     |     11    |     22    |     2     |     5     |   96   |
|     5     |     20    |     22    |     7     |     5     |   59   |
|     3     |     10    |     26    |     4     |     5     |   66   |
|     5     |     14    |     30    |     1     |     4     |   63   |
|     2     |     17    |     30    |     9     |     5     |   93   |
|     4     |     5     |     27    |     1     |     5     |   91   |
|     3     |     20    |     25    |     7     |     4     |   70   |
|     4     |     19    |     23    |     10    |     4     |   81   |
|     3     |     13    |     8     |     19    |     5     |   14   |
|     9     |     18    |     3     |     67    |     5     |   35   |
|     8     |     12    |     3     |     34    |     7     |   25   |
|     5     |     15    |     6     |     12    |     6     |   33   |
|     2     |     13    |     2     |     4     |     8     |   21   |
|     4     |     13    |     6     |     28    |     5     |   46   |
|     7     |     17    |     7     |     89    |     6     |   21   |
|     4     |     18    |     4     |     11    |     8     |    5   |
|     9     |     19    |     7     |     21    |     5     |   30   |
|     6     |     14    |     6     |     17    |     7     |   73   |

I can slightly modify your code to play with this:

import pandas as pd, numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import RobustScaler
from sklearn.metrics import mean_squared_error 

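# Copy the example table above to the clipboard; split on the '|' separators and drop the empty edge columns
df = pd.read_clipboard(sep=r'\s*\|\s*', engine='python').dropna(axis=1, how='all')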

# Build data
y = df['Target'].to_numpy()
scaled_y = df['Target'].values.reshape(-1, 1) #returns a numpy array
df.drop('Target', inplace=True, axis=1)
X = df.to_numpy()

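#scaling with RobustScaler
scaler = RobustScaler()
X = scaler.fit_transform(X)

# Scaling y just to show you the difference; it gets its own scaler so the X scaler is not refitted
y_scaler = RobustScaler()
scaled_y = y_scaler.fit_transform(scaled_y).ravel()  # back to shape (n,) for MLPRegressor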

# Set random_state so we can replicate results
X_train , X_test, y_train, y_test = train_test_split(X,y,test_size=0.2, random_state=8)
scaled_X_train , scaled_X_test, scaled_y_train, scaled_y_test = train_test_split(X,scaled_y,test_size=0.2, random_state=8)

mlp = MLPRegressor(activation='logistic', random_state=8)
scaled_mlp = MLPRegressor(activation='logistic', random_state=8)

mlp.fit(X_train, y_train)
scaled_mlp.fit(scaled_X_train, scaled_y_train)

preds = mlp.predict(X_test)
# Predict with the model that was trained on the scaled labels
scaled_preds = scaled_mlp.predict(scaled_X_test)

for pred, scaled_pred, tar, scaled_tar in zip(preds, scaled_preds, y_test, scaled_y_test):
    print("Regular MLP:")
    print("Prediction: {} | Actual: {} | Error: {}".format(pred, tar, tar-pred))
    print("MLP that was shown scaled labels: ")
    print("Prediction: {} | Actual: {} | Error: {}".format(scaled_pred, scaled_tar, scaled_tar-scaled_pred))

In short, shrinking your target will naturally shrink your error, since your model is no longer learning the actual value but a rescaled version of it (between 0 and 1 for MinMaxScaler).

That is why we do not scale our target variable here: the error only looks smaller because the values have been squeezed into a smaller range.
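For MinMaxScaler the shrinking effect can be made exact. A short derivation, assuming the default feature_range=(0, 1) and writing m and M for the minimum and maximum of the training targets (notation introduced here for illustration), with ŷ the prediction on the original scale and ŷ' the same prediction on the scaled axis:

```latex
y' = \frac{y - m}{M - m}, \qquad \hat{y}' = \frac{\hat{y} - m}{M - m}
\quad\Longrightarrow\quad y' - \hat{y}' = \frac{y - \hat{y}}{M - m}

\mathrm{MSE}_{\text{scaled}}
  = \frac{1}{n}\sum_{i=1}^{n} \left(y'_i - \hat{y}'_i\right)^2
  = \frac{1}{(M - m)^2} \cdot \frac{1}{n}\sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2
  = \frac{\mathrm{MSE}_{\text{original}}}{(M - m)^2}
```

As a purely hypothetical illustration: if the vehicle counts spanned 0 to 300, then (M - m)^2 = 90,000, and a scaled MSE of 0.0055 would correspond to about 495 vehicles^2, i.e. an RMSE of roughly 22 vehicles. For RobustScaler the divisor is the squared interquartile range instead.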

  1. You can't compare the MSE between those two models, as MSE is scale dependent. Your MSE is clearly smaller when you scale y, because y (in your case) becomes smaller after applying the MinMaxScaler. If you "unscale" your predicted and actual y values and then calculate the MSE again, it becomes comparable to the MSE of the model trained on the raw y values (though not identical, since the two models were trained differently).
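A minimal sketch of the "unscale, then recompute" step, assuming hypothetical actual and predicted vehicle counts (the numbers are made up). It fits a MinMaxScaler on the actual values, computes the MSE on both scales, and uses inverse_transform to get the predictions back in vehicles:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error

# Hypothetical actual vehicle counts and a model's predictions, both in vehicles
y_true = np.array([47.0, 81.0, 115.0, 102.0, 60.0])
y_pred = np.array([52.0, 78.0, 110.0, 108.0, 55.0])

# Fit the target scaler on the actual values, as you would on y_train
y_scaler = MinMaxScaler()
y_true_scaled = y_scaler.fit_transform(y_true.reshape(-1, 1)).ravel()
y_pred_scaled = y_scaler.transform(y_pred.reshape(-1, 1)).ravel()

mse_scaled = mean_squared_error(y_true_scaled, y_pred_scaled)
mse_original = mean_squared_error(y_true, y_pred)

# "Unscaling" the scaled predictions gives back vehicle counts
y_pred_unscaled = y_scaler.inverse_transform(y_pred_scaled.reshape(-1, 1)).ravel()

print("MSE on scaled y:   ", mse_scaled)
print("MSE in vehicles^2: ", mse_original)
# For the same predictions, the two differ exactly by the squared range of y (115 - 47 = 68)
print("ratio:", mse_original / mse_scaled)
```

Only the MSE computed on unscaled values is in the units you care about, so that is the number to compare against a model trained on raw y.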

Main takeaway: do not compare MSE across models whose targets are on different scales. The absolute value of MSE is hard to interpret on its own.

  2. I'm not sure I understand your question. I presume you're asking how, once you have trained your model with scaled y values, you can make predictions of y on the original scale (number of vehicles). If that is the question, then this is exactly why you should not scale y.

What the MinMaxScaler computes is the following:

    X_std = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
    X_scaled = X_std * (max - min) + min

where min, max = feature_range (0 and 1 by default). See also the MinMaxScaler documentation. You could reverse the calculation on the predictions afterwards (the scaler's inverse_transform method does exactly that), but I don't think that makes sense here.
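A small sketch that checks the formula against sklearn and shows the reverse calculation, assuming a hypothetical 3-column input shaped like the question's Xnew (one value changed so that no column is constant, which would make the hand-written formula divide by zero):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical 3-column input
X = np.array([[80.0, 40.0, 47.0],
              [80.0, 30.0, 81.0],
              [70.0, 33.0, 115.0]])

# The formula above, written out by hand with the default feature_range=(0, 1)
feature_min, feature_max = 0.0, 1.0
X_std = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
X_manual = X_std * (feature_max - feature_min) + feature_min

# The same transformation via sklearn, plus the reverse calculation
scaler = MinMaxScaler(feature_range=(feature_min, feature_max))
X_sklearn = scaler.fit_transform(X)
X_back = scaler.inverse_transform(X_sklearn)

print(np.allclose(X_manual, X_sklearn))  # the hand-written formula matches sklearn
print(np.allclose(X_back, X))            # inverse_transform undoes the scaling
```

Applied to the question, calling inverse_transform of a MinMaxScaler fitted on Y alone on mlp.predict(Xnew).reshape(-1, 1) would map values like 0.085 back onto the vehicle-count scale.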

  1. Since this is a regression problem, you likely wouldn't want to scale the target/response variable in this case. You will want to scale certain features, especially large-magnitude numbers that sit alongside other features that may be, for example, binary. But without seeing the full dataset I can't confirm whether that is the way to go here. Also, you should compare this to a baseline model. An MSE of 45,000 may seem bad, but if the baseline's MSE is 10 times that (450,000), your model has cut the baseline error by 90%.
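A minimal sketch of such a baseline comparison, assuming synthetic data made up for illustration. sklearn's DummyRegressor with strategy="mean" always predicts the training mean; LinearRegression stands in for your MLP here only to keep the example fast and deterministic:

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Hypothetical data: the target depends linearly on three inputs, plus noise
X = rng.uniform(0, 100, size=(200, 3))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(0, 5, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Baseline: always predict the mean of y_train
baseline = DummyRegressor(strategy="mean").fit(X_train, y_train)
model = LinearRegression().fit(X_train, y_train)

baseline_mse = mean_squared_error(y_test, baseline.predict(X_test))
model_mse = mean_squared_error(y_test, model.predict(X_test))

# Fraction of the baseline error that the model removes
improvement = 1 - model_mse / baseline_mse
print(f"baseline MSE: {baseline_mse:.1f}, model MSE: {model_mse:.1f}, reduction: {improvement:.0%}")
```

The reduction is computed as 1 - model_mse / baseline_mse, so a model whose MSE is 10 times smaller than the baseline's gives 1 - 1/10 = 90%.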

TL;DR Only attempt to scale if features differ in magnitude or there are large outliers in a given feature.

  2. If you don't scale your target variable, you shouldn't need to "rescale" it. However, if you need or want to scale it, you can use sklearn's TransformedTargetRegressor, which scales y for training and automatically inverse-transforms the predictions.
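A minimal sketch of that approach, assuming hypothetical synthetic data in place of the traffic CSV. TransformedTargetRegressor fits the transformer on y, trains the regressor on the transformed y, and applies inverse_transform to the predictions; wrapping the MLP in a Pipeline with a MinMaxScaler also scales X, so new inputs like the question's Xnew can be passed in raw:

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
# Hypothetical training data: 3 inputs, target in the tens-to-hundreds range like vehicle counts
X = rng.uniform(0, 120, size=(300, 3))
y = 0.8 * X[:, 2] + 0.2 * X[:, 0] + rng.normal(0, 3, size=300)

model = TransformedTargetRegressor(
    # X is scaled inside the pipeline, so new data gets the same scaling automatically
    regressor=make_pipeline(
        MinMaxScaler(),
        MLPRegressor(activation="logistic", max_iter=2000, random_state=0),
    ),
    # y is MinMax-scaled for training; predictions are inverse-transformed back
    transformer=MinMaxScaler(),
)
model.fit(X, y)

X_new = np.array([[80, 40, 47], [80, 30, 81], [80, 33, 115]])
# Already on the original scale of y, no manual inverse_transform needed
print(model.predict(X_new))
```

Keeping both scalers inside the estimator also means they are fitted only on whatever data is passed to fit, which avoids accidentally scaling with statistics from the test set.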

