I'm using Keras to build a neural network that predicts the interest rate from my data. The data looks like this:
loan_amnt  annual_inc  emp_length  int_rate
    10000     38000.0    5.600882     12.40
    13750     17808.0    5.600882     28.80
    26100     68000.0   10.000000     20.00
    13000     30000.0    1.000000     20.00
     7000     79950.0    7.000000      7.02
The features (X) are loan_amnt, annual_inc, and emp_length. The target (y) is int_rate.
Here's my process and what I've done after normalizing the data:
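The normalization itself is a simple min-max scaling. Here is a sketch of that step with NumPy on the sample rows above (the real data comes from a larger dataset, so the loading code is omitted and the values here are just the ones shown in the table):

```python
import numpy as np

# Feature matrix (loan_amnt, annual_inc, emp_length) and target (int_rate)
X = np.array([[10000, 38000.0,  5.600882],
              [13750, 17808.0,  5.600882],
              [26100, 68000.0, 10.000000],
              [13000, 30000.0,  1.000000],
              [ 7000, 79950.0,  7.000000]])
y = np.array([12.40, 28.80, 20.00, 20.00, 7.02])

# Min-max scale each feature column to the [0, 1] range
X_min, X_max = X.min(axis=0), X.max(axis=0)
X_norm = (X - X_min) / (X_max - X_min)
```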
# Building the model
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential([
    Dense(9, activation='relu', input_shape=(3,)),
    Dense(3, activation='relu'),
    Dense(1, activation='linear'),
])

# Compiling the model
model.compile(loss='mean_absolute_percentage_error',
              metrics=['mse'],
              optimizer='RMSprop')

hist = model.fit(X_train, Y_train,
                 batch_size=100, epochs=20, verbose=1)
Here's an output sample after running model.fit():
Epoch 1/20
693/693 [==============================] - 1s 905us/step - loss: 96.2391 - mean_squared_error: 179.8007
Epoch 2/20
693/693 [==============================] - 0s 21us/step - loss: 95.2362 - mean_squared_error: 176.9865
Epoch 3/20
693/693 [==============================] - 0s 20us/step - loss: 94.4133 - mean_squared_error: 174.6367
Finally, I evaluated the model with model.evaluate(X_train, Y_train) and got the following output:
693/693 [==============================] - 0s 372us/step
[77.88501817667468, 132.0109032635049]
The question is, how can I know if my model is doing well or not, and how can I read the numbers?
Your loss is the mean absolute percentage error (MAPE), and you are also tracking the mean squared error (MSE) as a metric, which is defined as:

MSE = mean((y_true - y_pred)^2)

So when your MSE metric is 132, the root of it, sqrt(132) ≈ 11.5, is the typical difference between y_true and y_pred. That is quite a lot for your data, as the MAPE loss also shows: you have roughly a 78% error on your data.
For example, if y_true was 20, you might predict something like 36 or 4. You could say your error is good when the MAPE is around 10%, but it depends on your use case.
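To make the arithmetic concrete, here is how those two numbers relate to the predictions. This is a small NumPy sketch with made-up values, not your actual model output:

```python
import numpy as np

y_true = np.array([20.0, 12.4, 28.8])
y_pred = np.array([31.5,  8.0, 15.0])  # made-up predictions

# What Keras reports as the 'mse' metric
mse = np.mean((y_true - y_pred) ** 2)

# Root of the MSE: the typical error, in interest-rate points
rmse = np.sqrt(mse)

# What Keras reports as the loss (mean absolute percentage error)
mape = np.mean(np.abs((y_true - y_pred) / y_true)) * 100
```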
You should not check the accuracy of your model using the training data because it makes your solution prone to overfitting. Instead you should set some data aside (20% is what I usually use) to validate your results.
If you plan on doing a lot of testing you should set aside a third dataset only for testing the final solution.
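A simple hold-out split can be sketched with NumPy alone (scikit-learn's train_test_split does the same job with less code; the shapes here are placeholders for your data):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
X = rng.normal(size=(n, 3))  # placeholder features
y = rng.normal(size=n)       # placeholder targets

# Shuffle the indices, then hold out 20% for validation
idx = rng.permutation(n)
split = int(n * 0.8)
train_idx, val_idx = idx[:split], idx[split:]

X_train, X_val = X[train_idx], X[val_idx]
y_train, y_val = y[train_idx], y[val_idx]
```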
You can also use k-fold cross validation, where you train on part of the data and evaluate on the rest, repeating this multiple times to get a better estimate of how accurate your model is.
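The k-fold idea can be sketched in a few lines of plain NumPy (sklearn.model_selection.KFold does this for you in practice; the helper name here is just for illustration):

```python
import numpy as np

def kfold_indices(n, k, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross validation."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    folds = np.array_split(idx, k)
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx

# For each fold you would train the model on train_idx, evaluate on
# val_idx, and average the k evaluation scores at the end.
for train_idx, val_idx in kfold_indices(n=10, k=5):
    pass  # placeholder for model.fit(...) / model.evaluate(...)
```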