

Is this R neural network overfitting?

I have built a neural network with the neuralnet package in R to predict stock prices. My model and code work well, but I am getting an accuracy of around 97-99%, which makes me a bit suspicious:

Is my model overfitting?

This is the dataset I am using (already scaled), and this is my original dataset (not scaled), which I need in order to calculate the accuracy. This is the code to build and test the model:

library(neuralnet)

# Min-max scale every column to [0, 1]
normalize <- function(x) {
  return ((x - min(x)) / (max(x) - min(x)))
}

nn_df <- as.data.frame(lapply(nn_df, normalize))

# Chronological split: first 1965 rows for training, remaining 843 rows for testing
nn_df_train = as.data.frame(nn_df[1:1965,])
nn_df_test = as.data.frame(nn_df[1966:2808,])

# NN for Sentiment GI
nn_model <- neuralnet(GSPC.Close ~ GSPC.Open + GSPC.Low + GSPC.High + SentimentGI, data = nn_df_train, hidden = 5, linear.output = TRUE, threshold = 0.01)

plot(nn_model)

nn_model$result.matrix

nn_pred <- compute(nn_model, nn_df_test)
nn_pred$net.result

results <- data.frame(actual = nn_df_test$GSPC.Close, prediction = nn_pred$net.result)
results

# calc accuracy on the original (unscaled) price scale; nn_org is the unscaled data set
predicted = results$prediction * abs(diff(range(nn_org$GSPC.Close))) + min(nn_org$GSPC.Close)
actual = results$actual * abs(diff(range(nn_org$GSPC.Close))) + min(nn_org$GSPC.Close)
# absolute percentage error for each test day
deviation = abs((actual - predicted) / actual)
comparison = data.frame(predicted, actual, deviation)
# accuracy = 1 - mean absolute percentage error (MAPE)
accuracy = 1 - mean(deviation)
accuracy
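The accuracy above is 1 minus the mean absolute percentage error (MAPE) on the original price scale. A common first check for overfitting is to compute the same metric on the training rows and compare it with the test rows: a training accuracy far above the test accuracy suggests overfitting, while similar values do not. A base-R sketch of the two helpers follows; the names `denormalize` and `accuracy_mape` are hypothetical, and `ref` is assumed to be the unscaled column that was min-max scaled (here `nn_org$GSPC.Close`).

```r
# Min-max scaling, as in the question
normalize <- function(x) (x - min(x)) / (max(x) - min(x))

# Hypothetical helper: undo min-max scaling using the unscaled reference column
denormalize <- function(x_scaled, ref) x_scaled * (max(ref) - min(ref)) + min(ref)

# Hypothetical helper: accuracy defined as 1 - MAPE, as in the question
accuracy_mape <- function(actual, predicted) 1 - mean(abs((actual - predicted) / actual))

# Usage with the question's objects (needs the fitted nn_model, so left commented):
# train_pred <- compute(nn_model, nn_df_train)$net.result
# acc_train  <- accuracy_mape(denormalize(nn_df_train$GSPC.Close, nn_org$GSPC.Close),
#                             denormalize(train_pred, nn_org$GSPC.Close))
# Compare acc_train with the test-set `accuracy` computed above.
```

Keeping the metric in a function makes it easy to apply the identical calculation to the training, test and any later hold-out rows.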

I would say that there's only a risk of overfitting if:

  • the model has been trained over several iterations, each of which reshuffles the contents of the training and test sets
  • the model has been optimized by iteratively tweaking its parameters in order to get good accuracy on your test data.

In both situations, a separate validation set would be necessary.
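For price data the split is usually kept in time order rather than shuffled, so the model is always evaluated on days that come after the ones it was trained on. A minimal base-R sketch of a chronological train/validation/test split; the helper name `chrono_split` and the 60/20/20 proportions are assumptions for illustration (the question itself uses roughly 70/30 with no validation block).

```r
# Hypothetical helper: split row indices 1..n into consecutive train,
# validation and test blocks, keeping the time order (no shuffling)
chrono_split <- function(n, train_frac = 0.6, valid_frac = 0.2) {
  n_train <- floor(n * train_frac)
  n_valid <- floor(n * valid_frac)
  list(
    train = seq_len(n_train),
    valid = setdiff(seq_len(n_train + n_valid), seq_len(n_train)),
    test  = setdiff(seq_len(n), seq_len(n_train + n_valid))
  )
}

idx <- chrono_split(2808)
# Tune hidden units, threshold, etc. using accuracy on nn_df[idx$valid, ],
# then report accuracy on nn_df[idx$test, ] once, after tuning is finished.
```

The test block is touched only at the very end, so its accuracy is not inflated by the parameter tweaking described in the second bullet.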

If neither of the above applies, then the accuracy you report is more reliable. Even so, you could test the same model on an additional set of data just to confirm your results.

Edit: let me add that, surprisingly, I got a similar accuracy with a model predicting stock prices 30 days into the future using only linear regression, so I do not know whether this level of accuracy is good enough in the stock market prediction area.
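One way to judge such numbers is to compare them with simple baselines. Because GSPC.Close is predicted from the same day's GSPC.Open, GSPC.Low and GSPC.High, even "close = open" or a plain linear regression on those columns is likely to score very high on a 1 - MAPE metric. The sketch below uses simulated prices (a geometric random walk, an assumption; it is not the question's data) only to show how such baselines are computed; with the real data one would fit `lm(GSPC.Close ~ GSPC.Open + GSPC.Low + GSPC.High, ...)` on the unscaled training rows instead.

```r
set.seed(42)
n <- 500
# Simulated daily closes: geometric random walk with about 1% daily volatility (assumption)
close <- 2000 * exp(cumsum(rnorm(n, mean = 0, sd = 0.01)))
# Assume each day opens at the previous day's close
open <- c(2000, head(close, -1))
# High/low bracket open and close with a small random extension
high <- pmax(open, close) * (1 + abs(rnorm(n, sd = 0.003)))
low  <- pmin(open, close) * (1 - abs(rnorm(n, sd = 0.003)))
prices <- data.frame(Open = open, Low = low, High = high, Close = close)

# Chronological 70/30 split, similar in spirit to the question's split
train <- prices[1:350, ]
test  <- prices[351:n, ]

# Accuracy defined as 1 - MAPE, as in the question
accuracy_mape <- function(actual, predicted) 1 - mean(abs((actual - predicted) / actual))

# Baseline 1: predict the close with the same day's open
acc_naive <- accuracy_mape(test$Close, test$Open)

# Baseline 2: ordinary linear regression on the same-day inputs
fit <- lm(Close ~ Open + Low + High, data = train)
acc_lm <- accuracy_mape(test$Close, predict(fit, newdata = test))

print(c(naive = acc_naive, linear = acc_lm))
```

If the neural network does not clearly beat baselines like these on the same test rows, a 97-99% figure says more about the metric and inputs than about the model.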
