
Model Suggestion for Keras Regression

I am trying to solve a regression problem with Keras, but the MSE is huge: something like 29346217.6819.

I am really new to this, so do you have any suggestions for getting the model to produce a reasonable MSE? I am not even sure whether my data is OK or problematic, but it is actual sales data.

Data (about 3000 lines; I use 2000 for training and 1000 for testing). Full data is here

ProductNo,Day,Month,CartonSales
1,6,02,2374
1,3,02,2374
1,6,04,2374
1,6,04,2374
1,3,06,2374
1,6,09,2374
1,1,09,2374
1,6,09,2374
1,6,10,2374

Code

from keras import optimizers
from keras.callbacks import Callback
from numpy import array
from keras.models import Sequential
from keras.layers import Dense, Dropout
from matplotlib import pyplot
import pandas as pds
# prepare sequence


class TestCallback(Callback):
    def __init__(self, test_data):
        super().__init__()
        self.test_data = test_data

    def on_epoch_end(self, epoch, logs=None):
        x, y = self.test_data
        # evaluate() returns [loss, metric]; with metrics=['mse'] both values are MSE, not accuracy
        loss, mse = self.model.evaluate(x, y, verbose=0)
        print('\nTesting loss: {}, mse: {}\n'.format(loss, mse))

dataframe = pds.read_csv('pmidata.csv', usecols=[0, 1, 2, 3])
dataframe = dataframe.sample(frac=1)

dataframeX_train = dataframe.iloc[0:2000][['ProductNo', 'Day', 'Month']]
dataframeY_train = dataframe.iloc[0:2000][['CartonSales']]

# iloc slices exclude the end index, so start at 2000 to avoid skipping a row
dataframeX_test = dataframe.iloc[2000:3000][['ProductNo', 'Day', 'Month']]
dataframeY_test = dataframe.iloc[2000:3000][['CartonSales']]

# create model
model = Sequential()
model.add(Dense(3, input_dim=3, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(1))
model.compile(loss='mse', optimizer='adam', metrics=['mse'])
#sgd = optimizers.SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
#model.compile(loss='mse', optimizer=sgd, metrics=['mse'])
# train model
#history = model.fit(dataframe, dataframe, epochs=500, batch_size=len(X), verbose=2)
history = model.fit(dataframeX_train, dataframeY_train, epochs=100, batch_size=4, verbose=2, callbacks=[TestCallback((dataframeX_test, dataframeY_test))])
# plot metrics
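# The metric key is 'mean_squared_error' in older Keras and 'mse' in newer releases;
# 'loss' holds the same quantity here because the loss is also MSE
pyplot.plot(history.history['loss'])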
pyplot.show()

As far as I can tell from your code above, your y values are CartonSales. Sales can have large values and a wide range, and since MSE is measured in squared units of the target, that is probably why you get such a high error. You could use mean_squared_logarithmic_error instead of mean squared error, but I would suggest the following.

Continue using mean squared error, but log-transform your y values before training and later exp-transform your predictions.

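import numpy as np

# Train and evaluate on log-scaled targets (requires strictly positive sales)
dataframeY_train = np.log(dataframeY_train)
dataframeY_test = np.log(dataframeY_test)
....
# The model now predicts log(sales); map predictions back to the original scale
predictions = model.predict(dataframeX_test)[:, 0]
predictions = np.exp(predictions)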
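
One caveat with the transform above: log is undefined at zero, so a row with zero sales would break it. A minimal sketch of a common variant, using log(1 + y) and its inverse, is shown below. This is a swapped-in alternative, not what the answer uses; the sales values are made up for illustration, and the "model output" is a stand-in rather than a real prediction. NumPy offers the same pair as np.log1p and np.expm1.

```python
import math

# Hypothetical sales values, including a zero-sales row (not from the question's data)
sales = [0.0, 12.0, 2374.0, 48000.0]

# log1p(y) = log(1 + y) is defined at y = 0, unlike log(y)
log_targets = [math.log1p(y) for y in sales]

# ... a model would be trained on log_targets and would predict in log space ...
log_predictions = list(log_targets)  # stand-in for model output, assumed perfect

# expm1(v) = exp(v) - 1 is the exact inverse of log1p
restored = [math.expm1(v) for v in log_predictions]

# An error of 0.1 in log space is roughly a 10.5% error on the original scale
relative_factor = math.exp(0.1)

print(restored)
print(relative_factor)
```

Because errors in log space act as ratios on the original scale, the model is penalised for being off by a percentage rather than by an absolute number of cartons.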

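
To put the reported number in perspective, and to show what the logarithmic loss mentioned earlier changes, here is a small sketch in plain Python. The msle function follows the form Keras documents for mean_squared_logarithmic_error (squared difference of log(1 + y)); the sample sales and predictions are hypothetical and only serve to compare scales.

```python
import math

def mse(y_true, y_pred):
    """Mean squared error: result is in squared units of the target."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def msle(y_true, y_pred):
    """Mean squared logarithmic error: penalises the ratio, not the difference."""
    return sum(
        (math.log1p(t) - math.log1p(p)) ** 2 for t, p in zip(y_true, y_pred)
    ) / len(y_true)

# The MSE from the question, converted back to the units of CartonSales
reported_mse = 29346217.6819
rmse = math.sqrt(reported_mse)  # a typical error of a few thousand cartons

# Hypothetical targets and predictions, each off by roughly 15-25%
y_true = [2374.0, 150.0, 48000.0]
y_pred = [2000.0, 200.0, 40000.0]

print("RMSE of reported MSE:", rmse)
print("MSE :", mse(y_true, y_pred))
print("MSLE:", msle(y_true, y_pred))
```

The MSE is dominated by the largest sale, while the MSLE treats the three relative errors comparably. Taking the square root of a reported MSE is also a quick way to judge whether it is actually unreasonable for targets of this size.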