
Making a correct ANN for forecasting

This is my first time using Python, so I have a lot of doubts.

I'm trying to make a simple ANN in Pybrain for forecasting. It is a 2-input, 1-output net. The first input column holds the year and the second the month; the output is the normal rainfall for each month.
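For reference, this is the layout I am assuming for the two CSV files (the values shown are made up; only the two-column input / one-column output structure follows from the code below):

entradas.csv (year,month)     salidas.csv (monthly rainfall)
1851,1                        23.4
1851,2                        41.0
...                           ...
2008,12                       17.8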

I don't know how many things I am doing wrong, but when I plot the results, I get errors.

This is my code:

from pybrain.datasets import SupervisedDataSet
from pybrain.tools.shortcuts import buildNetwork
from pybrain.supervised.trainers import BackpropTrainer
from pybrain.tools.validation import ModuleValidator
from pybrain.structure import SigmoidLayer, LinearLayer,TanhLayer
from pybrain.utilities import percentError
import matplotlib.pyplot as plt
import numpy as np
import math

#----------------------------------------------------------------------------------------------------------------------
if __name__ == '__main__':

    ds = SupervisedDataSet(2, 1)   # 2 inputs (year, month), 1 output (rainfall)

    input = np.loadtxt('entradas.csv', delimiter=',')
    output = np.loadtxt('salidas.csv', delimiter=',')

    # Add each (year, month) -> rainfall pair to the dataset
    for x in range(len(input)):
        ds.addSample(input[x], output[x])

    print (ds['input'])
    print ("Hay una serie de",len(ds['target']),"datos")
    #print(ds)

    # Define the neural network topology

    n = buildNetwork(ds.indim, 5, ds.outdim, recurrent=True, hiddenclass=SigmoidLayer)

    # TRAINING OF THE NEURAL NETWORK

    # 60% for training; the remaining 40% split evenly into test and validation sets
    trndata, partdata = ds.splitWithProportion(0.60)
    tstdata, validata = partdata.splitWithProportion(0.50)

    print ("Datos para Validacion:",len(validata))
    print("Datos para Test:", len(tstdata))
    print("Datos para Entrenamiento:", len(trndata))

    treinadorSupervisionado = BackpropTrainer(n, dataset=trndata, momentum=0.1, verbose=True, weightdecay=0.01)

    numeroDeEpocasPorPunto = 100
    # trainUntilConvergence holds out part of trndata internally for validation
    trnerr, valerr = treinadorSupervisionado.trainUntilConvergence(dataset=trndata, maxEpochs=numeroDeEpocasPorPunto)

    max_anno = input.max(axis=0)[0]  
    min_anno = input.min(axis=0)[0]
    max_precip = output.max()
    min_precip = output.min()

    print("El primer año de la serie temporal disponible es:", min_anno)
    print("El ultimo año de la serie temporal disponible es:", max_anno)
    print("La máxima precipitación registrada en la serie temporal es:", max_precip)
    print("La mínima precipitación registrada en la serie temporal es:", min_precip)

    fig1 = plt.figure()
    ax1 = fig1.add_subplot(111)
    plt.xlabel('número de épocas')  
    plt.ylabel(u'Error')  
    plt.plot(trnerr,'b',valerr,'r')
    plt.show()

    treinadorSupervisionado.trainOnDataset(trndata, 50)
    print(treinadorSupervisionado.totalepochs)

    # The net has a single continuous output, so argmax/percentError
    # (classification metrics) do not apply here; use mean squared error instead.
    print("MSE test:", ModuleValidator.MSE(n, tstdata))
    print("MSE validacion:", ModuleValidator.MSE(n, validata))

    print ('Pesos finales:', n.params)

    # ANN parameters:

    for mod in n.modules:
        print("Module:", mod.name)
        if mod.paramdim > 0:
            print("--parameters:", mod.params)
        for conn in n.connections[mod]:
            print("-connection to", conn.outmod.name)
            if conn.paramdim > 0:
                print("- parameters", conn.params)

    # Recurrent connections belong to the network, not to each module,
    # so this block goes outside the loop above
    if hasattr(n, "recurrentConns"):
        print("Recurrent connections")
        for conn in n.recurrentConns:
            print("-", conn.inmod.name, "to", conn.outmod.name)
            if conn.paramdim > 0:
                print("- parameters", conn.params)

And this is the plot I get after running the code:

[Plot: error vs. epochs]

Where the blue line is the training error and the red line is the validation error.

This doesn't make any sense. I have searched other questions, but I still don't know why I'm getting this result.

What I want is to predict the rainfall for each month of the following years, for example 2010 (the series runs from 1851 to 2008).
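For illustration, this is roughly how I expect to use the net once it is trained (just a sketch; n is the network built above, and whether raw year/month values are a good input encoding is part of what I am unsure about):

# hypothetical forecasting loop for the year 2010
for mes in range(1, 13):
    prediccion = n.activate([2010, mes])
    print(2010, mes, prediccion[0])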

After checking your dataset, I noticed that it is time series data. Using the time itself (month and year) as input features usually doesn't work well in this case.
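The usual alternative is to predict each value from a window of the previous values. Here is a toy illustration of the windowing idea (the create_dataset function in the code further down does the same thing for your series):

import numpy as np

serie = np.array([3.0, 5.0, 4.0, 6.0, 7.0])
look_back = 2
X = np.array([serie[i:i + look_back] for i in range(len(serie) - look_back)])
y = serie[look_back:]
print(X)  # [[3. 5.] [5. 4.] [4. 6.]]
print(y)  # [4. 6. 7.]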

The most common architectures for predicting time series are the RNN and its upgraded version, the LSTM. There is a nice tutorial on LSTMs with Keras at http://machinelearningmastery.com/time-series-prediction-lstm-recurrent-neural-networks-python-keras/

I tried to train an LSTM (based on the tutorial) on your dataset and got a better-looking validation loss trend:

[Plot: training and validation loss vs. epoch]

I trained the LSTM for 100 epochs to predict rainfall from the previous 12 months of data:

import numpy
import matplotlib.pyplot as plt
import pandas
import math
from keras.models import Sequential
from keras.layers import Dense, LSTM, Dropout
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error


# build sliding windows over the series: X[i] = dataset[i:i+look_back], y[i] = dataset[i+look_back]
def create_dataset(dataset, look_back=1):
    dataX, dataY = [], []
    for i in range(len(dataset) - look_back - 1):
        a = dataset[i:(i + look_back), 0]
        dataX.append(a)
        dataY.append(dataset[i + look_back, 0])
    return numpy.array(dataX), numpy.array(dataY)

# load the dataset
dataframe = pandas.read_csv('salidas.csv', usecols=[0], engine='python')
dataset = dataframe.values
dataset = dataset.astype('float32')

# normalize the dataset
scaler = MinMaxScaler(feature_range=(0, 1))
dataset = scaler.fit_transform(dataset)

# split into train and test sets
train_size = int(len(dataset) * 0.67)
test_size = len(dataset) - train_size
train, test = dataset[0:train_size, :], dataset[train_size:len(dataset), :]

# reshape into X=t and Y=t+1
look_back = 12
trainX, trainY = create_dataset(train, look_back)
testX, testY = create_dataset(test, look_back)

# reshape input to be [samples, time steps, features]
trainX = numpy.reshape(trainX, (trainX.shape[0], 1, trainX.shape[1]))
testX = numpy.reshape(testX, (testX.shape[0], 1, testX.shape[1]))

# create and fit the LSTM network
model = Sequential()
# Keras 2 API: input_shape=(timesteps, features); the tutorial used the
# older Keras 1 names input_dim/nb_epoch
model.add(LSTM(4, input_shape=(1, look_back)))
model.add(Dropout(0.2))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
history = model.fit(trainX, trainY, validation_split=0.33, epochs=100, batch_size=1)

# summarize history for loss
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
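To get actual rainfall forecasts out of the fitted model, you can predict on the held-out windows and undo the MinMax scaling; a short sketch reusing the variables defined above:

# predict on the test windows and map back to the original scale
testPredict = scaler.inverse_transform(model.predict(testX))
testYorig = scaler.inverse_transform(testY.reshape(-1, 1))

# report the error in the original units
testScore = math.sqrt(mean_squared_error(testYorig, testPredict))
print('Test RMSE: %.2f' % testScore)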
