
Recurrent Neural Network implementation

I am trying to implement a recurrent neural network in C, but it doesn't work. I have read some documents on the Internet, but I don't understand the complex mathematics, so I adapted the computation from a multi-layer perceptron.

During the first few learning steps, the output of my network is a number, but soon afterwards the output becomes "not a number" (-1.#IND00).

1. Computation.

My first problem is the computation of the values, the errors, and the weight changes.

I compute the forward links between two neurons N1->N2 in the following way (a minimal C sketch of these rules follows the list):

  • forward pass: (value of N2) += (value of N1) * (weight of link N1->N2)
  • backward pass: (error of N1) += (error of N2) * (weight of link N1->N2), and for output neurons: (error) = (value of neuron) - (target output)
  • weight change: (new weight) = (old weight) - derivative(value of N2) * (error of N2) * (value of N1) * learning_rate
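
To make these rules concrete, here is a minimal C sketch of one forward connection N1->N2, assuming the tanh activation used later. The type Unit and the names n1, n2, w and lr are purely illustrative and don't appear in the program below:

/* Sketch only: one forward connection N1 -> N2. */
typedef struct { double value, error; } Unit;

/* forward pass: (value of N2) += (value of N1) * (weight of link N1->N2) */
static void forward(const Unit *n1, Unit *n2, double w) {
    n2->value += n1->value * w;
}

/* backward pass: (error of N1) += (error of N2) * (weight of link N1->N2) */
static void backward(Unit *n1, const Unit *n2, double w) {
    n1->error += n2->error * w;
}

/* weight change: new_w = old_w - derivative(N2) * (error of N2) * (value of N1) * lr,
   where derivative = 1 - value*value because value already holds tanh(x). */
static double update_weight(double w, const Unit *n1, const Unit *n2, double lr) {
    double derivative = 1.0 - n2->value * n2->value;
    return w - derivative * n2->error * n1->value * lr;
}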

In each neuron, I store the previous value and the previous error, computed during the previous forward and backward passes, because the recurrent links cannot be computed during the same passes as the forward links.

I compute the recurrent links between two neurons N2->N1 in the following way (a similar sketch follows the list):

  • forward pass: (value of N1) += (previous value of N2) * (weight of link N2->N1); the final value of N1, accumulated from all forward and recurrent links, is then passed through a sigmoid function (tanh), except for the output neurons
  • backward pass: (error of N2) += (previous error of N1) * (weight of link N2->N1)
  • weight change: (new weight) = (old weight) - derivative(value of N1) * (error of N1) * (previous value of N2) * learning_rate
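
And the matching sketch for one recurrent connection N2->N1, using the values and errors stored from the previous passes (again, purely illustrative names):

/* Sketch only: one recurrent connection N2 -> N1, read from the previous pass. */
typedef struct { double value, value_prev, error, error_prev; } RUnit;

/* forward pass: (value of N1) += (previous value of N2) * (weight of link N2->N1) */
static void forward_recurrent(RUnit *n1, const RUnit *n2, double w) {
    n1->value += n2->value_prev * w;
}

/* backward pass: (error of N2) += (previous error of N1) * (weight of link N2->N1) */
static void backward_recurrent(const RUnit *n1, RUnit *n2, double w) {
    n2->error += n1->error_prev * w;
}

/* weight change: uses the derivative and error of N1 and the previous value of N2. */
static double update_weight_recurrent(double w, const RUnit *n1, const RUnit *n2, double lr) {
    double derivative = 1.0 - n1->value * n1->value;
    return w - derivative * n1->error * n2->value_prev * lr;
}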

I don't know whether this computation is correct and can result in a working network.

2. Multi-Layer Perceptron.

My second problem is that when I disable the computation of recurrent links, my network should behave like a multi-layer perceptron. But even though my network learns, it performs poorly and needs on average more training cycles than a multi-layer perceptron I found on the Internet. So there is something wrong in my implementation.

There is also a strange phenomenon: I backpropagate the error by multiplying the error of a neuron with the weight of the incoming link/synapse, and add this product to the error of the neuron before the link/synapse, just like in the perceptron I found on the Internet. But when I add the error without multiplying it by the weight, the network works better and performs on the same order as that perceptron.
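
For reference, the textbook backpropagation rule for a hidden neuron does include the weight in the sum, written in the same notation as above:

(error of hidden neuron J) = derivative(value of J) * sum over downstream neurons K of [ (error of K) * (weight of link J->K) ]

where the sum runs over the neurons K that J feeds into, and derivative() is the activation derivative (1 - tanh(x)^2 here). This is only stated as a reference point for the comparison above.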

3. Source code.

Here is the source code. The first file is my implementation, with a main function that tests my network against the multi-layer perceptron I found on the Internet. The perceptron is defined in the second file, mlp.h.

The computation of recurrent links is disabled, so it should behave like a multi-layer perceptron. If you don't want to read the whole code, just look at the functions rnnset(), rnnsetstart() and rnnlearn() to see the forward and backward passes; it is in these 3 functions that the recurrent links are disabled (commented lines/blocks). rnnsetstart() must be called before rnnset(), in order to store the value of the last forward pass in the neuron variable value_prev.
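
For orientation, one training step with these functions looks like the following (this mirrors the loop in main() below; input and target stand for the inc and outc variables used there):

rnnsetstart(netrnn);              /* copy each neuron's value into value_prev        */
rnnset(netrnn, &input);           /* forward pass (recurrent terms read value_prev)  */
rnnlearn(netrnn, &target, 0.03);  /* backward pass and weight update                 */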

#include <stdio.h>
#include <time.h>
#include <math.h>
#include <malloc.h>
#include <stdlib.h>

#include "mlp.h"


typedef struct _neuron NEURON;
struct _neuron {
  int layer;

  double * weight;
  int nbsynapsesin;
  NEURON ** synapsesin;
  double bias;

  double value;
  double value_prev;
  double error;
  double error_prev;
};

typedef struct _rnn RNN;
struct _rnn {
  int * layersize;

  int nbneurons;
  NEURON * n;
};

typedef struct _config CONFIG;
struct _config {
  int nbneurons;
  int * layersize;
  int nbsynapses;
  int * synapses;
};




CONFIG * createconfig(int * layersize) {
  CONFIG * conf = (CONFIG*)malloc(sizeof(CONFIG));
  int i;
  conf->nbneurons = 0;
  for(i=1; i<layersize[0]+1; i++) conf->nbneurons += layersize[i];
  conf->layersize = (int*)malloc((layersize[0]+1)*sizeof(int));
  for(i=0; i<layersize[0]+1; i++) conf->layersize[i] = layersize[i];

  conf->nbsynapses = 0;
  for(i=1; i<layersize[0]; i++) conf->nbsynapses += layersize[i] * layersize[i+1]; 
  conf->nbsynapses *= 2;

  conf->synapses = (int*)malloc(2*conf->nbsynapses*sizeof(int));

  // creation of the synapses:
  int j,k=0,l,k2=0,k3=0;
  for(i=1;i<layersize[0];i++) {
    k3 += layersize[i];
    for(j=0; j<layersize[i]; j++) { 
      for(l=0; l<layersize[i+1]; l++) {
        // forward link/synapse:
        conf->synapses[k] = k2+j;
        k++;
        conf->synapses[k] = k3+l;
        k++;
        // Recurrent link/synapse:
        conf->synapses[k] = k3+l;
        k++;
        conf->synapses[k] = k2+j;
        k++;

      }
    }
    k2 += layersize[i];
  }
  return conf;
}

void freeconfig(CONFIG* conf) {
  free(conf->synapses);
  free(conf->layersize);
  free(conf);
}





RNN * creaternn(CONFIG * conf) {

  RNN * net = (RNN*)malloc(sizeof(RNN));
  net->nbneurons = conf->nbneurons;
  net->layersize = (int*)malloc((conf->layersize[0]+1)*sizeof(int));
  int i;
  for(i=0; i<conf->layersize[0]+1; i++) net->layersize[i] = conf->layersize[i];

  net->n = (NEURON*)malloc(conf->nbneurons*sizeof(NEURON));

  int j=0,k=0;
  for(i=0; i<conf->nbneurons; i++) {
    if(k==0) { k = conf->layersize[j+1]; j++; }
    net->n[i].layer = j-1;
    net->n[i].nbsynapsesin = 0; 
    k--;
  }

  k=0;
  for(i=0; i<conf->nbsynapses; i++) {
    k++;
    net->n[conf->synapses[k]].nbsynapsesin++;
    k++;
  }

  for(i=0; i<conf->nbneurons; i++) {
    net->n[i].weight = (double*)malloc(net->n[i].nbsynapsesin*sizeof(double));
    net->n[i].synapsesin = (NEURON**)malloc(net->n[i].nbsynapsesin*sizeof(NEURON*));
    net->n[i].nbsynapsesin = 0;
  }

  // Link the incoming synapses with the neurons:
  k=0;
  for(i=0; i<conf->nbsynapses; i++) {
    k++;
    net->n[conf->synapses[k]].synapsesin[net->n[conf->synapses[k]].nbsynapsesin] = &(net->n[conf->synapses[k-1]]);
    net->n[conf->synapses[k]].nbsynapsesin++;
    k++;
  }

  // Initialization of the values, errors, and weights:
  for(i=0; i<net->nbneurons; i++) {
    for(j=0; j<net->n[i].nbsynapsesin; j++) {
      net->n[i].weight[j] = 1.0 * (double)rand() / RAND_MAX - 1.0/2;
    }
    net->n[i].bias = 1.0 * (double)rand() / RAND_MAX - 1.0/2;
    net->n[i].value = 0.0;
    net->n[i].value_prev = 0.0;
    net->n[i].error_prev = 0.0;
    net->n[i].error = 0.0;
  }

  return net;
}


void freernn(RNN * net) {
  int i;
  for(i=0; i<net->nbneurons; i++) {
    free(net->n[i].weight);
    free(net->n[i].synapsesin);
  }
  free(net->n);
  free(net->layersize);
  free(net);
}

void rnnget(RNN * net, double * out) {
  int i,k=0;
  for(i=net->nbneurons-1; i>net->nbneurons-net->layersize[net->layersize[0]]-1; i--) { out[k] = net->n[i].value; k++; }
}

void rnnset(RNN * net, double * in) {
  int i,j,k;
  double v;

  NEURON *ni,*nj;
  // For each neuron:
  for(i=0; i<net->nbneurons; i++) {
    ni = &(net->n[i]);
    if(i<net->layersize[1]) ni->value = in[i]; else ni->value = ni->bias;
    // For each incoming synapse:
    for(j=0; j<ni->nbsynapsesin; j++) {
      nj = ni->synapsesin[j];
      // If it is a forward link/synapse:
      if(ni->layer > nj->layer) ni->value +=  nj->value * ni->weight[j];
      // Uncomment the following line to activate recurrent links computation:
      //else ni->value += nj->value_prev * ni->weight[j];
    }
    // If NOT the output layer, then tanh the value:
    if(ni->layer != net->layersize[0]-1) ni->value = tanh(ni->value);
  }
}

void rnnsetstart(RNN * net) {
  int i,j;

  NEURON *ni,*nj;
  // For each neuron, update value_prev:
  for(i=0; i<net->nbneurons; i++) {
    ni = &(net->n[i]);
    // If NOT the output layer, then the value is already computed by tanh:
    if(ni->layer != net->layersize[0]-1) {
      ni->value_prev = ni->value;
    } else {
      ni->value_prev = tanh(ni->value);
    }
  }
}

void rnnlearn(RNN * net, double * out, double learningrate) {
  int i,j,k;
  k=0;

  NEURON *ni,*nj;
  // Initialize error to zero for the output layer:
  for(i=net->nbneurons-1; i>=net->nbneurons-net->layersize[net->layersize[0]]; i--) net->n[i].error = 0.0;

  // Compute the error for output neurons:
  for(i=net->nbneurons-1; i>=0; i--) {
    ni = &(net->n[i]);
    // If ni is an output neuron, update the error:
    if(ni->layer == net->layersize[0]-1) {
      ni->error += ni->value - out[k];
      k++;
    } else {
      ni->error = 0.0;
    }
    // Uncomment the following block to activate recurrent links computation:
    /*
    // For each incoming synapse from output layer:
    for(j=0; j<ni->nbsynapsesin; j++) {
      nj = ni->synapsesin[j];
      // If neuron nj is in output layer, then update the error:
      if(nj->layer == net->layersize[0]-1) nj->error += ni->error_prev * ni->weight[j];
    }
    */
  }

  // Compute error for all other neurons:
  for(i=net->nbneurons-1; i>=0; i--) {
    ni = &(net->n[i]);
    // For each input synapse NOT from output layer:
    for(j=0; j<ni->nbsynapsesin; j++) {
      nj = ni->synapsesin[j];
      // If neuron nj is NOT in output layer, then update the error:
      if(nj->layer != net->layersize[0]-1) {
        // If it is a forward link/synapse:
        if(ni->layer > nj->layer) nj->error += ni->error * ni->weight[j];
        // Uncomment the following line to activate recurrent links computation:
        //else nj->error += ni->error_prev * ni->weight[j];
      }
    }
  }

  // Update weights:
  for(i=0; i<net->nbneurons; i++) {
    ni = &(net->n[i]);
    double wchange,derivative;

    // For the output layer:
    if(ni->layer == net->layersize[0]-1) {
      derivative = ni->error * learningrate;
      // For each incoming synapse:
      for(j=0; j<ni->nbsynapsesin; j++) {
        nj = ni->synapsesin[j];
        wchange = derivative;
        // If it is a forward link/synapse:
        if(ni->layer > nj->layer) wchange *= nj->value;
        else wchange *= nj->value_prev;
        ni->weight[j] -= wchange;
        if(ni->weight[j] > 5) ni->weight[j] = 5;
        if(ni->weight[j] < -5) ni->weight[j] = -5;
      }
      ni->bias -= derivative;
      if(ni->bias > 5) ni->bias = 5;
      if(ni->bias < -5) ni->bias = -5;

    // For the other layers:
    } else {
      derivative = 1.0 - ni->value * ni->value;
      derivative *= ni->error * learningrate;
      // For each incoming synapse:
      for(j=0; j<ni->nbsynapsesin; j++) {
        nj = ni->synapsesin[j];
        wchange = derivative;
        // If it is a forward link/synapse:
        if(ni->layer > nj->layer) wchange *= nj->value;
        else wchange *= nj->value_prev;
        ni->weight[j] -= wchange;
      }
      ni->bias -= derivative;
    }
  }

  // Update error_prev:
  for(i=0; i<net->nbneurons; i++) net->n[i].error_prev = net->n[i].error;
}




int main() {
  srand(time(NULL));
  int layersize[] = {1, 25, 12, 1};
  int layersize_netrnn[] = { 4, 1, 25, 12, 1 };

  mlp * netmlp = create_mlp (4, layersize);
  CONFIG * configrnn = createconfig(layersize_netrnn);
  RNN * netrnn = creaternn(configrnn);

  double inc,outc;

  double global_error = 1;
  double global_error2 = 1;

  int iter,i1=0,i2=0;


  //////////////////////////////////////////////////////
  // Training of the Multi-Layer Perceptron:
  //////////////////////////////////////////////////////
  while(global_error > 0.005 && i1<1000) {

    for (iter=0; iter < 100; iter++) {
      inc = 1.0*rand()/(RAND_MAX+1.0);
      outc = inc*inc;
      set_mlp(netmlp,&inc);
      learn_mlp(netmlp,&outc,0.03);
    }

    global_error = 0;
    int k;
    for (k=0; k < 100; k++) {
      inc = 1.0*rand()/(RAND_MAX+1.0);
      outc = inc*inc;
      set_mlp(netmlp,&inc);
      get_mlp(netmlp,&outc);
      mlp_float desired_out = inc*inc;
      global_error += (desired_out - outc)*(desired_out - outc);
    }         
    global_error /= 100;
    global_error = sqrt(global_error);

    i1++;
  }

  //////////////////////////////////////////////////////
  // Training of the Recurrent Neural Network:
  //////////////////////////////////////////////////////
  while(global_error2 > 0.005 && i2<1000) {

    for (iter=0; iter < 100; iter++) {
      inc = 1.0*rand()/(RAND_MAX+1.0);
      outc = inc*inc;
      rnnsetstart(netrnn);
      rnnset(netrnn,&inc);
      double outc2;

      rnnlearn(netrnn,&outc,0.03);
    }

    global_error2 = 0;
    int k;
    for (k=0; k < 100; k++) {
      inc = 1.0*rand()/(RAND_MAX+1.0);
      outc = inc*inc;
      double desired_out = inc*inc;

      rnnsetstart(netrnn);
      rnnset(netrnn,&inc);
      rnnget(netrnn,&outc);

      global_error2 += (desired_out - outc)*(desired_out - outc);
    }
    global_error2 /= 100;
    global_error2 = sqrt(global_error2);
    if(!isnormal(global_error2)) global_error2 = 100;
    i2++;
  }

  //////////////////////////////////////////////////////
  // Test of performance for the both networks:
  //////////////////////////////////////////////////////
  global_error = 0;
  global_error2 = 0;

  int k;
  for (k=0; k < 10000; k++) {
    inc = 1.0*rand()/(RAND_MAX+1.0);
    outc = inc*inc;
    double desired_out = inc*inc;

    rnnsetstart(netrnn);
    rnnset(netrnn,&inc);
    rnnget(netrnn,&outc);
    global_error2 += (desired_out - outc)*(desired_out - outc);

    set_mlp(netmlp,&inc);
    get_mlp(netmlp,&outc);
    global_error += (desired_out - outc)*(desired_out - outc);
  }

  global_error /= 10000;
  global_error = sqrt(global_error);
  printf("\n  MLP: i: %5d    error: %f",i1,global_error);
  global_error2 /= 10000;
  global_error2 = sqrt(global_error2);
  printf("\n  RNN: i: %5d    error: %f",i2,global_error2);

  free_mlp(netmlp);
  freeconfig(configrnn);
  freernn(netrnn);

}

And the file mlp.h:

typedef double mlp_float;

typedef struct {
    mlp_float *synaptic_weight;
    mlp_float *neuron_value;
    mlp_float *neuron_error_value;
    mlp_float *input_neuron;
    mlp_float *output_neuron;
    mlp_float *output_error_value;
    int *layer_index;
    int *layer_size;
    int *synapse_index;
    int layer_number;
    int neuron_number;
    int synapse_number;
    int input_layer_size;
    int output_layer_size;
} mlp;

static mlp_float MAGICAL_WEIGHT_NUMBER = 1.0f;
static mlp_float MAGICAL_LEARNING_NUMBER = 0.4f;

void reinit_mlp(mlp * network) {
int i;
for (i = 0; i < network->synapse_number; i++) {
        network->synaptic_weight[i] = /*0.001;*/MAGICAL_WEIGHT_NUMBER * (mlp_float)rand() / RAND_MAX - MAGICAL_WEIGHT_NUMBER/2;
}
}

mlp *create_mlp(int layer_number, int *layer_size) {

    mlp *network = (mlp*)malloc(sizeof * network);

    network->layer_number = layer_number;
    network->layer_size = (int*)malloc(sizeof * network->layer_size * network->layer_number);
    network->layer_index = (int*)malloc(sizeof * network->layer_index * network->layer_number);

    int i;
    network->neuron_number = 0;
    for (i = 0; i < layer_number; i++) {
        network->layer_size[i] = layer_size[i];
        network->layer_index[i] = network->neuron_number;
        network->neuron_number += layer_size[i];
    }

    network->neuron_value = (mlp_float*)malloc(sizeof * network->neuron_value * network->neuron_number);
    network->neuron_error_value = (mlp_float*)malloc(sizeof * network->neuron_error_value * network->neuron_number);

    network->input_layer_size = layer_size[0];
    network->output_layer_size = layer_size[layer_number-1];
    network->input_neuron = network->neuron_value;
    network->output_neuron = &network->neuron_value[network->layer_index[layer_number-1]];
    network->output_error_value = &network->neuron_error_value[network->layer_index[layer_number-1]];

    network->synapse_index = (int*)malloc(sizeof * network->synapse_index * (network->layer_number-1));
    network->synapse_number = 0;
    for (i = 0; i < layer_number - 1; i++) {
        network->synapse_index[i] = network->synapse_number;
        network->synapse_number += (network->layer_size[i]+1) * network->layer_size[i+1];
    }

    network->synaptic_weight = (mlp_float*)malloc(sizeof * network->synaptic_weight * network->synapse_number);


    for (i = 0; i < network->synapse_number; i++) {
        network->synaptic_weight[i] = MAGICAL_WEIGHT_NUMBER * (mlp_float)rand() / RAND_MAX - MAGICAL_WEIGHT_NUMBER/2;
    }
    return network;
}

void free_mlp (mlp *network) {
    free(network->layer_size);
    free(network->layer_index);
    free(network->neuron_value);
    free(network->neuron_error_value);
    free(network->synapse_index);
    free(network->synaptic_weight);
    free(network);
}

void set_mlp (mlp * network, mlp_float *vector) {
    if (vector != NULL) {
        int i;
        for (i = 0; i < network->input_layer_size; i++) {
            network->input_neuron[i] = vector[i];
        }
    }
    int i;
    int synapse_index;
    synapse_index = 0;
    for (i = 1; i < network->layer_number; i++) {
        int j;
        for (j = network->layer_index[i]; j < network->layer_index[i] + network->layer_size[i]; j++) {
            mlp_float weighted_sum = 0.0;
            int k;
            for (k = network->layer_index[i-1]; k < network->layer_index[i-1] + network->layer_size[i-1]; k++) {
                weighted_sum += network->neuron_value[k] * network->synaptic_weight[synapse_index];
                synapse_index++;
            }
            weighted_sum += network->synaptic_weight[synapse_index];

            synapse_index++;
            network->neuron_value[j] = weighted_sum;
            if (i != network->layer_number - 1) network->neuron_value[j] = tanh(network->neuron_value[j]);
        }
    }
}

void get_mlp (mlp *network, mlp_float *vector) {
    int i;
    for (i = 0; i < network->output_layer_size; i++) {
        vector[i] = network->output_neuron[i];
    }
}

void learn_mlp (mlp *network, mlp_float *desired_out, mlp_float learning_rate) {

    int i;
    mlp_float global_error = 0;
    int synapse_index = network->synapse_index[network->layer_number-2];

    for (i = 0; i < network->output_layer_size; i++) {
        network->output_error_value[i] = network->output_neuron[i] - desired_out[i];
        int j;
        for (j = network->layer_index[network->layer_number-2]; j < network->layer_index[network->layer_number-2] + network->layer_size[network->layer_number-2]; j++) {
            mlp_float weightChange;
            weightChange = learning_rate * network->output_error_value[i] * network->neuron_value[j];
            network->synaptic_weight[synapse_index] -= weightChange;

            if (network->synaptic_weight[synapse_index] > 5) network->synaptic_weight[synapse_index] = 5;
            if (network->synaptic_weight[synapse_index] < -5) network->synaptic_weight[synapse_index] = -5;

            synapse_index++;
        }
        mlp_float weightChange;
        weightChange = learning_rate * network->output_error_value[i];
        network->synaptic_weight[synapse_index] -= weightChange;

        if (network->synaptic_weight[synapse_index] > 5) network->synaptic_weight[synapse_index] = 5;
        if (network->synaptic_weight[synapse_index] < -5) network->synaptic_weight[synapse_index] = -5;

        synapse_index++;
    }

    for (i = network->layer_number - 2; i > 0; i--) {
        int j;
        int jj= 0;
        int synapse_index = network->synapse_index[i-1];
        for (j = network->layer_index[i]; j < network->layer_index[i] + network->layer_size[i]; j++,jj++) {
            int k;

            int synapse_index2 = network->synapse_index[i] + jj;
            network->neuron_error_value[j] = 0;
            for (k = network->layer_index[i+1]; k < network->layer_index[i+1] + network->layer_size[i+1]; k++) {
                network->neuron_error_value[j] += network->synaptic_weight[synapse_index2] * network->neuron_error_value[k];
                synapse_index2+=network->layer_size[i]+1;
            }

            for (k = network->layer_index[i-1]; k < network->layer_index[i-1] + network->layer_size[i-1]; k++) {

                mlp_float weightChange;

                weightChange = 1.0 - network->neuron_value[j] * network->neuron_value[j];
                weightChange *= network->neuron_error_value[j] * learning_rate;
                weightChange *= network->neuron_value[k];

                network->synaptic_weight[synapse_index] -= weightChange;
                synapse_index++;
            }
            mlp_float weightChange;

            weightChange = 1.0 - network->neuron_value[j] * network->neuron_value[j];
            weightChange *= network->neuron_error_value[j] * learning_rate;

            network->synaptic_weight[synapse_index] -= weightChange;

            synapse_index++;


        }
    }
}




void get_mlp_inputs (mlp *network, mlp_float *vector) {
    if (vector != NULL) {
        int i;
        for (i = 0; i < network->input_layer_size; i++) {
            vector[i] = network->input_neuron[i];
        }
    }
}

About the computation of recurrent links, I finally found a document. If I have understood it correctly, I should compute the recurrent links between two neurons N1<-N2 in the following way (a small sketch follows the list):

  • forward pass: (value of N1) += (previous value of N2) * (weight of link N1<-N2)
  • backward pass: no error backpropagation through recurrent links
  • weight change: (new weight) = (old weight) - derivative(value of N1) * (error of N1) * (previous value of N2) * learning_rate
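
As a sketch, and compared to the per-connection functions given in section 1, the backward step simply disappears for recurrent links; only the forward contribution and the weight change remain (the type Node and the names are again illustrative, not taken from the code below):

/* Sketch only: recurrent connection N1 <- N2 with no error backpropagated through it. */
typedef struct { double value, value_prev, error; } Node;

static void forward_recurrent(Node *n1, const Node *n2, double w) {
    n1->value += n2->value_prev * w;   /* uses the previous value of N2 */
}

/* backward pass: intentionally no function here, errors are not propagated
   back through recurrent links */

static double update_recurrent_weight(double w, const Node *n1, const Node *n2, double lr) {
    double derivative = 1.0 - n1->value * n1->value;   /* tanh'(x) = 1 - tanh(x)^2 */
    return w - derivative * n1->error * n2->value_prev * lr;
}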

Now my recurrent network learns, but it performs poorly compared to the multi-layer perceptron. Maybe this is due to the problem I am trying to solve, which isn't well suited to recurrent networks.

About my second problem, I finally found the bug: I was passing the value of the input neurons through tanh, but the value of the input neurons shouldn't be changed.
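
In other words, the activation should only be applied to the hidden neurons; in the corrected rnnset() below, the condition becomes:

if(ni->layer != 0 && ni->layer != net->layersize[0]-1) ni->value = tanh(ni->value);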

Here is the corrected source code:

#include <stdio.h>
#include <time.h>
#include <math.h>
#include <malloc.h>
#include <stdlib.h>

#include "mlp.h"


typedef struct _neuron NEURON;
struct _neuron {
  int layer;

  double * weight;      // table of weights for incoming synapses
  int nbsynapsesin;     // number of incoming synapses

  NEURON ** synapsesin; // table of pointer to the neurons from
                        // which are coming the synapses
  double bias;

  double value;
  double value_prev;
  double error;
  double error_prev;
};

typedef struct _rnn RNN;
struct _rnn {
  int * layersize;

  int nbneurons;
  NEURON * n;
};

typedef struct _config CONFIG;
struct _config {
  int nbneurons;
  int * layersize;
  int nbsynapses;
  int * synapses;
};




CONFIG * createconfig(int * layersize) {
  CONFIG * conf = (CONFIG*)malloc(sizeof(CONFIG));
  int i;
  conf->nbneurons = 0;
  for(i=1; i<layersize[0]+1; i++) conf->nbneurons += layersize[i];
  conf->layersize = (int*)malloc((layersize[0]+1)*sizeof(int));
  for(i=0; i<layersize[0]+1; i++) conf->layersize[i] = layersize[i];

  // Compute the number of synapses:
  conf->nbsynapses = 0;
  for(i=1; i<layersize[0]; i++) conf->nbsynapses += layersize[i] * layersize[i+1]; 
  conf->nbsynapses *= 2;

  // Allocate the table of synapses:
  conf->synapses = (int*)malloc(2*conf->nbsynapses*sizeof(int));

  // creation of the synapses:
  int j,k=0,l,k2=0,k3=0;
  for(i=1;i<layersize[0];i++) {
    k3 += layersize[i];
    for(j=0; j<layersize[i]; j++) { 
      for(l=0; l<layersize[i+1]; l++) {
        // forward link/synapse:
        conf->synapses[k] = k2+j;
        k++;
        conf->synapses[k] = k3+l;
        k++;
        // Recurrent link/synapse:
        conf->synapses[k] = k3+l;
        k++;
        conf->synapses[k] = k2+j;
        k++;

      }
    }
    k2 += layersize[i];
  }
  return conf;
}

void freeconfig(CONFIG* conf) {
  free(conf->synapses);
  free(conf->layersize);
  free(conf);
}





RNN * creaternn(CONFIG * conf) {

  RNN * net = (RNN*)malloc(sizeof(RNN));
  net->nbneurons = conf->nbneurons;
  net->layersize = (int*)malloc((conf->layersize[0]+1)*sizeof(int));
  int i;
  for(i=0; i<conf->layersize[0]+1; i++) net->layersize[i] = conf->layersize[i];

  // Allocate the neuron table of the Recurrent Neural Network:
  net->n = (NEURON*)malloc(conf->nbneurons*sizeof(NEURON));

  // Initialize some neuron values:
  int j=0,k=0;
  for(i=0; i<conf->nbneurons; i++) {
    if(k==0) { k = conf->layersize[j+1]; j++; }
    net->n[i].layer = j-1;
    net->n[i].nbsynapsesin = 0; 
    k--;
  }

  // Count the incoming synapses for each neuron:
  k=0;
  for(i=0; i<conf->nbsynapses; i++) {
    k++;
    net->n[conf->synapses[k]].nbsynapsesin++;
    k++;
  }

  // Allocate weight table in neurons, and the table of pointer to neuron
  // that represent the incoming synapses:
  for(i=0; i<conf->nbneurons; i++) {
    net->n[i].weight = (double*)malloc(net->n[i].nbsynapsesin*sizeof(double));
    net->n[i].synapsesin = (NEURON**)malloc(net->n[i].nbsynapsesin*sizeof(NEURON*));
    net->n[i].nbsynapsesin = 0;
  }

  // Link the incoming synapses with the neurons:
  k=0;
  for(i=0; i<conf->nbsynapses; i++) {
    k++;
    net->n[conf->synapses[k]].synapsesin[net->n[conf->synapses[k]].nbsynapsesin] = &(net->n[conf->synapses[k-1]]);
    net->n[conf->synapses[k]].nbsynapsesin++;
    k++;
  }

  // Initialization of the values, errors, and weights:
  for(i=0; i<net->nbneurons; i++) {
    for(j=0; j<net->n[i].nbsynapsesin; j++) {
      net->n[i].weight[j] = 1.0 * (double)rand() / RAND_MAX - 1.0/2;
    }
    net->n[i].bias = 1.0 * (double)rand() / RAND_MAX - 1.0/2;
    net->n[i].value = 0.0;
    net->n[i].value_prev = 0.0;
    net->n[i].error_prev = 0.0;
    net->n[i].error = 0.0;
  }

  return net;
}


void freernn(RNN * net) {
  int i;
  for(i=0; i<net->nbneurons; i++) {
    free(net->n[i].weight);
    free(net->n[i].synapsesin);
  }
  free(net->n);
  free(net->layersize);
  free(net);
}

void rnnget(RNN * net, double * out) {
  int i,k=0;
  // Store the output of the network in the variable table "out":
  for(i=net->nbneurons-1; i>=(net->nbneurons - net->layersize[net->layersize[0]]); i--) { out[k] = net->n[i].value; k++; }
}

void rnnsetstart(RNN * net) {
  int i,j;

  NEURON *ni,*nj;
  // For each neuron, update value_prev:
  for(i=0; i<net->nbneurons; i++) {
    ni = &(net->n[i]);
    // If NOT the output layer, then the value is already computed by tanh:
    if(ni->layer != net->layersize[0]-1) ni->value_prev = ni->value;
    else ni->value_prev = tanh(ni->value);
  }
}

void rnnset(RNN * net, double * in) {
  int i,j,k;
  double v;

  NEURON *ni,*nj;
  // For each neuron:
  for(i=0; i<net->nbneurons; i++) {
    ni = &(net->n[i]);
    // If it is an input neuron:
    if(i<net->layersize[1]) ni->value = in[i];
    else ni->value = ni->bias;

    // If the neuron is NOT in input layer, then  
    // compute the value from the incoming synapses:
    if(i>=net->layersize[1]) {
      // For each incoming synapse:
      for(j=0; j<ni->nbsynapsesin; j++) {
        nj = ni->synapsesin[j];
        // If the synapse is from input layer to output layer, then tanh the value:
        if(nj->layer == 0 && ni->layer == (net->layersize[0]-1)) {
          ////////////////////////////////////////////////////////////////////////
          // Recurrent links computation (comment out the next line to disable it):
          ni->value += tanh(nj->value_prev) * ni->weight[j];
          ////////////////////////////////////////////////////////////////////////
        } else {
          // If it is a forward link/synapse:
          if(ni->layer > nj->layer) ni->value +=  nj->value * ni->weight[j];
          ////////////////////////////////////////////////////////////////////////
          // Recurrent links computation (comment out the next line to disable it):
          else ni->value += nj->value_prev * ni->weight[j];
          ////////////////////////////////////////////////////////////////////////
        }
      }
    }
    // If NOT the input layer NOR the output layer, then tanh the value:
    if(ni->layer != 0 && ni->layer != net->layersize[0]-1) ni->value = tanh(ni->value);
  }
}


void rnnlearnstart(RNN * net) {
  int i;
  // For each neuron, initialize error_prev and value_prev for a
  // new training cycle:
  for(i=0; i<net->nbneurons; i++) { net->n[i].error_prev = 0.0; net->n[i].value_prev = 0.0; }
}

void rnnlearn(RNN * net, double * out, double learningrate) {
  int i,j,k;
  k=0;

  NEURON *ni,*nj;
  // Initialize error to zero for the output layer:
  for(i=net->nbneurons-1; i>=net->nbneurons-net->layersize[net->layersize[0]]; i--) net->n[i].error = 0.0;

  // Compute the error for output neurons, and 
  // initialize it to 0 for the other neurons:
  for(i=net->nbneurons-1; i>=0; i--) {
    ni = &(net->n[i]);
    // If ni is an output neuron, update the error:
    if(ni->layer == net->layersize[0]-1) {
      ni->error += ni->value - out[k];
      k++;
    } else {
      ni->error = 0.0;
    }
  }

  // Compute error for all other neurons:
  for(i=net->nbneurons-1; i>=0; i--) {
    ni = &(net->n[i]);
    // For each incoming synapse NOT from output layer:
    for(j=0; j<ni->nbsynapsesin; j++) {
      nj = ni->synapsesin[j];
      // If it is a forward link/synapse:
      if(ni->layer > nj->layer) nj->error += ni->error * ni->weight[j];
    }
  }

  // Update weights:
  for(i=0; i<net->nbneurons; i++) {
    ni = &(net->n[i]);
    double wchange,derivative;
    // For the output layer:
    if(ni->layer == net->layersize[0]-1) {
      derivative = ni->error * learningrate;
      // For each incoming synapse:
      for(j=0; j<ni->nbsynapsesin; j++) {
        nj = ni->synapsesin[j];
        wchange = derivative;
        // If it is a forward link/synapse:
        if(ni->layer > nj->layer) wchange *= nj->value;
        else wchange *= nj->value_prev;
        ni->weight[j] -= wchange;
        if(ni->weight[j] > 5) ni->weight[j] = 5;
        if(ni->weight[j] < -5) ni->weight[j] = -5;
      }
      ni->bias -= derivative;
      if(ni->bias > 5) ni->bias = 5;
      if(ni->bias < -5) ni->bias = -5;

    // For the other layers:
    } else {
      derivative = 1.0 - ni->value * ni->value;
      derivative *= ni->error * learningrate;
      // For each incoming synapse:
      for(j=0; j<ni->nbsynapsesin; j++) {
        nj = ni->synapsesin[j];
        wchange = derivative;
        // If it is a forward link/synapse:
        if(ni->layer > nj->layer) wchange *= nj->value;
        else wchange *= nj->value_prev;
        ni->weight[j] -= wchange;
      }
      ni->bias -= derivative;
    }
  }

  // Update error_prev:
  for(i=0; i<net->nbneurons; i++) net->n[i].error_prev = net->n[i].error;
}

int main() {
  srand(time(NULL));
  int layersize[] = {1, 25, 12, 1};
  int layersize_netrnn[] = { 4, 1, 25, 12, 1 };

  mlp * netmlp = create_mlp (4, layersize);
  srand(time(NULL));
  CONFIG * configrnn = createconfig(layersize_netrnn);
  RNN * netrnn = creaternn(configrnn);

  double inc,outc;

  double global_error = 1;
  double global_error2 = 1;

  int iter,i1=0,i2=0;


  //////////////////////////////////////////////////////
  // Training of the Multi-Layer Perceptron:
  //////////////////////////////////////////////////////
  while(global_error > 0.005 && i1<1000) {

    for (iter=0; iter < 100; iter++) {
      inc = 1.0*rand()/(RAND_MAX+1.0);
      outc = inc*inc;
      set_mlp(netmlp,&inc);
      learn_mlp(netmlp,&outc,0.03);
    }

    global_error = 0;
    int k;
    for (k=0; k < 100; k++) {
      inc = 1.0*rand()/(RAND_MAX+1.0);
      outc = inc*inc;
      set_mlp(netmlp,&inc);
      get_mlp(netmlp,&outc);
      mlp_float desired_out = inc*inc;
      global_error += (desired_out - outc)*(desired_out - outc);
    }         
    global_error /= 100;
    global_error = sqrt(global_error);

    i1++;
  }

  //////////////////////////////////////////////////////
  // Training of the Recurrent Neural Network:
  //////////////////////////////////////////////////////
  while(global_error2 > 0.005 && i2<1000) {

    rnnlearnstart(netrnn);

    for (iter=0; iter < 100; iter++) {
      inc = 1.0*rand()/(RAND_MAX+1.0);
      outc = inc*inc;
      rnnsetstart(netrnn);
      rnnset(netrnn,&inc);
      double outc2;

      rnnlearn(netrnn,&outc,0.03);
    }

    global_error2 = 0;
    int k;
    for (k=0; k < 100; k++) {
      inc = 1.0*rand()/(RAND_MAX+1.0);
      outc = inc*inc;
      double desired_out = inc*inc;

      rnnsetstart(netrnn);
      rnnset(netrnn,&inc);
      rnnget(netrnn,&outc);

      global_error2 += (desired_out - outc)*(desired_out - outc);
    }
    global_error2 /= 100;
    global_error2 = sqrt(global_error2);
    if(!isnormal(global_error2)) global_error2 = 100;
    i2++;
  }

  //////////////////////////////////////////////////////
  // Test of performance for the both networks:
  //////////////////////////////////////////////////////
  global_error = 0;
  global_error2 = 0;

  int k;
  for (k=0; k < 10000; k++) {
    inc = 1.0*rand()/(RAND_MAX+1.0);
    outc = inc*inc;
    double desired_out = inc*inc;

    rnnsetstart(netrnn);
    rnnset(netrnn,&inc);
    rnnget(netrnn,&outc);
    global_error2 += (desired_out - outc)*(desired_out - outc);

    set_mlp(netmlp,&inc);
    get_mlp(netmlp,&outc);
    global_error += (desired_out - outc)*(desired_out - outc);
  }

  global_error /= 10000;
  global_error = sqrt(global_error);
  printf("\n  MLP:    Training cycles: %5d    Error: %f",i1,global_error);
  global_error2 /= 10000;
  global_error2 = sqrt(global_error2);
  printf("\n  RNN:    Training cycles: %5d    Error: %f",i2,global_error2);

  free_mlp(netmlp);
  freeconfig(configrnn);
  freernn(netrnn);

}
