
Trouble with understanding how TensorFlow receives and processes data

I have recently begun studying deep learning and am confident in my understanding of both the theory and the practical implementation of RNNs and LSTMs. I have written a very simple RNN that learns to add two binary numbers, using only numpy. I am now trying to become familiar with the TensorFlow API so that I no longer have to build my models from scratch.

Despite my confidence in my understanding of NNs and in my programming ability, I am becoming very frustrated as I keep hitting walls when it comes to understanding the high level at which TensorFlow abstracts models, and how the input data should be structured. An example of a wall I have hit is in the code below, where I am attempting to implement a simple RNN that takes in a list of lists/sequences of integers and learns to classify a single sequence as either increasing or decreasing. generate_data() outputs two lists:

  • data is in the form of [[1, 2, 3], [9, 8, 7]] and contains the input sequences.
  • labels is a list of 1s and 0s - a 1 means the corresponding sequence is increasing, a 0 that it is decreasing (see the example below).
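
For example, a call might return something like this (values illustrative, not actual output):

data, labels = generate_data(2)
# data   -> [[12, 40, 41, 87, 903], [514, 300, 120, 33, 7]]
# labels -> [1, 0]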

x is the placeholder for the input sequences and y is the placeholder for the corresponding labels. My thought process is for each input sequence to be received by the RNN as x, a single-column tensor in which each row is a single integer of the sequence - one time step of the unrolled RNN. The RNN should then output a single integer (0 or 1) after each full forward propagation (after one entire x tensor has been processed).
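
For concreteness, the shaping I have in mind for a single sequence looks like this (a numpy sketch, not the TensorFlow code itself):

import numpy as np

# One sequence, e.g. [1, 2, 3, 4, 5], as a single-column tensor:
# each row is one integer of the sequence, i.e. one time step.
seq = np.array([[1], [2], [3], [4], [5]])
print(seq.shape)   # (5, 1) -> (sequence_len, 1)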

On the last line I am getting an error that inputs must be a sequence. I am failing to understand how this single-column tensor is not considered a sequence, and how it needs to be shaped in order for it to be one.

As a side note, my next biggest misunderstanding is that in all the theoretical explanations of RNNs I have read, there are 3 weight matrices - one from input to hidden state, one from hidden state to output, and one between the hidden states of consecutive time steps. All the coded examples I have seen using TensorFlow seem to have only a single weight matrix. How is this so? How does TensorFlow use this single matrix as an abstraction of the 3 separate ones? And am I shaping this matrix correctly in the line W = tf.Variable(tf.random_normal([sequence_len, output_dim]))?

from __future__ import print_function
import tensorflow as tf
from tensorflow.contrib import rnn
import random

sequence_len = 5        # Input Dimension
max_num = 1000          # Must be >= (sequence_len - 1)
output_dim = 1
hidden_dim = 16
batch_size = 1000

def generate_data(sample_size, seq_len=sequence_len, max_val=max_num):
    # Generates `sample_size` sequences of `seq_len` integers, each either
    # strictly increasing (label 1) or strictly decreasing (label 0).
    data = []
    labels = []
    for _ in range(sample_size):
        increasing = random.random() < 0.5
        temp = []
        if increasing:
            labels.append(1)
            temp.append(random.randint(0, max_val - seq_len + 1))
            for i in range(1, seq_len):
                temp.append(random.randint(temp[i - 1] + 1, max_val - seq_len + i + 1))
        else:
            labels.append(0)
            temp.append(random.randint(seq_len - 1, max_val))
            for i in range(1, seq_len):
                temp.append(random.randint(seq_len - i - 1, temp[i - 1] - 1))
        data.append(temp)
    return data, labels

input_data, labels = generate_data(100000)

x = tf.placeholder(tf.int32, [None, sequence_len])
y = tf.placeholder(tf.int32, [None, output_dim])

W = tf.Variable(tf.random_normal([sequence_len, output_dim]))
b = tf.Variable(tf.random_normal([output_dim]))

cell = rnn.BasicRNNCell(hidden_dim)
# This is the line that raises the error: inputs must be a sequence
outputs, states = tf.nn.static_rnn(cell, x, dtype=tf.int32)

tf.nn.static_rnn expects a list of Tensors, as per the documentation, so that it can determine the length of your RNN (note that this length must be determined before runtime, which is why you need to pass a Python list of Tensors and not a single Tensor):

inputs: A length T list of inputs, each a Tensor of shape [batch_size, input_size], or a nested tuple of such elements.

outputs, states = tf.nn.static_rnn(cell, [x], dtype=tf.int32) should work.
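
Alternatively, if you want each integer to be its own time step rather than feeding the whole sequence as one step (which is what wrapping x in [x] does), something along these lines should also work - note the switch to a float dtype, which the cell's weights require (a sketch reusing sequence_len and hidden_dim from your code, not tested against your exact setup):

x = tf.placeholder(tf.float32, [None, sequence_len])
# static_rnn wants a length-T Python list with one [batch_size, input_size]
# tensor per time step, so split the placeholder along the time axis:
inputs = tf.unstack(tf.expand_dims(x, -1), axis=1)  # 5 tensors of shape [None, 1]

cell = rnn.BasicRNNCell(hidden_dim)
outputs, state = tf.nn.static_rnn(cell, inputs, dtype=tf.float32)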

Regarding your side question, part of the answer can be found in the implementation of BasicRNNCell:

def call(self, inputs, state):
    """Most basic RNN: output = new_state = act(W * input + U * state + B)."""
    output = self._activation(_linear([inputs, state], self._num_units, True))
    return output, output

but it really depends on the RNNCell you choose to use. This is the part of your model that implements the input-to-state, state-to-state and state-to-output logic.
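
To make the trick in _linear concrete: it concatenates the input with the previous state and multiplies by one matrix that simply stacks the input-to-hidden weights on top of the hidden-to-hidden weights, which is why you only ever see a single matrix inside the cell. A minimal numpy sketch of the equivalence (sizes are arbitrary):

import numpy as np

input_dim, hidden_dim = 1, 16
x_t = np.random.randn(1, input_dim)          # input at one time step
h_prev = np.random.randn(1, hidden_dim)      # previous hidden state
W = np.random.randn(input_dim, hidden_dim)   # input-to-hidden weights
U = np.random.randn(hidden_dim, hidden_dim)  # hidden-to-hidden weights

# One stacked matrix applied to the concatenation [x_t, h_prev] ...
stacked = np.concatenate([W, U], axis=0)     # (input_dim + hidden_dim, hidden_dim)
combined = np.concatenate([x_t, h_prev], axis=1).dot(stacked)

# ... equals the two separate multiplications added together:
assert np.allclose(combined, x_t.dot(W) + h_prev.dot(U))

The third matrix from the theory (hidden state to output) is not inside the cell at all; that is the role of your own W and b, which would therefore be shaped [hidden_dim, output_dim] rather than [sequence_len, output_dim].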
