
Understanding why tensorflow RNN is not learning toy data

I am trying to train a recurrent neural network with TensorFlow (r0.10, Python 3.5) on a toy classification problem, but I am getting confusing results.

I want to feed a sequence of zeros and ones into the RNN, and have the target class for a given element of the sequence be the number represented by the current and previous values of the sequence, treated as a binary number. For example:

input sequence: [0,     0,     1,     0,     1,     1]
binary digits : [-, [0,0], [0,1], [1,0], [0,1], [1,1]]
target class  : [-,     0,     1,     2,     1,     3]
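For concreteness, here is a minimal NumPy sketch of how those targets are built (variable names are just for illustration); the first position has no true previous value and is padded with 0, which matches how the training targets are generated in the code below:

import numpy as np

# Class at step t is the binary number formed by the previous and
# current input, i.e. 2*x[t-1] + x[t].
x = np.array([0, 0, 1, 0, 1, 1])
prev = np.concatenate(([0], x[:-1]))   # previous value, padded with 0 at t=0
y = 2 * prev + x
print(y)   # [0 0 1 2 1 3]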

This seems like something an RNN should be able to learn quite easily, but instead my model is only able to distinguish classes [0,2] from [1,3]. In other words, it can distinguish the classes whose current digit is 0 from those whose current digit is 1. This leads me to believe that the RNN model is not correctly learning to look at the previous value of the sequence.

There are several tutorials and examples ([1], [2], [3]) that demonstrate how to build and use recurrent neural networks (RNNs) in TensorFlow, but after studying them I still do not see my problem (it does not help that all of the examples use text as source data).

I feed my data into tf.nn.rnn() as a list of length T, whose elements are [batch_size x input_size] sequences. Since my sequences are one-dimensional, input_size is equal to one, so essentially I believe I am feeding in a list of sequences of length batch_size (the documentation is unclear to me about which dimension is treated as the time dimension). Is that understanding correct? If so, then I do not understand why the RNN model is not learning correctly.
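To make those shapes concrete, here is a small NumPy sketch of what the split-and-squeeze in the code below produces from a placeholder of shape [T, batch_size, input_size] (the sizes are arbitrary and only for illustration):

import numpy as np

T, batch_size, input_size = 2, 3, 1    # tiny sizes, just for illustration
x = np.arange(T * batch_size * input_size).reshape(T, batch_size, input_size)

# Equivalent of tf.split(0, T, x) followed by tf.squeeze(..., [0]):
# a list of T arrays, each of shape [batch_size, input_size].
inputs = [np.squeeze(chunk, axis=0) for chunk in np.split(x, T, axis=0)]
print([inp.shape for inp in inputs])   # [(3, 1), (3, 1)]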

It is hard to get a small set of code that runs through my full RNN, but this is the best I could do (it is mostly adapted from the PTB model here and the char-rnn model here):

import tensorflow as tf
import numpy as np

input_size = 1
batch_size = 50
T = 2
lstm_size = 5
lstm_layers = 2
num_classes = 4
learning_rate = 0.1

lstm = tf.nn.rnn_cell.BasicLSTMCell(lstm_size, state_is_tuple=True)
lstm = tf.nn.rnn_cell.MultiRNNCell([lstm] * lstm_layers, state_is_tuple=True)

x = tf.placeholder(tf.float32, [T, batch_size, input_size])
y = tf.placeholder(tf.int32, [T * batch_size * input_size])

init_state = lstm.zero_state(batch_size, tf.float32)

inputs = [tf.squeeze(input_, [0]) for input_ in tf.split(0,T,x)]
outputs, final_state = tf.nn.rnn(lstm, inputs, initial_state=init_state)

w = tf.Variable(tf.truncated_normal([lstm_size, num_classes]), name='softmax_w')
b = tf.Variable(tf.truncated_normal([num_classes]), name='softmax_b')

output = tf.concat(0, outputs)

logits = tf.matmul(output, w) + b

probs = tf.nn.softmax(logits)

cost = tf.reduce_mean(tf.nn.seq2seq.sequence_loss_by_example(
    [logits], [y], [tf.ones_like(y, dtype=tf.float32)]
))

optimizer = tf.train.GradientDescentOptimizer(learning_rate)
tvars = tf.trainable_variables()
grads, _ = tf.clip_by_global_norm(tf.gradients(cost, tvars),
                                  10.0)
train_op = optimizer.apply_gradients(zip(grads, tvars))

init = tf.initialize_all_variables()

with tf.Session() as sess:
    sess.run(init)
    curr_state = sess.run(init_state)
    for i in range(3000):
        # Create toy data where the true class is the value represented
        # by the current and previous value treated as binary, i.e.
        train_x = np.random.randint(0,2,(T * batch_size * input_size))
        train_y = train_x + np.concatenate(([0], (train_x[:-1] * 2)))

        # Reshape into T x batch_size x input_size
        train_x = np.reshape(train_x, (T, batch_size, input_size))

        feed_dict = {
            x: train_x, y: train_y
        }
        for j, (c, h) in enumerate(init_state):
            feed_dict[c] = curr_state[j].c
            feed_dict[h] = curr_state[j].h

        fetch_dict = {
            'cost': cost, 'final_state': final_state, 'train_op': train_op
        }

        # Evaluate the graph
        fetches = sess.run(fetch_dict, feed_dict=feed_dict)

        curr_state = fetches['final_state']

        if i % 300 == 0:
            print('step {}, train cost: {}'.format(i, fetches['cost']))

    # Test
    test_x = np.array([[0],[0],[1],[0],[1],[1]]*(T*batch_size*input_size))
    test_x = test_x[:(T*batch_size*input_size),:]
    probs_out = sess.run(probs, feed_dict={
            x: np.reshape(test_x, [T, batch_size, input_size]),
            init_state: curr_state
        })
    # Get the softmax outputs for the points in the sequence
    # that have [0, 0], [0, 1], [1, 0], [1, 1] as their
    # last two values.
    for i in [1, 2, 3, 5]:
        print('{}: [{:.4f} {:.4f} {:.4f} {:.4f}]'.format(
                [1, 2, 3, 5].index(i), *list(probs_out[i,:]))
             )

The final output here is:

0: [0.4899 0.0007 0.5080 0.0014]
1: [0.0003 0.5155 0.0009 0.4833]
2: [0.5078 0.0011 0.4889 0.0021]
3: [0.0003 0.5052 0.0009 0.4936]

This shows that it is only learning to distinguish [0,2] from [1,3]. Why isn't this model learning to use the previous value in the sequence?

Figured it out with the help of this blog post (it has wonderful diagrams of the input tensors). It turns out I was not correctly understanding the shape of the inputs to tf.nn.rnn():

Let's say you have batch_size sequences. Each sequence has input_size dimensions and length T (these names were chosen to match the tf.nn.rnn() documentation). You then need to split your input into a T-length list where each element has shape batch_size x input_size. This means that a contiguous sequence will be spread across the elements of the list. I had thought that contiguous sequences would be kept together, so that each element of the list inputs would be an example of one sequence.

In retrospect this makes sense, since we want to parallelize each step through the sequences: we want to run the first step of every sequence (the first element of the list), then the second step of every sequence (the second element of the list), and so on.
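As a minimal NumPy sketch of this layout (with tiny, made-up sizes): each row of the batch is its own sequence, and splitting along the time axis gives a T-length list whose t-th element holds step t of every sequence:

import numpy as np

batch_size, T, input_size = 2, 3, 1    # tiny sizes, just for illustration

# Two separate sequences, one per row; shape [batch_size, T, input_size].
x = np.array([[[0], [1], [1]],
              [[1], [0], [1]]])

# Equivalent of tf.split(1, T, x) followed by tf.squeeze(..., [1]) in the
# working code below: element t has shape [batch_size, input_size].
inputs = [np.squeeze(chunk, axis=1) for chunk in np.split(x, T, axis=1)]
print([inp.shape for inp in inputs])   # [(2, 1), (2, 1), (2, 1)]
print(inputs[0])                       # step 0 of both sequences: [[0], [1]]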

Working version of the code:

import tensorflow as tf
import numpy as np

sequence_size = 50
batch_size = 7
num_features = 1
lstm_size = 5
lstm_layers = 2
num_classes = 4
learning_rate = 0.1

lstm = tf.nn.rnn_cell.BasicLSTMCell(lstm_size, state_is_tuple=True)
lstm = tf.nn.rnn_cell.MultiRNNCell([lstm] * lstm_layers, state_is_tuple=True)

x = tf.placeholder(tf.float32, [batch_size, sequence_size, num_features])
y = tf.placeholder(tf.int32, [batch_size * sequence_size * num_features])

init_state = lstm.zero_state(batch_size, tf.float32)

inputs = [tf.squeeze(input_, [1]) for input_ in tf.split(1,sequence_size,x)]
outputs, final_state = tf.nn.rnn(lstm, inputs, initial_state=init_state)

w = tf.Variable(tf.truncated_normal([lstm_size, num_classes]), name='softmax_w')
b = tf.Variable(tf.truncated_normal([num_classes]), name='softmax_b')

output = tf.reshape(tf.concat(1, outputs), [-1, lstm_size])

logits = tf.matmul(output, w) + b

probs = tf.nn.softmax(logits)

cost = tf.reduce_mean(tf.nn.seq2seq.sequence_loss_by_example(
    [logits], [y], [tf.ones_like(y, dtype=tf.float32)]
))

# Now optimize on that cost
optimizer = tf.train.GradientDescentOptimizer(learning_rate)
tvars = tf.trainable_variables()
grads, _ = tf.clip_by_global_norm(tf.gradients(cost, tvars),
                                  10.0)
train_op = optimizer.apply_gradients(zip(grads, tvars))

init = tf.initialize_all_variables()

with tf.Session() as sess:
    sess.run(init)
    curr_state = sess.run(init_state)
    for i in range(3000):
        # Create toy data where the true class is the value represented
        # by the current and previous value treated as binary, i.e.
        
        train_x = np.random.randint(0,2,(batch_size * sequence_size * num_features))
        train_y = train_x + np.concatenate(([0], (train_x[:-1] * 2)))
        
        # Reshape into batch_size x sequence_size x num_features
        train_x = np.reshape(train_x, [batch_size, sequence_size, num_features])
        
        feed_dict = {
            x: train_x, y: train_y
        }
        for j, (c, h) in enumerate(init_state):
            feed_dict[c] = curr_state[j].c
            feed_dict[h] = curr_state[j].h
        
        fetch_dict = {
            'cost': cost, 'final_state': final_state, 'train_op': train_op
        }
        
        # Evaluate the graph
        fetches = sess.run(fetch_dict, feed_dict=feed_dict)
        
        curr_state = fetches['final_state']
        
        if i % 300 == 0:
            print('step {}, train cost: {}'.format(i, fetches['cost']))
    
    # Test
    test_x = np.array([[0],[0],[1],[0],[1],[1]]*(batch_size * sequence_size * num_features))
    test_x = test_x[:(batch_size * sequence_size * num_features),:]
    probs_out = sess.run(probs, feed_dict={
            x: np.reshape(test_x, [batch_size, sequence_size, num_features]),
            init_state: curr_state
        })
    # Get the softmax outputs for the points in the sequence
    # that have [0, 0], [0, 1], [1, 0], [1, 1] as their
    # last two values.
    for i in [1, 2, 3, 5]:
        print('{}: [{:.4f} {:.4f} {:.4f} {:.4f}]'.format(
                [1, 2, 3, 5].index(i), *list(probs_out[i,:]))
             )
