
How to correctly shape time-series data for RNN?

I have started working on a simple project with TensorFlow in Python to predict stock market prices with a recurrent network. So far, this is my code:

import numpy as np
import tensorflow as tf

n_steps = 30
n_inputs = 1
n_neurons = 100
n_outputs = 1

# Placeholder shapes are (batch, time steps, features per step)
X = tf.placeholder(tf.float32, [1, n_steps, n_inputs])
y = tf.placeholder(tf.float32, [None, n_steps, n_inputs])
cell = tf.contrib.rnn.OutputProjectionWrapper(
    tf.contrib.rnn.BasicRNNCell(num_units=n_neurons, activation=tf.nn.relu),
    output_size = n_outputs
)
outputs, states = tf.nn.dynamic_rnn(cell, X, dtype=tf.float32)

learning_rate = 0.001

loss = tf.reduce_mean(tf.square(outputs - y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(loss)

init = tf.global_variables_initializer()
n_iterations = numStocks
batch_size = 1

def priceArrayToRNNFormat(priceArray):
    list = []
    print(priceArray)
    for price in priceArray:
        list.append(price)
    return np.array(list)

with tf.Session() as sess:
    init.run()
    for iteration in range(n_iterations):
        dataOrig = [allStocksDict[list(allStocksDict.keys())[iteration]]]
        data = priceArrayToRNNFormat(dataOrig)
        print(data)
        X_batch = data
        y_batch = data
        sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
        if iteration % 100 == 0:
            mse = loss.eval(feed_dict={X: X_batch, y: y_batch})
            print(iteration, "\tMSE", mse)

For reference, allStocksDict is simply a dictionary where each key is a stock symbol and the value is a 30-element array of its prices over time. When I run the code, I get the following output:

[['14.9400', '15.0000', '14.8800', '14.6900', '14.6300', '15.0000', '14.9400', '15.1300', '15.5600', '15.3100', '15.3800', '14.6900', '15.0000', '15.1300', '14.6300', '14.0600', '14.1300', '14.9400', '14.4400', '13.6300', '13.0000', '12.3800', '12.5000', '12.6300', '13.0000', '12.6900', '13.1300', '13.1900', '13.0600', '12.9400']]
[['14.9400' '15.0000' '14.8800' '14.6900' '14.6300' '15.0000' '14.9400'
  '15.1300' '15.5600' '15.3100' '15.3800' '14.6900' '15.0000' '15.1300'
  '14.6300' '14.0600' '14.1300' '14.9400' '14.4400' '13.6300' '13.0000'
  '12.3800' '12.5000' '12.6300' '13.0000' '12.6900' '13.1300' '13.1900'
  '13.0600' '12.9400']]
Traceback (most recent call last):
  File "/home/john/Python/StockProject/monthlyRnn1.py", line 127, in <module>
    sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
  File "/home/john/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/home/john/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1128, in _run
    str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (1, 30) for Tensor 'Placeholder:0', which has shape '(1, 30, 1)'

I've tried feeding the list directly without converting it to an array, and I've also tried skipping the conversion to a vector before building the array, but the error persists. I'd greatly appreciate any help with this.
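The error comes from a missing feature axis: wrapping the price list in another list produces a 2-D array of shape (batch, time steps), while X was declared with a third dimension for the n_inputs features per step. The minimal sketch below reproduces this with plain NumPy, using the 30 prices printed above as a stand-in for one allStocksDict entry (the name prices is hypothetical). It also shows that the values are strings, so converting them explicitly to float32 is safer than relying on any implicit conversion at feed time.

```python
import numpy as np

# Hypothetical stand-in for one allStocksDict value: the 30 prices from the
# question's output, stored as strings.
prices = [
    '14.9400', '15.0000', '14.8800', '14.6900', '14.6300', '15.0000',
    '14.9400', '15.1300', '15.5600', '15.3100', '15.3800', '14.6900',
    '15.0000', '15.1300', '14.6300', '14.0600', '14.1300', '14.9400',
    '14.4400', '13.6300', '13.0000', '12.3800', '12.5000', '12.6300',
    '13.0000', '12.6900', '13.1300', '13.1900', '13.0600', '12.9400',
]

dataOrig = [prices]          # the question wraps the price list in another list
data = np.array(dataOrig)    # what priceArrayToRNNFormat effectively returns

expected_shape = (1, 30, 1)  # (batch, n_steps, n_inputs) declared for X
print(data.shape)                    # (1, 30): no trailing feature axis
print(data.dtype)                    # <U7, a Unicode string dtype, not float32
print(data.shape == expected_shape)  # False
```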

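One possible solution is to rewrite priceArrayToRNNFormat so that it converts the prices to floats and reshapes them to the (1, n_steps, n_inputs) shape that the placeholders expect: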

def priceArrayToRNNFormat(priceArray):
    # Convert the price strings to float32, then reshape the (1, 30) nested list
    # into (batch=1, n_steps, n_inputs) to match the X and y placeholders.
    return np.reshape(np.asarray(priceArray, dtype=np.float32), (1, n_steps, n_inputs))
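As a quick check, the sketch below applies the reshaping function to the same 30 prices. It assumes n_steps = 30 and n_inputs = 1 as in the question and uses plain NumPy, so it runs without TensorFlow. A side benefit of np.reshape is that it fails loudly with a ValueError when a stock does not have exactly n_steps prices.

```python
import numpy as np

n_steps = 30   # values from the question
n_inputs = 1

def priceArrayToRNNFormat(priceArray):
    # String prices -> float32, then add the feature axis: (1, n_steps, n_inputs)
    return np.reshape(np.asarray(priceArray, dtype=np.float32), (1, n_steps, n_inputs))

# The 30 prices from the question's output (hypothetical input for this check)
prices = [
    '14.9400', '15.0000', '14.8800', '14.6900', '14.6300', '15.0000',
    '14.9400', '15.1300', '15.5600', '15.3100', '15.3800', '14.6900',
    '15.0000', '15.1300', '14.6300', '14.0600', '14.1300', '14.9400',
    '14.4400', '13.6300', '13.0000', '12.3800', '12.5000', '12.6300',
    '13.0000', '12.6900', '13.1300', '13.1900', '13.0600', '12.9400',
]

data = priceArrayToRNNFormat([prices])
print(data.shape)  # (1, 30, 1)
print(data.dtype)  # float32

# A price series of the wrong length cannot be reshaped to (1, 30, 1)
try:
    priceArrayToRNNFormat([prices[:29]])
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print("reshape rejected bad input:", err)
```

The same array can be used for both X_batch and y_batch, since y was declared with shape (None, n_steps, n_inputs).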

A nested list with the right shape is also acceptable: another option is to transpose priceArray so that each price becomes a one-element row, then wrap the result in another list to form a mini-batch of one.
However, the first option, np.reshape(), is simpler and faster.
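The transpose-and-wrap alternative can be sketched as follows, again with the question's 30 prices as hypothetical input. Here zip(*priceArray) turns the single row of 30 prices into 30 one-element rows, and the outer list makes it a mini-batch of one; the result matches the np.reshape() version.

```python
import numpy as np

# The 30 prices from the question's output (hypothetical input)
prices = [
    '14.9400', '15.0000', '14.8800', '14.6900', '14.6300', '15.0000',
    '14.9400', '15.1300', '15.5600', '15.3100', '15.3800', '14.6900',
    '15.0000', '15.1300', '14.6300', '14.0600', '14.1300', '14.9400',
    '14.4400', '13.6300', '13.0000', '12.3800', '12.5000', '12.6300',
    '13.0000', '12.6900', '13.1300', '13.1900', '13.0600', '12.9400',
]
priceArray = [prices]  # shape (1, 30), as built in the question's loop

# Transpose (1, 30) -> (30, 1): each time step becomes a one-element row
column = [list(step) for step in zip(*priceArray)]
batch = [column]       # wrap again: a mini-batch of one, shape (1, 30, 1)
nested = np.asarray(batch, dtype=np.float32)

# Compare with the np.reshape() approach
reshaped = np.reshape(np.asarray(priceArray, dtype=np.float32), (1, 30, 1))
print(nested.shape)                       # (1, 30, 1)
print(np.array_equal(nested, reshaped))   # True
```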
