
TensorFlow - batching issues

I'm quite new to TensorFlow, and I'm trying to train on data from my CSV file in batches.

Here's my code for reading the CSV file and building batches:

filename_queue = tf.train.string_input_producer(
    ['BCHARTS-BITSTAMPUSD.csv'], shuffle=False, name='filename_queue')

reader = tf.TextLineReader()
key, value = reader.read(filename_queue)

# Default values, in case of empty columns. Also specifies the type of the
# decoded result.
record_defaults = [[0.], [0.], [0.], [0.], [0.],[0.],[0.],[0.]]
xy = tf.decode_csv(value, record_defaults=record_defaults)

# collect batches of csv in
train_x_batch, train_y_batch = \
    tf.train.batch([xy[0:-1], xy[-1:]], batch_size=100)

And here's the training code:

# initialize
sess = tf.Session()
sess.run(tf.global_variables_initializer())

# Start populating the filename queue.
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess=sess, coord=coord)


# train my model
for epoch in range(training_epochs):
    avg_cost = 0
    total_batch = int(2193 / batch_size)

    for i in range(total_batch):
        batch_xs, batch_ys = sess.run([train_x_batch, train_y_batch])
        feed_dict = {X: batch_xs, Y: batch_ys}
        c, _ = sess.run([cost, optimizer], feed_dict=feed_dict)
        avg_cost += c / total_batch

    print('Epoch:', '%04d' % (epoch + 1), 'cost =', '{:.9f}'.format(avg_cost))

coord.request_stop()
coord.join(threads)

Here are my questions:

1.

My CSV file has 2193 records and my batch size is 100. What I want is this: every epoch starts from the first record and trains on 21 batches of 100 records, followed by one last batch of 93 records, for a total of 22 batches.

However, I found that every batch has 100 records, even the last one. Moreover, from the second epoch onward, training does not start with the first record.
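The batch layout described above can be worked out with a few lines of plain Python. This is only a sketch of the arithmetic (no TensorFlow involved); the variable names are illustrative, and the numbers are the ones stated in the question.

```python
# Plain-Python arithmetic only; no TensorFlow involved.
num_records = 2193   # record count stated in the question
batch_size = 100     # batch size stated in the question

full_batches, remainder = divmod(num_records, batch_size)

# One entry per batch: the full batches, then one partial batch
# if any records are left over.
batch_sizes = [batch_size] * full_batches + ([remainder] if remainder else [])

print(len(batch_sizes), batch_sizes[-1])
```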

2.

How can I obtain the number of records (in this case, 2193)? Should I hard-code it, or is there a smarter way? I tried tensor.get_shape().as_list(), but it does not work for batch_xs; it just returns an empty shape [].

We recently added a new API to TensorFlow called tf.contrib.data that makes it easier to solve problems like this. (The "queue runner"-based APIs make it difficult to write computations on exact epoch boundaries, because the epoch boundary gets lost.)
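To see why the epoch boundary gets lost, here is a rough plain-Python analogy. It is not TensorFlow's implementation: `record_stream` and `next_batch` are hypothetical stand-ins for the filename queue plus reader and for tf.train.batch, under the assumption that the queue keeps cycling through the file and batches always contain exactly 100 records.

```python
from itertools import cycle, islice

num_records = 2193
batch_size = 100

# Stand-in for the filename queue + reader: record indices repeat forever,
# with nothing marking where one pass over the file ends.
record_stream = cycle(range(num_records))

def next_batch(stream, size):
    """Take exactly `size` records, regardless of epoch boundaries."""
    return list(islice(stream, size))

# The question's loop pulls int(2193 / 100) = 21 batches per "epoch".
batches_per_epoch = num_records // batch_size
epoch1 = [next_batch(record_stream, batch_size) for _ in range(batches_per_epoch)]
epoch2_first = next_batch(record_stream, batch_size)

print(len(epoch1[-1]))                    # size of the last batch of "epoch" 1
print(epoch2_first[0], epoch2_first[-1])  # first and last record of "epoch" 2's first batch
```

In this analogy the first batch of the second "epoch" begins partway through the file and runs past its end into the next pass, which matches both symptoms described in the question.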

Here's an example of how you'd use tf.contrib.data to rewrite your program:

lines = tf.contrib.data.TextLineDataset("BCHARTS-BITSTAMPUSD.csv")

def decode(line):
  record_defaults = [[0.], [0.], [0.], [0.], [0.],[0.],[0.],[0.]]
  xy = tf.decode_csv(line, record_defaults=record_defaults)
  return xy[0:-1], xy[-1:]

decoded = lines.map(decode)

batched = decoded.batch(100)

iterator = batched.make_initializable_iterator()

train_x_batch, train_y_batch = iterator.get_next()

Then the training part can become a bit simpler:

# initialize
sess = tf.Session()
sess.run(tf.global_variables_initializer())

# train my model
for epoch in range(training_epochs):
  total_cost = 0.0
  total_batch = 0

  # Re-initialize the iterator for another epoch.
  sess.run(iterator.initializer)

  while True:

    # NOTE: It is inefficient to make a separate sess.run() call to get each batch 
    # of input data and then feed it into a different sess.run() call. For better
    # performance, define your training graph to take train_x_batch and train_y_batch
    # directly as inputs.
    try:
      batch_xs, batch_ys = sess.run([train_x_batch, train_y_batch])
    except tf.errors.OutOfRangeError:
      break

    feed_dict = {X: batch_xs, Y: batch_ys}
    c, _ = sess.run([cost, optimizer], feed_dict=feed_dict)
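    total_cost += c
    # batch_xs.shape[0] is the actual batch size (93 for the last batch),
    # so total_batch ends up holding the number of records seen this epoch.
    total_batch += batch_xs.shape[0]

  # Note: this divides by the number of records, not the number of batches.
  avg_cost = total_cost / total_batch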

  print('Epoch:', '%04d' % (epoch + 1), 'cost =', '{:.9f}'.format(avg_cost))
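The control flow of that loop can be tried out without TensorFlow. The sketch below is only an analogy: the `batches` generator is a hypothetical stand-in for the batched dataset, creating a fresh generator stands in for running iterator.initializer, and StopIteration plays the role of tf.errors.OutOfRangeError. The value of `training_epochs` is made up for the demo.

```python
def batches(records, size):
    """Yield consecutive batches; the final one may be smaller."""
    for start in range(0, len(records), size):
        yield records[start:start + size]

records = list(range(2193))   # stand-in for the decoded CSV rows
training_epochs = 2           # hypothetical value for the demo

for epoch in range(training_epochs):
    iterator = batches(records, 100)   # analogous to running iterator.initializer
    num_batches = 0
    num_records = 0
    first_record = None
    while True:
        try:
            batch = next(iterator)
        except StopIteration:          # analogous to tf.errors.OutOfRangeError
            break
        if first_record is None:
            first_record = batch[0]
        num_batches += 1
        num_records += len(batch)      # analogous to batch_xs.shape[0]
    print(epoch + 1, first_record, num_batches, len(batch), num_records)
```

Because each epoch ends exactly when the data runs out, the last batch keeps its natural size, every epoch restarts at the first record, and the record count falls out of the batch sizes instead of being hard-coded.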

For more details about how to use the new API, see the "Importing Data" programmer's guide.
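On the second question (getting the record count without hard-coding 2193), one option outside TensorFlow is to count the CSV rows up front. This is a minimal sketch, not part of the tf.contrib.data API: `count_records` is a hypothetical helper, and it assumes the file has no header row and that blank lines should be ignored. The demo runs on a throwaway file so it does not depend on the real data.

```python
import csv
import os
import tempfile

def count_records(path):
    """Count non-empty rows in a CSV file (hypothetical helper; assumes no header row)."""
    with open(path, newline='') as f:
        return sum(1 for row in csv.reader(f) if row)

# Demo on a throwaway file with three 8-column rows.
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False, newline='') as tmp:
    csv.writer(tmp).writerows([[0.0] * 8 for _ in range(3)])

demo_count = count_records(tmp.name)
os.remove(tmp.name)
print(demo_count)
```

With the real data this would be called as count_records('BCHARTS-BITSTAMPUSD.csv').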

