Tensorflow Autoencoder code clarification and custom test data

Question

I have a question about TensorFlow input queues, which I do not fully understand. I have created a TensorFlow module that builds batches of data as shown below.

The code:

import tensorflow as tf

# various initialization variables
BATCH_SIZE = 128
N_FEATURES = 9

def batch_generator(filenames, record_bytes):
  """ filenames is the list of binary files you want to read from.
  Each file holds fixed-length records of raw image bytes.
  """

  record_bytes = 29**2 # one 29x29 single-byte image per record (overrides the argument)
  filename_queue = tf.train.string_input_producer(filenames)
  reader = tf.FixedLengthRecordReader(record_bytes=record_bytes) # reads one fixed-length record per call
  _, value = reader.read(filename_queue)
  print(value)



  # decode the raw record bytes into a vector of uint8 values
  content = tf.decode_raw(value, out_type=tf.uint8)

  # The bytes read  represent the image, which we reshape
  # from [depth * height * width] to [depth, height, width].
  depth_major = tf.reshape(
    tf.strided_slice(content, [0],
                   [record_bytes]),
    [1, 29, 29])

  # Convert from [depth, height, width] to [height, width, depth].
  uint8image = tf.transpose(depth_major, [1, 2, 0])
  uint8image = tf.reshape(uint8image, [29**2])  # flatten to a one-dimensional vector
  uint8image = tf.cast(uint8image, tf.float32)
  uint8image = tf.nn.l2_normalize(uint8image, dim=0) # scale the vector to unit L2 norm

  # minimum number elements in the queue after a dequeue, used to ensure 
  # that the samples are sufficiently mixed
  # I think 10 times the BATCH_SIZE is sufficient
  min_after_dequeue = 10 * BATCH_SIZE

  # the maximum number of elements in the queue
  capacity = 20 * BATCH_SIZE

  # shuffle the data to generate batches of BATCH_SIZE samples
  data_batch = tf.train.shuffle_batch([uint8image], batch_size=BATCH_SIZE,
                                      capacity=capacity, min_after_dequeue=min_after_dequeue)

  return data_batch
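To make the roles of capacity and min_after_dequeue more concrete, here is a minimal plain-Python sketch of a shuffling queue. It is a simplified, single-threaded model written for illustration, not TensorFlow's implementation, and the name shuffle_batches is hypothetical. It assumes the input is repeated endlessly, which is the default behaviour of tf.train.string_input_producer when num_epochs is not set.

```python
import itertools
import random


def shuffle_batches(records, batch_size, capacity, min_after_dequeue, seed=0):
    """Yield shuffled batches from an endlessly repeated list of records.

    Hypothetical helper: a simplified model of a shuffling queue.
    """
    if not records:
        raise ValueError("records must not be empty")
    if capacity <= min_after_dequeue:
        raise ValueError("capacity must exceed min_after_dequeue")
    rng = random.Random(seed)
    source = itertools.cycle(records)  # repeat the input forever
    buffer = []
    while True:
        batch = []
        while len(batch) < batch_size:
            # Refill to capacity whenever a dequeue would leave fewer than
            # min_after_dequeue elements behind.
            if len(buffer) <= min_after_dequeue:
                while len(buffer) < capacity:
                    buffer.append(next(source))
            # Remove one element chosen uniformly at random.
            index = rng.randrange(len(buffer))
            buffer[index], buffer[-1] = buffer[-1], buffer[index]
            batch.append(buffer.pop())
        yield batch


records = list(range(300))  # stand-ins for 300 image records
batches = shuffle_batches(records, batch_size=128,
                          capacity=20 * 128, min_after_dequeue=10 * 128)
first = next(batches)
second = next(batches)
print(len(first), len(second))  # 128 128
```

In this model every batch has exactly batch_size elements because the source never runs dry. When the file holds fewer records than the buffer, the same record can appear more than once in a batch, since the buffer then spans several passes over the data.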

My question is: do I get exactly 128 records every time this function is called? For example:

 batch_xs = sess.run(data_batch)

1) What is the value of batch_xs in this case?

2) The example I am using evaluates how well training worked with the following code:

encode_decode = sess.run(
  y_pred, feed_dict={X: mnist.test.images[:examples_to_show]})

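How would I do the same with my own test data, which is stored in another binary file? This question is related to my previous post, "Tensorflow Autoencoder with custom training examples from binary file".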
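One possible way to get test images for feed_dict without a queue is to read the fixed-length records directly. The sketch below is plain Python and makes several assumptions: the helper name read_records is hypothetical, the test file uses the same layout as the training file (one unsigned byte per pixel, no header), and the vectors should be scaled to unit L2 norm like the training batches.

```python
import math
import os
import tempfile


def read_records(path, record_bytes, limit):
    """Read up to `limit` fixed-length records from a binary file.

    Hypothetical helper: returns a list of float vectors, each scaled to
    unit L2 norm to mirror the preprocessing of the training batches.
    """
    vectors = []
    with open(path, "rb") as f:
        while len(vectors) < limit:
            raw = f.read(record_bytes)
            if len(raw) < record_bytes:  # end of file or truncated record
                break
            pixels = [float(b) for b in raw]  # one uint8 value per pixel
            norm = math.sqrt(sum(p * p for p in pixels))
            # Guard against an all-zero image; tf.nn.l2_normalize clamps the
            # squared norm with a small epsilon instead.
            scale = 1.0 / norm if norm > 0 else 0.0
            vectors.append([p * scale for p in pixels])
    return vectors


# Small demonstration with a temporary file holding three 4-byte records.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes(range(1, 5)) * 3)
    tmp_path = tmp.name
test_images = read_records(tmp_path, record_bytes=4, limit=2)
os.remove(tmp_path)
print(len(test_images), len(test_images[0]))  # 2 4
```

For the data in this question the call would use record_bytes=29**2 and limit=examples_to_show, and the result could then be passed as the value for X in feed_dict, assuming X is a placeholder that accepts vectors of length 841.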

Answer 1

To solve the problem above, I used the data_reader module I had created, shown below:

import tensorflow as tf

# various initialization variables
BATCH_SIZE = 128
N_FEATURES = 9

def batch_generator(filenames, record_bytes):
  """ filenames is the list of binary files you want to read from.
  Each file holds fixed-length records of raw image bytes.
  """

  record_bytes = 29**2 # one 29x29 single-byte image per record (overrides the argument)
  filename_queue = tf.train.string_input_producer(filenames)
  reader = tf.FixedLengthRecordReader(record_bytes=record_bytes) # reads one fixed-length record per call
  _, value = reader.read(filename_queue)
  print(value)

  # decode the raw record bytes into a vector of uint8 values
  content = tf.decode_raw(value, out_type=tf.uint8)

  # pack all UINT8 values into a tensor (not used further below)
  features = tf.stack(content)

  # The bytes read  represent the image, which we reshape
  # from [depth * height * width] to [depth, height, width].
  depth_major = tf.reshape(
    tf.strided_slice(content, [0],
                     [record_bytes]),
    [1, 29, 29])

  # Convert from [depth, height, width] to [height, width, depth].
  uint8image = tf.transpose(depth_major, [1, 2, 0])
  uint8image = tf.reshape(uint8image, [29**2])  # flatten to a one-dimensional vector
  uint8image = tf.cast(uint8image, tf.float32)
  uint8image = tf.nn.l2_normalize(uint8image, dim=0) # scale the vector to unit L2 norm

  # minimum number elements in the queue after a dequeue, used to ensure 
  # that the samples are sufficiently mixed
  # I think 10 times the BATCH_SIZE is sufficient
  min_after_dequeue = 10 * BATCH_SIZE

  # the maximum number of elements in the queue
  capacity = 20 * BATCH_SIZE

  # shuffle the data to generate BATCH_SIZE sample pairs
  data_batch = tf.train.shuffle_batch([uint8image], batch_size=BATCH_SIZE,
                                      capacity=capacity, min_after_dequeue=min_after_dequeue)

  return data_batch

Then I created a new data_batch_eval as follows:

data_batch_eval = data_reader.batch_generator([DATA_PATH_EVAL], 29**2)  # evaluation set

This is the test code:

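# batch_ys is not defined in this snippet; it is assumed to hold one evaluated
# batch from the evaluation set, e.g. batch_ys = sess.run(data_batch_eval)
encode_decode = sess.run(
  y_pred, feed_dict={X: batch_ys[:examples_to_show]})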
# Compare original images with their reconstructions
f, a = plt.subplots(2, 10, figsize=(10, 2))
for i in range(examples_to_show):
    #a[0][i].imshow(np.reshape(mnist.test.images[i], (28, 28)))
    a[0][i].imshow(np.reshape(batch_ys[i], (29, 29)), cmap='gray')
    a[1][i].imshow(np.reshape(encode_decode[i],  (29, 29)), cmap='gray')
f.show()
plt.draw()
plt.waitforbuttonpress()

My problem now is that I believe the encode_decode images all show the same image. Could this be caused by an error in the autoencoder training code shown above?
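Before looking for a bug in the training code, it may help to check numerically whether the reconstructions really are identical, since imshow rescales the colors of each image by default and can make different arrays look alike. The helper below is a hypothetical plain-Python sketch; it accepts any sequence of equal-length rows, so it should also work on the arrays returned by sess.run.

```python
def max_pairwise_difference(rows):
    """Return the largest absolute element-wise difference between any two rows."""
    largest = 0.0
    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            diff = max(abs(a - b) for a, b in zip(rows[i], rows[j]))
            largest = max(largest, diff)
    return largest


identical = [[0.1, 0.2, 0.3], [0.1, 0.2, 0.3]]
distinct = [[0.1, 0.2, 0.3], [0.3, 0.2, 0.1]]
print(max_pairwise_difference(identical))  # 0.0
print(max_pairwise_difference(distinct))   # clearly greater than zero
```

Running the same check on batch_ys[:examples_to_show] and on encode_decode separates two cases. If the inputs themselves are identical, the problem is likely in how the data is read. If only the outputs are identical, the network is probably producing the same reconstruction for every input.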
