Tensorflow Autoencoder with custom training examples from binary file
Tensorflow autoencoder code clarification and custom test data
I have a question about TensorFlow input queues, which I do not fully understand. I created a TensorFlow module that builds batches of data as follows.
This is the code:
import tensorflow as tf

# various initialization variables
BATCH_SIZE = 128
N_FEATURES = 9

def batch_generator(filenames, record_bytes):
    """ filenames is the list of binary files you want to read from. """
    record_bytes = 29**2  # 29x29 images per record (this overrides the argument)
    filename_queue = tf.train.string_input_producer(filenames)
    reader = tf.FixedLengthRecordReader(record_bytes=record_bytes)  # reads one fixed-length record at a time
    _, value = reader.read(filename_queue)
    print(value)

    # decode the raw bytes of one record into a uint8 vector
    content = tf.decode_raw(value, out_type=tf.uint8)

    # The bytes read represent the image, which we reshape
    # from [depth * height * width] to [depth, height, width].
    depth_major = tf.reshape(
        tf.strided_slice(content, [0],
                         [record_bytes]),
        [1, 29, 29])
    # Convert from [depth, height, width] to [height, width, depth].
    uint8image = tf.transpose(depth_major, [1, 2, 0])
    uint8image = tf.reshape(uint8image, [29**2])  # reshape it to a single-dimensional vector
    uint8image = tf.cast(uint8image, tf.float32)
    uint8image = tf.nn.l2_normalize(uint8image, dim=0)  # L2-normalize the whole vector

    # minimum number of elements in the queue after a dequeue, used to ensure
    # that the samples are sufficiently mixed
    # I think 10 times the BATCH_SIZE is sufficient
    min_after_dequeue = 10 * BATCH_SIZE

    # the maximum number of elements in the queue
    capacity = 20 * BATCH_SIZE

    # shuffle the data to generate batches of BATCH_SIZE samples
    data_batch = tf.train.shuffle_batch([uint8image], batch_size=BATCH_SIZE,
                                        capacity=capacity, min_after_dequeue=min_after_dequeue)
    return data_batch
My question is whether I get exactly 128 records every time this function is called. For example, with:
batch_xs = sess.run(data_batch)
1) What is the value of batch_xs in this case?
2) The example I am using evaluates how well training went with the following code:
encode_decode = sess.run(
    y_pred, feed_dict={X: mnist.test.images[:examples_to_show]})
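How would I do the same with my own test data, which is stored in another binary file? This question is related to my previous post, "Tensorflow Autoencoder with custom training examples from binary file".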
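For question 1), a hedged note: in the TF 1.x queue API, tf.train.shuffle_batch defaults to allow_smaller_final_batch=False and tf.train.string_input_producer defaults to cycling through the files indefinitely, so each sess.run(data_batch) should dequeue a fresh array of shape (BATCH_SIZE, 29**2) with dtype float32, provided the queue runners have been started with tf.train.start_queue_runners. The sketch below imitates that result with NumPy only, so the expected shape and scaling can be checked without a session. simulate_batch and the synthetic fake_file array are hypothetical names made up for illustration, and sampling with replacement is only a stand-in for the shuffling queue.

```python
import numpy as np

BATCH_SIZE = 128
RECORD_BYTES = 29 ** 2  # one 29x29 uint8 image per record


def simulate_batch(raw_records, batch_size=BATCH_SIZE, seed=0):
    """Imitate one dequeue: pick batch_size records, cast to float32, L2-normalize each row."""
    rng = np.random.default_rng(seed)
    # Sampling with replacement stands in for the endlessly cycling, shuffled queue.
    idx = rng.integers(0, len(raw_records), size=batch_size)
    batch = raw_records[idx].astype(np.float32)
    norms = np.linalg.norm(batch, axis=1, keepdims=True)
    # The lower bound guards against division by zero for all-black images.
    return batch / np.maximum(norms, 1e-6)


# Synthetic stand-in for the contents of the binary file (not real data).
fake_file = np.random.default_rng(1).integers(
    0, 256, size=(500, RECORD_BYTES), dtype=np.uint8)
batch_xs = simulate_batch(fake_file)
print(batch_xs.shape, batch_xs.dtype)
```

Note that every row ends up with unit L2 norm, so individual pixel values are much smaller than the [0, 1] range of the MNIST images used in the original example.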
To address the problem above, I used the data_reader module I created, shown below:
import tensorflow as tf

# various initialization variables
BATCH_SIZE = 128
N_FEATURES = 9

def batch_generator(filenames, record_bytes):
    """ filenames is the list of binary files you want to read from. """
    record_bytes = 29**2  # 29x29 images per record (this overrides the argument)
    filename_queue = tf.train.string_input_producer(filenames)
    reader = tf.FixedLengthRecordReader(record_bytes=record_bytes)  # reads one fixed-length record at a time
    _, value = reader.read(filename_queue)
    print(value)

    # record_defaults are the default values in case some of our columns are empty
    # This is also to tell tensorflow the format of our data (the type of the decode result)
    # for this dataset, out of 9 feature columns,
    # 8 of them are floats (some are integers, but to make our features homogeneous,
    # we consider them floats), and 1 is string (at position 5)
    # the last column corresponds to the label and is an integer
    #record_defaults = [[1.0] for _ in range(N_FEATURES)]
    #record_defaults[4] = ['']
    #record_defaults.append([1])

    # decode the raw bytes of one record into a uint8 vector
    content = tf.decode_raw(value, out_type=tf.uint8)
    #print(content)

    # convert the 5th column (present/absent) to the binary value 0 and 1
    #condition = tf.equal(content[4], tf.constant('Present'))
    #content[4] = tf.where(condition, tf.constant(1.0), tf.constant(0.0))

    # pack all UINT8 values into a tensor
    features = tf.stack(content)
    #print(features)

    # assign the last column to label
    #label = content[-1]

    # The bytes read represent the image, which we reshape
    # from [depth * height * width] to [depth, height, width].
    depth_major = tf.reshape(
        tf.strided_slice(content, [0],
                         [record_bytes]),
        [1, 29, 29])
    # Convert from [depth, height, width] to [height, width, depth].
    uint8image = tf.transpose(depth_major, [1, 2, 0])
    uint8image = tf.reshape(uint8image, [29**2])  # reshape it to a single-dimensional vector
    uint8image = tf.cast(uint8image, tf.float32)
    uint8image = tf.nn.l2_normalize(uint8image, dim=0)  # L2-normalize the whole vector

    # minimum number of elements in the queue after a dequeue, used to ensure
    # that the samples are sufficiently mixed
    # I think 10 times the BATCH_SIZE is sufficient
    min_after_dequeue = 10 * BATCH_SIZE

    # the maximum number of elements in the queue
    capacity = 20 * BATCH_SIZE

    # shuffle the data to generate batches of BATCH_SIZE samples
    data_batch = tf.train.shuffle_batch([uint8image], batch_size=BATCH_SIZE,
                                        capacity=capacity, min_after_dequeue=min_after_dequeue)
    return data_batch
Then I created a new data_batch_eval as follows:
data_batch_eval = data_reader.batch_generator([DATA_PATH_EVAL], 29**2)  # evaluation set
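As a possible alternative to a second shuffling queue, the evaluation file could be read in one pass with NumPy and fed through feed_dict, which keeps the test images in a fixed order. This is only a sketch under assumptions: load_records is a hypothetical helper, the file is assumed to hold nothing but back-to-back 29x29 uint8 records, and the demo writes a small synthetic file instead of using real data.

```python
import os
import tempfile

import numpy as np

RECORD_BYTES = 29 ** 2  # assumed to match the record size used for training


def load_records(path, record_bytes=RECORD_BYTES):
    """Read all fixed-length uint8 records from a raw binary file as L2-normalized float32 rows."""
    raw = np.fromfile(path, dtype=np.uint8)
    if raw.size % record_bytes != 0:
        raise ValueError("file size is not a multiple of record_bytes")
    images = raw.reshape(-1, record_bytes).astype(np.float32)
    norms = np.linalg.norm(images, axis=1, keepdims=True)
    # Same per-image scaling as in batch_generator; the bound avoids division by zero.
    return images / np.maximum(norms, 1e-6)


# Demo on a throwaway file holding three synthetic records.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "eval_demo.bin")
    (np.arange(3 * RECORD_BYTES) % 256).astype(np.uint8).tofile(demo_path)
    test_images = load_records(demo_path)
print(test_images.shape)
```

With the real file, the result would be used like the MNIST array in the original example, for instance sess.run(y_pred, feed_dict={X: test_images[:examples_to_show]}).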
Here is the test code:
encode_decode = sess.run(
    y_pred, feed_dict={X: batch_ys[:examples_to_show]})
# Compare original images with their reconstructions
f, a = plt.subplots(2, 10, figsize=(10, 2))
for i in range(examples_to_show):
    #a[0][i].imshow(np.reshape(mnist.test.images[i], (28, 28)))
    a[0][i].imshow(np.reshape(batch_ys[i], (29, 29)), cmap='gray')
    a[1][i].imshow(np.reshape(encode_decode[i], (29, 29)), cmap='gray')
f.show()
plt.draw()
plt.waitforbuttonpress()
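My problem now is that the encode_decode images all appear to be the same image. Could this be caused by an error in the autoencoder training code, given the reader shown above?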
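One way to narrow this down, offered as a sketch rather than a diagnosis: check whether the rows of the input batch are already identical (which would point at the reader) or whether only the reconstructions are (which would point at training or at the input scaling). It assumes batch_ys was obtained with something like sess.run(data_batch_eval), which the snippet above does not show. rows_all_identical is a hypothetical helper, and the arrays here are synthetic stand-ins for batch_ys and encode_decode.

```python
import numpy as np


def rows_all_identical(batch, atol=1e-6):
    """Return True if every row of a 2-D array matches the first row within tolerance."""
    batch = np.asarray(batch)
    return bool(np.allclose(batch, batch[0], atol=atol))


# Synthetic stand-ins: ten distinct "inputs" and ten collapsed "reconstructions".
inputs = np.random.default_rng(0).random((10, 29 * 29)).astype(np.float32)
collapsed = np.tile(inputs.mean(axis=0), (10, 1))

print(rows_all_identical(inputs))     # distinct inputs
print(rows_all_identical(collapsed))  # every row is the same vector
```

If the real batch_ys passes this check but encode_decode does not, the reader is probably fine and the training loop, loss, or input scaling is the next place to look.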