Construct TFRecord for image-text paired data
I am stuck on making TFRecords work for image-text paired data.
Here is the code that creates a TFRecord file from a numpy array of image features and a text file:
def npy_to_tfrecords(numpy_array, text_file, output_file):
    f = open(text_file)
    # Write records to a tfrecords file
    writer = tf.python_io.TFRecordWriter(output_file)
    # Loop through all the features you want to write
    for X, line in zip(numpy_array, f):
        # Let's say X is of the form np.array([[...][...]])
        # and line is the matching line of text
        txt = "{}".format(line[:-1])  # drop the trailing newline
        txt = txt.encode()
        # Feature contains a map of string to feature proto objects
        feature = {}
        feature['x'] = tf.train.Feature(float_list=tf.train.FloatList(value=X.flatten()))
        feature['y'] = tf.train.Feature(bytes_list=tf.train.BytesList(value=[txt]))
        # Construct the Example proto object
        example = tf.train.Example(features=tf.train.Features(feature=feature))
        # Serialize the example to a string
        serialized = example.SerializeToString()
        # Write the serialized object to disk
        writer.write(serialized)
    writer.close()
    f.close()
I cannot make the dataset after this:
def load_data_tfr():
    train = tf.data.TFRecordDataset("train.tfrecord")

    # Decode one serialized Example proto
    def _parse_function1(example_proto):
        keys_to_features = {'x': tf.FixedLenFeature(2048, tf.float32),
                            'y': tf.VarLenFeature(tf.string)}
        parsed_features = tf.parse_single_example(example_proto, keys_to_features)
        return {"x": parsed_features['x'], "y": parsed_features['y']}

    # Parse the record into tensors.
    train = train.map(_parse_function1)
    return train
I keep getting the error:
train_data = load_data_tfr()
random.shuffle(train_data)
for i in reversed(range(1, len(x))):
TypeError: object of type 'MapDataset' has no len()
Any help? Thank you.
MapDataset has no length. random.shuffle calls len() on its argument, which is why the call in your traceback fails.
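To see why the error appears, here is a minimal sketch in plain Python, with no TensorFlow involved. StreamLike is a hypothetical stand-in for a dataset object: it can be iterated, but it defines neither len() nor indexing, and random.shuffle needs both.

```python
import random


class StreamLike:
    """Hypothetical stand-in for a dataset: iterable, but no len() or indexing."""

    def __init__(self, items):
        self._items = list(items)

    def __iter__(self):
        return iter(self._items)


def try_shuffle(obj):
    """Return the TypeError message random.shuffle raises, or None on success."""
    try:
        random.shuffle(obj)
    except TypeError as err:
        return str(err)
    return None


# Fails: the object has no __len__, just like the MapDataset in the question.
stream_error = try_shuffle(StreamLike(range(5)))
# Works: a list supports len() and item assignment.
list_error = try_shuffle(list(range(5)))

print(stream_error)
print(list_error)
```

So the problem is not in the TFRecord file itself; random.shuffle only works on in-memory sequences such as lists.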
Put these two lines at the very top of your code to enable eager execution (this is the TensorFlow 1.x API; in 2.x eager execution is already on by default):
import tensorflow as tf
tf.enable_eager_execution()
And then try:
iterator = train_data.make_one_shot_iterator()
# The parse function returns a dict, so index it by key
features = iterator.get_next()
image, label = features['x'], features['y']
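If the goal of random.shuffle was simply to randomize the order of the records, the dataset's own shuffle method can probably do that instead. Below is a rough sketch, assuming TensorFlow 2.x (or 1.x with eager execution enabled). The demo file, the three captions and FEATURE_DIM = 4 are made up for illustration; the question uses 2048. I also swapped VarLenFeature for a scalar FixedLenFeature on 'y', which is my own simplification so that the label comes back as a plain string tensor rather than a sparse one.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

FEATURE_DIM = 4  # assumption for this demo; the question uses 2048


def write_demo_records(path, features, captions):
    """Write one Example per (feature vector, caption) pair."""
    with tf.io.TFRecordWriter(path) as writer:
        for x, caption in zip(features, captions):
            example = tf.train.Example(features=tf.train.Features(feature={
                "x": tf.train.Feature(
                    float_list=tf.train.FloatList(value=x.flatten())),
                "y": tf.train.Feature(
                    bytes_list=tf.train.BytesList(value=[caption.encode()])),
            }))
            writer.write(example.SerializeToString())


def parse_example(example_proto):
    """Decode one serialized Example into a dict of tensors."""
    spec = {
        "x": tf.io.FixedLenFeature([FEATURE_DIM], tf.float32),
        "y": tf.io.FixedLenFeature([], tf.string),
    }
    return tf.io.parse_single_example(example_proto, spec)


# Hypothetical demo data written to a temporary file.
path = os.path.join(tempfile.mkdtemp(), "demo.tfrecord")
features = np.arange(3 * FEATURE_DIM, dtype=np.float32).reshape(3, FEATURE_DIM)
captions = ["a cat", "a dog", "a bird"]
write_demo_records(path, features, captions)

# Shuffle inside the tf.data pipeline instead of calling random.shuffle.
dataset = (tf.data.TFRecordDataset(path)
           .map(parse_example)
           .shuffle(buffer_size=3))

seen = []
for record in dataset:  # each element is a dict with keys "x" and "y"
    seen.append(record["y"].numpy().decode())
print(seen)
```

A buffer_size at least as large as the number of records gives a full shuffle; smaller buffers only shuffle locally.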
Of course, I am assuming that your tfrecord section does not have any errors.
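One way to check that assumption is to read a single record back and look at what was actually stored. This is only a sketch, assuming TensorFlow 2.x (or eager execution in 1.x); the one-record file and its values are hypothetical, and you would point it at your own file instead.

```python
import os
import tempfile

import tensorflow as tf

# Hypothetical one-record file so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), "check.tfrecord")
example = tf.train.Example(features=tf.train.Features(feature={
    "x": tf.train.Feature(float_list=tf.train.FloatList(value=[0.1, 0.2, 0.3])),
    "y": tf.train.Feature(bytes_list=tf.train.BytesList(value=[b"a cat"])),
}))
with tf.io.TFRecordWriter(path) as writer:
    writer.write(example.SerializeToString())

# Read the first raw record back and decode it as an Example proto.
raw = next(iter(tf.data.TFRecordDataset(path))).numpy()
decoded = tf.train.Example.FromString(raw)

# The number of floats stored under "x" must match the FixedLenFeature shape
# used when parsing (2048 in the question).
x_len = len(decoded.features.feature["x"].float_list.value)
caption = decoded.features.feature["y"].bytes_list.value[0].decode()
print(x_len, caption)
```

If x_len is not what the parse function expects, parsing will fail even after the shuffle issue is fixed.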
According to the TensorFlow tutorial, images are saved in bytes format rather than as np arrays.
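For a feature array like yours, a bytes-based variant could look roughly like the sketch below. It stores the raw array bytes plus the shape and restores them with tf.io.decode_raw. The key names image_raw and shape and the random 2x3 array are my own choices for illustration, the dtype has to be known when decoding, and eager execution (TensorFlow 2.x) is assumed.

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-in for one image feature array.
image = np.random.rand(2, 3).astype(np.float32)

# Store the raw bytes and the shape instead of a flattened float list.
example = tf.train.Example(features=tf.train.Features(feature={
    "image_raw": tf.train.Feature(
        bytes_list=tf.train.BytesList(value=[image.tobytes()])),
    "shape": tf.train.Feature(
        int64_list=tf.train.Int64List(value=image.shape)),
}))
serialized = example.SerializeToString()

# Parse the serialized Example and rebuild the array.
spec = {
    "image_raw": tf.io.FixedLenFeature([], tf.string),
    "shape": tf.io.FixedLenFeature([2], tf.int64),
}
parsed = tf.io.parse_single_example(serialized, spec)
flat = tf.io.decode_raw(parsed["image_raw"], tf.float32)
restored = tf.reshape(flat, parsed["shape"])
print(restored.shape)
```

The float_list approach in the question should also work; the bytes route mainly matters when you store encoded images such as JPEG files.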