This is the code I used to convert the data to TFRecord format:
import numpy as np
import tensorflow as tf

def _int64_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

def _bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def _floats_feature(value):
    return tf.train.Feature(float_list=tf.train.FloatList(value=value))

with tf.python_io.TFRecordWriter("train.tfrecords") as writer:
    for row in train_data:
        prices, label, pip = row[0], row[1], row[2]
        prices = np.asarray(prices).astype(np.float32)
        example = tf.train.Example(features=tf.train.Features(feature={
            'prices': _floats_feature(prices),
            'label': _int64_feature(label[0]),
            'pip': _floats_feature(pip)
        }))
        writer.write(example.SerializeToString())
The prices feature is an array of shape (1, 288). It converted successfully! But when I decoded the data using a parse function and the Dataset API:
def parse_func(serialized_data):
    keys_to_features = {'prices': tf.FixedLenFeature([], tf.float32),
                        'label': tf.FixedLenFeature([], tf.int64)}
    parsed_features = tf.parse_single_example(serialized_data, keys_to_features)
    return parsed_features['prices'], tf.one_hot(parsed_features['label'], 2)
It gave me this error:
C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\framework\op_kernel.cc:1202] OP_REQUIRES failed at example_parsing_ops.cc:240 : Invalid argument: Key: prices. Can't parse serialized Example.
tensorflow.python.framework.errors_impl.InvalidArgumentError: Key: prices. Can't parse serialized Example.
  [[Node: ParseSingleExample/ParseSingleExample = ParseSingleExample[Tdense=[DT_INT64, DT_FLOAT], dense_keys=["label", "prices"], dense_shapes=[[], []], num_sparse=0, sparse_keys=[], sparse_types=[]](arg0, ParseSingleExample/Const, ParseSingleExample/Const_1)]]
  [[Node: IteratorGetNext_1 = IteratorGetNext[output_shapes=[[?], [?,2]], output_types=[DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"]]]
I found the problem. For parsing an array, use tf.io.FixedLenSequenceFeature instead of tf.io.FixedLenFeature (in TensorFlow 1, the same classes live under the tf. prefix rather than tf.io.).
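As a minimal sketch of that fix, here is what the corrected parse function could look like. This assumes TensorFlow 2 (eager mode, tf.io namespace) and reuses the prices/label keys from the question; allow_missing=True is needed because FixedLenSequenceFeature is being applied to plain (non-sequence) Examples:

```python
import tensorflow as tf

def parse_func(serialized_data):
    # FixedLenSequenceFeature reads the whole float list as a 1-D tensor;
    # allow_missing=True lets it work on ordinary tf.train.Example records.
    keys_to_features = {
        'prices': tf.io.FixedLenSequenceFeature([], tf.float32, allow_missing=True),
        'label': tf.io.FixedLenFeature([], tf.int64),
    }
    parsed = tf.io.parse_single_example(serialized_data, keys_to_features)
    return parsed['prices'], tf.one_hot(parsed['label'], 2)
```

With this feature spec the parser no longer needs to know the array length up front, at the cost of a less strict shape check than FixedLenFeature([288]).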
If your feature is a fixed-length 1-D array, then using tf.FixedLenSequenceFeature is not correct at all. As the documentation mentions, tf.FixedLenSequenceFeature is for input data of dimension 2 or higher. In this example you need to flatten the prices array to shape (288,), and in the decoding part you need to state the array's dimension.
Encode:

example = tf.train.Example(features=tf.train.Features(feature={
    'prices': _floats_feature(prices.tolist()),
    'label': _int64_feature(label[0]),
    'pip': _floats_feature(pip)
}))

Decode:

keys_to_features = {'prices': tf.FixedLenFeature([288], tf.float32),
                    'label': tf.FixedLenFeature([], tf.int64)}
You can't store an n-dimensional array as a float feature, because float features are simple lists. You have to flatten prices into a list with prices.tolist(). If you need to recover the n-dimensional array from the flattened float feature, you can do prices = np.reshape(float_feature, original_shape).
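The flatten-and-restore round trip described above can be sketched with NumPy alone; the (1, 288) shape comes from the question, and nothing TensorFlow-specific is needed to see the idea:

```python
import numpy as np

original_shape = (1, 288)
prices = np.arange(288, dtype=np.float32).reshape(original_shape)

# FloatList only accepts a flat list of floats, so flatten before encoding.
flat = prices.flatten().tolist()

# After parsing, rebuild the original n-d array from the flat values.
restored = np.reshape(np.asarray(flat, dtype=np.float32), original_shape)
```

Storing original_shape somewhere (hard-coded, or as its own int64 feature) is what makes the reshape on the reading side possible.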
I had the same issue while carelessly modifying some scripts; it was caused by a slightly different data shape. I had to change the shape to match the expected one, e.g. from (A, B) to (1, A, B). I used np.ravel() for flattening.
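A small NumPy sketch of that shape fix; the sizes A and B here are arbitrary values chosen only for illustration:

```python
import numpy as np

A, B = 4, 72  # hypothetical sizes, just for illustration
data = np.arange(A * B, dtype=np.float32).reshape((A, B))

# np.ravel() flattens the array to 1-D before it is written out...
flat = np.ravel(data)

# ...and the reading side expected an extra leading axis, i.e. (1, A, B).
reshaped = flat.reshape((1, A, B))
```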
Exactly the same thing happens to me when reading float32 data lists from TFRecord files. I get "Can't parse serialized Example" when executing sess.run([time_tensor, frequency_tensor, frequency_weight_tensor]) with tf.FixedLenFeature, though tf.FixedLenSequenceFeature seems to work fine.
My feature format for reading the files (the working one) is as follows:

feature_format = {
    'time': tf.FixedLenSequenceFeature([], tf.float32, allow_missing=True),
    'frequencies': tf.FixedLenSequenceFeature([], tf.float32, allow_missing=True),
    'frequency_weights': tf.FixedLenSequenceFeature([], tf.float32, allow_missing=True)
}
The encoding part is:
feature = {
    'time': tf.train.Feature(float_list=tf.train.FloatList(value=[*some single value*])),
    'frequencies': tf.train.Feature(float_list=tf.train.FloatList(value=*some_list*)),
    'frequency_weights': tf.train.Feature(float_list=tf.train.FloatList(value=*some_list*))
}
This happens with TensorFlow 1.12 on a Debian machine without GPU offloading (i.e., TensorFlow uses only the CPU). Is there any misuse on my side? Or is it a bug in the code or documentation? I could look into contributing/upstreaming a fix if that would benefit anyone...