简体   繁体   中英

How to convert Float array/list to TFRecord?

This is the code used to convert data to TFRecord

def _int64_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

 def _bytes_feature(value):
   return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def _floats_feature(value):
   return tf.train.Feature(float_list=tf.train.FloatList(value=value))

with tf.python_io.TFRecordWriter("train.tfrecords") as writer:
    for row in train_data:
        prices, label, pip = row[0],row[1],row[2]
        prices = np.asarray(prices).astype(np.float32)
        example = tf.train.Example(features=tf.train.Features(feature={
                                           'prices': _floats_feature(prices),
                                           'label': _int64_feature(label[0]),
                                           'pip': _floats_feature(pip)
    }))
        writer.write(example.SerializeToString())

Feature prices is an array of shape(1,288). It converted successfully! But when decoded the data using a parse function and Dataset API.

def parse_func(serialized_data):
    keys_to_features = {'prices': tf.FixedLenFeature([], tf.float32),
                    'label': tf.FixedLenFeature([], tf.int64)}

    parsed_features = tf.parse_single_example(serialized_data, keys_to_features)
    return parsed_features['prices'],tf.one_hot(parsed_features['label'],2)

It gave me the error

C:\\tf_jenkins\\workspace\\rel-win\\M\\windows-gpu\\PY\\36\\tensorflow\\core\\framework\\op_kernel.cc:1202] OP_REQUIRES failed at example_parsing_ops.cc:240 : Invalid argument: Key: prices. Can't parse serialized Example. 2018-03-31 15:37:11.443073: WC:\\tf_jenkins\\workspace\\rel-win\\M\\windows-gpu\\PY\\36\\tensorflow\\core\\framework\\op_kernel.cc:1202] OP_REQUIRES failed at example_parsing_ops.cc:240 : Invalid argument: Key: prices. Can't parse serialized Example. 2018-03-31 15:37:11.443313: WC:\\tf_jenkins\\workspace\\rel-win\\M\\windows-gpu\\ raise type(e)(node_def, op, message) PY\\36\\tensortensorflow.python.framework.errors_impl.InvalidArgumentError: Key: prices. Can't parse serialized Example. [[Node: ParseSingleExample/ParseSingleExample = ParseSingleExample[Tdense=[DT_INT64, DT_FLOAT], dense_keys=["label", "prices"], dense_shapes=[[], []], num_sparse=0, sparse_keys=[], sparse_types=[]](arg0, ParseSingleExample/Const, ParseSingleExample/Const_1)]] [[Node: IteratorGetNext_1 = IteratorGetNextoutput_shapes=[[?], [?,2]], output_types=[DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"]]fl ow\\core\\framework\\op_kernel.cc:1202] OP_REQUIRES failed at example_parsing_ops.cc:240 : Invalid argument: Key: prices. Can't parse serialized Example.

I found the problem. Instead of using tf.io.FixedLenFeature for parsing an array, use tf.io.FixedLenSequenceFeature
(for TensorFlow 1, use tf. instead of tf.io. )

If your feature is a fixed 1-d array then using tf.FixedLenSequenceFeature is not correct at all. As the documentation mentioned, the tf.FixedLenSequenceFeature is for a input data with dimension 2 and higher. In this example you need to flatten your price array to become (288,) and then for decoding part you need to mention the array dimension.

Encode:

example = tf.train.Example(features=tf.train.Features(feature={
                                       'prices': _floats_feature(prices.tolist()),
                                       'label': _int64_feature(label[0]),
                                       'pip': _floats_feature(pip)

Decode:

keys_to_features = {'prices': tf.FixedLenFeature([288], tf.float32),
                'label': tf.FixedLenFeature([], tf.int64)}

You can't store an n-dimensional array as a float feature as float features are simple lists. You have to flatten prices into a list by doing prices.tolist() . If you need to recover the n-dimensional array from the flattened float feature, then you can do prices = np.reshape(float_feature, original_shape) .

I had the same issue while carelessly modifying some scripts, it was caused by slightly different data shape. I had to change the shape to match expected shape, eg (A, B) to (1, A, B) . I used np.ravel() for flattening.

Exactly the same thing happens to me with reading float32 data lists from TFrecord files.

I get Can't parse serialized Example when executing sess.run([time_tensor, frequency_tensor, frequency_weight_tensor]) with tf.FixedLenFeature , though tf.FixedLenSequenceFeature seems to be working fine.

My feature format for reading files (the working one) is as follows: feature_format = { 'time': tf.FixedLenSequenceFeature([], tf.float32, allow_missing = True), 'frequencies': tf.FixedLenSequenceFeature([], tf.float32, allow_missing = True), 'frequency_weights': tf.FixedLenSequenceFeature([], tf.float32, allow_missing = True) }

The encoding part is:

feature = { 'time': tf.train.Feature(float_list=tf.train.FloatList(value=[*some single value*]) ), 'frequencies': tf.train.Feature(float_list=tf.train.FloatList(value=*some_list*) ), 'frequency_weights': tf.train.Feature(float_list=tf.train.FloatList(value=*some_list*) ) }

This happens with TensorFlow 1.12 on Debian machine without GPU offloading (ie only CPU used with TensorFlow)

Is there any misuse from my side? Or is it a bug in the code or documentation? I can think on contributing/upstreaming any fixes if that would benefit anyone...

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM