TensorFlow: Does sequence matter when encoding and decoding TFRecords?

Question

I have some practice data that I want to encode to TFRecord format and then decode to tf.features in TensorFlow. My question is very basic, but I could not find a clear answer to it.

Question: Do I need to decode the features in a dataset in the same sequence in which they were encoded? In other words, I can't seem to find a way to reference features by field name in a TFRecord. This is really important for two reasons.

I just want to get my assumption validated, so that I know how to avoid breaking my code in the future. Here is some simple code, though it is not a complete example.
Python makes a big deal about dictionaries being unordered. So how can I guarantee the sequence when I am using a data structure that is supposed to be unordered? I was not sure whether this is handled in some way that I don't know about.
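A quick plain-Python check of the premise behind this concern: a dict is looked up and compared by key, so two dicts holding the same key/value pairs in different insertion orders are equal. Since Python 3.7, iteration follows insertion order, but equality and lookup do not depend on it. The string values below are hypothetical stand-ins for the parse-spec entries, not real TensorFlow types.

```python
# Two parse-spec-like dicts with the same keys inserted in different orders.
# The string values are stand-ins for feature types (illustration only).
spec_a = {'feat/division': 'int64', 'label/sporder': 'int64',
          'feat/puma': 'string', 'feat/region': 'string'}
spec_b = {'feat/region': 'string', 'feat/puma': 'string',
          'label/sporder': 'int64', 'feat/division': 'int64'}

# Iteration order follows insertion order (guaranteed since Python 3.7)...
print(list(spec_a)[0], list(spec_b)[0])  # feat/division feat/region

# ...but equality and key lookup ignore it.
print(spec_a == spec_b)                            # True
print(spec_a['feat/puma'] == spec_b['feat/puma'])  # True
```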

To encode data into TFRecord format, you can do something like:

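import pandas as pd
import tensorflow as tf

# Fields in the DataFrame: ['DIVISION', 'SPORDER', 'PUMA', 'REGION']
df = pd.DataFrame(...)

with tf.io.TFRecordWriter('myfile.tfrecord') as writer:
    for row in df.itertuples():
        # Each Example maps a feature name (string key) to a typed value list.
        # Int64List/BytesList need a list of values, hence the [...] wrappers.
        example = tf.train.Example(features=tf.train.Features(feature={
            'feat/division': tf.train.Feature(int64_list=tf.train.Int64List(value=[int(row.DIVISION)])),
            'label/sporder': tf.train.Feature(int64_list=tf.train.Int64List(value=[int(row.SPORDER)])),
            # BytesList expects bytes, so encode the values as UTF-8 first.
            'feat/puma': tf.train.Feature(bytes_list=tf.train.BytesList(value=[str(row.PUMA).encode('utf-8')])),
            'feat/region': tf.train.Feature(bytes_list=tf.train.BytesList(value=[str(row.REGION).encode('utf-8')])),
        }))
        writer.write(example.SerializeToString())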

Then, to ingest the dataset, you would need something like the code below. Notice that the fields are referenced again in the same order. NOTE: I used the same dictionary keys in the TFRecords and in the decoded form, but I don't think that is necessary; it is just a convenience. I was not sure whether that is the way things have to be.

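import tensorflow as tf

def _parse_function(example_proto):
    # parse_single_example looks each feature up by its key in this spec.
    # 'feat/division' was written as int64, so it is parsed as tf.int64.
    features = {'feat/division': tf.io.FixedLenFeature((), tf.int64, default_value=0),
                'label/sporder': tf.io.FixedLenFeature((), tf.int64, default_value=0),
                'feat/puma': tf.io.VarLenFeature(dtype=tf.string),
                'feat/region': tf.io.VarLenFeature(dtype=tf.string)}

    parsed_example = tf.io.parse_single_example(example_proto, features)
    # Split the label out of the feature dict.
    parsed_label = parsed_example.pop('label/sporder', None)
    return parsed_example, parsed_label

# The parse function has to be defined before it is passed to map().
dataset = tf.data.TFRecordDataset('myfile.tfrecord')
dataset = dataset.map(_parse_function)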
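On referencing features by field name: a record can also be read back and inspected without a parse spec, because each tf.train.Example stores its features in a map keyed by name. The sketch below assumes TensorFlow 2.x with eager execution; the file, the helper names (`write_demo_file`, `feature_names`, `first_bytes_value`) and the sample values are hypothetical, chosen only for illustration.

```python
import os
import tempfile
import tensorflow as tf

def write_demo_file(path):
    """Write one hypothetical record so there is something to inspect."""
    example = tf.train.Example(features=tf.train.Features(feature={
        'feat/division': tf.train.Feature(int64_list=tf.train.Int64List(value=[3])),
        'feat/region': tf.train.Feature(bytes_list=tf.train.BytesList(value=[b'NE'])),
    }))
    with tf.io.TFRecordWriter(path) as writer:
        writer.write(example.SerializeToString())

def feature_names(path):
    """Return the sorted feature names stored in each record of a TFRecord file."""
    names = []
    for raw in tf.data.TFRecordDataset(path):
        example = tf.train.Example.FromString(raw.numpy())
        names.append(sorted(example.features.feature.keys()))
    return names

def first_bytes_value(path, name):
    """Look up one bytes feature of the first record directly by its name."""
    raw = next(iter(tf.data.TFRecordDataset(path)))
    example = tf.train.Example.FromString(raw.numpy())
    return example.features.feature[name].bytes_list.value[0]

path = os.path.join(tempfile.mkdtemp(), 'demo.tfrecord')
write_demo_file(path)
print(feature_names(path))                     # [['feat/division', 'feat/region']]
print(first_bytes_value(path, 'feat/region'))  # b'NE'
```

This is handy for checking which keys a file actually contains before writing the parse spec.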

Answer 1

The TFRecord format uses protobuf to serialize each record (a tf.train.Example). You can think of protobuf as a binary JSON/XML format. Like JSON objects, protobuf messages don't care about the order of the fields, and the features inside an Example are stored in a map keyed by feature name. So the order of the feature definitions is not important. It is the same in your snippet only because that is convenient for reading.
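A minimal sketch of this point, assuming TensorFlow 2.x with eager execution: the Example below is built with its features in one order and parsed with a spec that lists them in another, and every value still comes back under the right name. The helper names `make_example` and `parse` and the sample values (3, 1, b'00101', b'NE') are made up for illustration.

```python
import tensorflow as tf

def make_example(division, sporder, puma, region):
    """Serialize one tf.train.Example; feature insertion order is deliberately shuffled."""
    feature = {
        'feat/region': tf.train.Feature(bytes_list=tf.train.BytesList(value=[region])),
        'label/sporder': tf.train.Feature(int64_list=tf.train.Int64List(value=[sporder])),
        'feat/puma': tf.train.Feature(bytes_list=tf.train.BytesList(value=[puma])),
        'feat/division': tf.train.Feature(int64_list=tf.train.Int64List(value=[division])),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature)).SerializeToString()

# Parse spec listing the same keys in a different order; values are matched by name.
FEATURE_SPEC = {
    'feat/division': tf.io.FixedLenFeature((), tf.int64),
    'label/sporder': tf.io.FixedLenFeature((), tf.int64),
    'feat/puma': tf.io.FixedLenFeature((), tf.string),
    'feat/region': tf.io.FixedLenFeature((), tf.string),
}

def parse(serialized):
    """Decode one record and split off the label, as in the question."""
    parsed = tf.io.parse_single_example(serialized, FEATURE_SPEC)
    label = parsed.pop('label/sporder')
    return parsed, label

features, label = parse(make_example(3, 1, b'00101', b'NE'))
print(features['feat/division'].numpy(), label.numpy(), features['feat/region'].numpy())
```

Reordering the entries of `FEATURE_SPEC` (or of the `feature` dict in `make_example`) does not change the parsed result; what has to agree between writer and reader is the key names and their types.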

Solution 1
1 Accepted 2018-11-01 15:53:11
