TensorFlow: Does sequence matter in encoding and decoding a TFRecord
I have some practice data that I want to encode to the TFRecord format and then decode to tf.features in TensorFlow. My question is very basic, but I could not find a clear answer to it.
Question: Do I need to decode the features in a dataset in the same sequence as they are encoded? In other words, I can't seem to find a way to reference features by field name in a TFRecord. This is really important for 2 reasons.
To encode data into TFRecord format, you can do something like:
import pandas as pd
import tensorflow as tf

# Fields in DataFrame: ['DIVISION', 'SPORDER', 'PUMA', 'REGION']
df = pd.DataFrame(...)
with tf.python_io.TFRecordWriter('myfile.tfrecord') as writer:
    for row in df.itertuples():
        # Every *List takes a list of values, even for a single scalar.
        # BytesList expects bytes; encode str values first if needed.
        example = tf.train.Example(features=tf.train.Features(feature={
            'feat/division': tf.train.Feature(int64_list=tf.train.Int64List(value=[row.DIVISION])),
            'label/sporder': tf.train.Feature(int64_list=tf.train.Int64List(value=[row.SPORDER])),
            'feat/puma': tf.train.Feature(bytes_list=tf.train.BytesList(value=[row.PUMA])),
            'feat/region': tf.train.Feature(bytes_list=tf.train.BytesList(value=[row.REGION]))}))
        writer.write(example.SerializeToString())
Then to ingest the dataset you would need something like the code below. Notice that the fields are referenced again in order. NOTE: I used the same dictionary keys in the TFRecords and in the decoded form, but I don't think that is necessary, just a convenience. I am not sure whether that is the way things have to be. Meaning:
def _parse_function(example_proto):
    # 'feat/division' was written as int64, so it is parsed as int64 here.
    features = {'feat/division': tf.FixedLenFeature((), tf.int64, default_value=0),
                'label/sporder': tf.FixedLenFeature((), tf.int64, default_value=0),
                'feat/puma': tf.VarLenFeature(dtype=tf.string),
                'feat/region': tf.VarLenFeature(dtype=tf.string)}
    parsed_example = tf.parse_single_example(example_proto, features)
    # Split the label off from the remaining features.
    parsed_label = parsed_example.pop("label/sporder", None)
    return parsed_example, parsed_label

# Define the parse function before mapping it over the dataset.
dataset = tf.data.TFRecordDataset('myfile.tfrecord')
dataset = dataset.map(_parse_function)
The TFRecord format uses protobuf to serialize the structure. You can think of it as a binary JSON/XML format. JSON/XML and protobuf don't care about the order of the fields: each feature is stored in a map under its string key and is looked up by that key when parsing. So the order of the feature definitions is not important. It is the same in your snippet only because that was convenient for reading.
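A minimal sketch of this, assuming TensorFlow 2.x with eager execution (so it uses the tf.io names rather than the TF 1.x tf.parse_single_example and tf.FixedLenFeature from the question). The helper functions and the feature values are made up for illustration. It serializes one Example with the keys in the question's order, then parses it once with a spec that lists the keys in reverse order and once with a spec that names only one of them:

```python
import tensorflow as tf  # assumes TensorFlow 2.x (eager execution)


def int64_feature(value):
    """Wrap a single int in a tf.train.Feature (illustrative helper)."""
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))


def bytes_feature(value):
    """Wrap a single bytes value in a tf.train.Feature (illustrative helper)."""
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))


# Hypothetical row, with keys written in the question's order.
example = tf.train.Example(features=tf.train.Features(feature={
    "feat/division": int64_feature(3),
    "label/sporder": int64_feature(1),
    "feat/puma": bytes_feature(b"00100"),
    "feat/region": bytes_feature(b"2"),
}))
serialized = example.SerializeToString()

# Same keys, listed in the reverse order.
reordered_spec = {
    "feat/region": tf.io.FixedLenFeature([], tf.string),
    "feat/puma": tf.io.FixedLenFeature([], tf.string),
    "label/sporder": tf.io.FixedLenFeature([], tf.int64),
    "feat/division": tf.io.FixedLenFeature([], tf.int64),
}
parsed = tf.io.parse_single_example(serialized, reordered_spec)

# A spec may also name just the features you need.
subset_spec = {"feat/puma": tf.io.FixedLenFeature([], tf.string)}
parsed_subset = tf.io.parse_single_example(serialized, subset_spec)

# Features are referenced by field name, regardless of spec order.
print(int(parsed["feat/division"].numpy()))
print(parsed_subset["feat/puma"].numpy())
```

The keys in the parsing spec do have to match the keys used when writing, since the key is the only thing that identifies a feature inside the record.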