TensorFlow - Interleave multiple independently preprocessed TFRecord files
I have multiple TFRecord files from the Waymo Dataset, each containing points that are consecutive within a file but not consecutive across files. I'm building an input pipeline that preprocesses the data for time series prediction via the window() API, but I need to prevent a window from spanning multiple files.
To do so, I believe I should preprocess each file independently and interleave the final datasets. Here's my attempt:
import os
import tensorflow as tf
from waymo_open_dataset import dataset_pb2 as open_dataset  # for parsing Waymo frames

filenames = [os.path.join(DATASET_DIR, f) for f in os.listdir(DATASET_DIR)]
dataset = tf.data.TFRecordDataset(filenames, compression_type='')

def interleave_fn(filename):
    ds = filename.map(lambda x: tf.py_function(_parse_data, [x], [tf.float32]*N_FEATURES),
                      num_parallel_calls=tf.data.experimental.AUTOTUNE)
    ds = ds.map(_concatenate_tensors).map(_set_x_shape)
    ds = build_x_dataset(ds)
    return ds

def _parse_data(data):
    # Parse features from a Waymo dataset frame
    frame = open_dataset.Frame()
    frame.ParseFromString(bytearray(data.numpy()))
    av_v_x = frame.images[0].velocity.v_x
    av_v_y = frame.images[0].velocity.v_y
    return av_v_x, av_v_y

def _concatenate_tensors(*x):
    # Concatenate the tensor tuple into a single tensor
    return tf.stack(x)

def _set_x_shape(x):
    # Set the X dataset shape; otherwise: UNDEFINED RANK ValueError
    x.set_shape((N_FEATURES,))
    return x

def build_x_dataset(ds_x, window=WINDOW):
    # Extract sequences for time series prediction training:
    # select a sliding window of WINDOW samples, shifting by 1 sample at a time
    ds_x = ds_x.window(size=window, shift=1, drop_remainder=True)
    # Each element of `ds_x` is a nested dataset containing WINDOW consecutive examples
    ds_x = ds_x.map(lambda d: tf.data.experimental.get_single_element(d.batch(window)))
    return ds_x

dataset = dataset.interleave(interleave_fn, num_parallel_calls=tf.data.experimental.AUTOTUNE)
This returns
AttributeError: in user code:
/tmp/xpython_26752/494049692.py:118 interleave_fn *
ds = filename.map(lambda x: tf.py_function(_parse_data, [x], [tf.float32]*N_FEATURES,),
AttributeError: 'Tensor' object has no attribute 'map'
which makes sense, because print(filename) inside interleave_fn gives
Tensor("args_0:0", shape=(), dtype=string)
I thought interleave_fn would be applied to each TFRecordDataset, so filename would be a dataset itself rather than a tensor. What's wrong here? Thank you!
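For context, here is a minimal, self-contained sketch (unrelated to the Waymo data) of how interleave() behaves: its map_func is called once per element of the outer dataset, receives that element as a tensor, and must return a new tf.data.Dataset built from it.

```python
import tensorflow as tf

# interleave()'s map_func receives each *element* of the outer dataset as a
# tensor (here a scalar int), not as a dataset, and must return a Dataset.
outer = tf.data.Dataset.from_tensor_slices([1, 2, 3])
inner = outer.interleave(
    lambda x: tf.data.Dataset.from_tensors(x * 10),  # x is a tf.Tensor
    cycle_length=1)

print(list(inner.as_numpy_iterator()))  # [10, 20, 30]
```

With a dataset of filename strings, each element passed to map_func is likewise a scalar string tensor, not a dataset.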