
TensorFlow - Interleave multiple independently preprocessed TFRecord files

I have multiple TFRecord files from the Waymo Dataset, each containing consecutive data points; points are not consecutive across files. I'm building an input pipeline that preprocesses the data for time-series prediction via the window() API, but I need to prevent a window from spanning multiple files.

To do so, I believe I should preprocess each file independently and interleave the resulting datasets. Here's my attempt:
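The constraint can be illustrated in pure Python, without TensorFlow (the file names and record values below are made up for illustration): taking sliding windows within each file and only then combining the per-file results guarantees that no window mixes records from two files.

```python
def sliding_windows(records, size, shift=1):
    """All length-`size` windows over one file's records (drop remainder)."""
    return [records[i:i + size] for i in range(0, len(records) - size + 1, shift)]

files = {                       # hypothetical per-file record sequences
    "a.tfrecord": [1, 2, 3, 4],
    "b.tfrecord": [10, 11, 12],
}

windows = []
for recs in files.values():     # window each file independently...
    windows.extend(sliding_windows(recs, size=3))
# ...so a cross-file window like [4, 10, 11] can never appear.
```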

import os
import tensorflow as tf
from waymo_open_dataset import dataset_pb2 as open_dataset  # for parsing Waymo frames

filenames = [os.path.join(DATASET_DIR, f) for f in os.listdir(DATASET_DIR)]
dataset = tf.data.TFRecordDataset(filenames, compression_type='')

def interleave_fn(filename):
    ds = filename.map(lambda x: tf.py_function(_parse_data, [x], [tf.float32]*N_FEATURES,), 
                          num_parallel_calls=tf.data.experimental.AUTOTUNE) 
    ds = ds.map(_concatenate_tensors).map(_set_x_shape)
    ds = build_x_dataset(ds)
    return ds

def _parse_data(data):
    # Parse feature from Waymo dataset  
    frame = open_dataset.Frame()
    frame.ParseFromString(bytearray(data.numpy()))   
    av_v_x = frame.images[0].velocity.v_x 
    av_v_y = frame.images[0].velocity.v_y 
    return av_v_x, av_v_y

def _concatenate_tensors(*x):
    #Concatenate tensor tuple in a single tensor
    return tf.stack((x))

def _set_x_shape(x):
    #Set X dataset shape. If not UNDEFINED RANK ValueError
    x.set_shape((N_FEATURES,))
    return x
    
def build_x_dataset(ds_x, window = WINDOW):
    # Extract sequences for time series prediction training
    # Selects a sliding window of WINDOW samples, shifting by 1 sample at a time
    ds_x = ds_x.window(size = window, shift = 1, drop_remainder = True)
    
    # Each element of `ds_x` is a nested dataset containing WINDOW consecutive examples
    ds_x = ds_x.map(lambda d: tf.data.experimental.get_single_element(d.batch(window))) 
    return ds_x

dataset = dataset.interleave(interleave_fn, num_parallel_calls=tf.data.experimental.AUTOTUNE)

This returns

AttributeError: in user code:

    /tmp/xpython_26752/494049692.py:118 interleave_fn  *
        ds = filename.map(lambda x: tf.py_function(_parse_data, [x], [tf.float32]*N_FEATURES,),

    AttributeError: 'Tensor' object has no attribute 'map'

which makes sense, because print(filename) in interleave_fn gives

Tensor("args_0:0", shape=(), dtype=string)

I thought interleave_fn would be applied to each TFRecordDataset, so filename would be a dataset itself rather than a tensor. What's wrong here? Thank you!

Solved it by looping over all TFRecord files and appending the corresponding datasets to a dataset list, then following this tip to interleave all the preprocessed datasets.
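An equivalent fix, sketched below with synthetic data instead of Waymo frames: the function passed to interleave receives a scalar filename tensor (that is why the original code failed with AttributeError), so the TFRecordDataset can be built inside that function, making each file windowed independently. The WINDOW value, file contents, and omitted Waymo parsing are placeholders.

```python
import os
import tempfile
import tensorflow as tf

WINDOW = 3  # placeholder window size

def make_windows(filename):
    # Build the TFRecordDataset *inside* the function passed to interleave:
    # `filename` is a scalar string tensor, not a dataset.
    ds = tf.data.TFRecordDataset(filename)
    # (the per-file Waymo parsing/preprocessing would go here)
    ds = ds.window(size=WINDOW, shift=1, drop_remainder=True)
    # Each window is a nested dataset; flatten it into one batched tensor.
    ds = ds.flat_map(lambda w: w.batch(WINDOW))
    return ds

# Synthetic data: two files whose records are tagged with their file index.
tmp = tempfile.mkdtemp()
paths = []
for i in range(2):
    p = os.path.join(tmp, f"file{i}.tfrecord")
    with tf.io.TFRecordWriter(p) as writer:
        for j in range(5):
            writer.write(f"f{i}-r{j}".encode())
    paths.append(p)

dataset = tf.data.Dataset.from_tensor_slices(paths).interleave(
    make_windows, cycle_length=2,
    num_parallel_calls=tf.data.AUTOTUNE)

windows = [[rec.decode() for rec in w.numpy()] for w in dataset]
```

Because the windowing happens per file before interleaving, every window's records share the same file prefix and no window spans two files.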

