How do I create a timeseries sliding window tensorflow dataset where some features have different batch sizes than others?
Currently I am able to create a timeseries sliding-window batched dataset that contains ordered 'feature sets' such as 'inputs', 'targets', 'benchmarks', etc. Originally I developed my model and dataset so that the targets had the same batch size as all the other inputs. That has proven detrimental to tuning the input batch size, and it also won't help when it comes time to run this on live data, where I only care to produce a single-sample output of shape (1, horizon, targets), or perhaps just (horizon, targets), given an input dataset of (samples, horizon, features).
As an overview, I want to take N historical samples of horizon-length features at time T, run them through the model, and output a single sample of horizon-length targets; repeat until the dataset has been run through in its entirety.
Assuming a pandas DataFrame of length Z, all resulting Datasets should have a length of Z - horizon. The 'targets' Dataset should have a batch size of 1, and the 'inputs' Dataset should have a batch size of batch_size.
Here's a stripped-down snippet of what I currently use in order to generate a standard batch size for all feature sets:
import tensorflow as tf
import pandas as pd

horizon = 5
batch_size = 10

# Column groups: each entry becomes one component of the dataset tuple.
columns = {
    "inputs": ["input_1", "input_2"],
    "targets": ["target_1"],
}
batch_options = {
    "drop_remainder": True,
    "deterministic": True,
}

d = range(100)
df = pd.DataFrame(data={'input_1': d, 'input_2': d, 'target_1': d})
slices = tuple(df[x].astype("float32") for x in columns.values())

data = (
    tf.data.Dataset.from_tensor_slices(slices)
    # Sliding window of `horizon` timesteps, advancing one step at a time.
    .window(horizon, shift=1, drop_remainder=True)
    # Each window is a tuple of sub-datasets; collapse each back into a tensor.
    .flat_map(
        lambda *c: tf.data.Dataset.zip(
            tuple(
                col.batch(horizon, **batch_options)
                for col in c
            )
        )
    )
    # The same batch size is applied to every feature set.
    .batch(
        batch_size,
        **batch_options,
    )
)
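For a sanity check, here is a self-contained sketch of the same pipeline that prints the per-element shapes. It shows the limitation described above: every feature set comes out with the same leading batch dimension.

```python
import pandas as pd
import tensorflow as tf

horizon = 5
batch_size = 10
columns = {"inputs": ["input_1", "input_2"], "targets": ["target_1"]}
batch_options = {"drop_remainder": True, "deterministic": True}

d = range(100)
df = pd.DataFrame(data={'input_1': d, 'input_2': d, 'target_1': d})
slices = tuple(df[x].astype("float32") for x in columns.values())

data = (
    tf.data.Dataset.from_tensor_slices(slices)
    .window(horizon, shift=1, drop_remainder=True)
    .flat_map(
        lambda *c: tf.data.Dataset.zip(
            tuple(col.batch(horizon, **batch_options) for col in c)
        )
    )
    .batch(batch_size, **batch_options)
)

x, y = next(iter(data))
print(x.shape, y.shape)  # (10, 5, 2) (10, 5, 1) -- same batch size for both
```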
We can create two sliding-window datasets and zip them:
inputs = df[['input_1', 'input_2']].to_numpy()
labels = df['target_1'].to_numpy()
window_size = 10
stride = 1
data1 = tf.data.Dataset.from_tensor_slices(inputs).window(window_size, shift=stride, drop_remainder=True).flat_map(lambda x: x.batch(window_size))
data2 = tf.data.Dataset.from_tensor_slices(labels).window(1, shift=stride, drop_remainder=True).flat_map(lambda x: x.batch(1))
data = tf.data.Dataset.zip((data1, data2))
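Putting the pieces together, a minimal runnable sketch (the DataFrame setup is assumed from the question) to verify that each element pairs a full input window with a single-timestep label:

```python
import pandas as pd
import tensorflow as tf

d = range(100)
df = pd.DataFrame(data={'input_1': d, 'input_2': d, 'target_1': d})
inputs = df[['input_1', 'input_2']].to_numpy()
labels = df['target_1'].to_numpy()

window_size = 10
stride = 1

# Sliding windows over the inputs: one full window per element.
data1 = (
    tf.data.Dataset.from_tensor_slices(inputs)
    .window(window_size, shift=stride, drop_remainder=True)
    .flat_map(lambda x: x.batch(window_size))
)
# Windows of length 1 over the labels: a single sample per element.
data2 = (
    tf.data.Dataset.from_tensor_slices(labels)
    .window(1, shift=stride, drop_remainder=True)
    .flat_map(lambda x: x.batch(1))
)
# zip stops at the shorter dataset, so the two stay aligned.
data = tf.data.Dataset.zip((data1, data2))

x, y = next(iter(data))
print(x.shape, y.shape)  # (10, 2) (1,)
```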