How to create a dataset for multi-output regression using a sliding window approach

Question

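I want to build a plain DNN model, and I have a large amount of data: X_train is 8000000x7 and y_train is 8000000x2. How can I create a dataset with a sliding window of 100 data points to feed the neural network?

If I build a custom dataset with the following code, I run into a memory allocation problem because the dataset is so large.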

import numpy as np

def data_set(x_data, y_data, num_steps=160):
    X, y = list(), list()
    # Loop over the entire data set
    for i in range(x_data.shape[0]):
        # Compute the end index of the current sliding window
        end_ix = i + num_steps
        # Stop once there is no row left after the window to use as target
        if end_ix >= x_data.shape[0]:
            break
        # Get a window of num_steps rows for x
        seq_X = x_data[i:end_ix]
        # Get the y row that immediately follows the window
        seq_y = y_data[end_ix]
        # Append the window and its target to the lists
        X.append(seq_X)
        y.append(seq_y)
    # Make final arrays (this copies every window, which is what exhausts memory)
    x_array = np.array(X)
    y_array = np.array(y)
    return x_array, y_array

So, to avoid this, is there any dataset generator I can use with a sliding window to feed the DNN?

Thanks in advance.

Answer 1

You can use the dataset.window method to achieve this.

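import tensorflow as tf

batch_size = 100  # window length; assumed here from the 100 data points in the question
dataset = tf.data.Dataset.from_tensor_slices((X_train, y_train))
stride = 1
# shift is the step between window starts; use shift=1 to advance one row at a time
dataset = dataset.window(batch_size, shift=batch_size-stride, drop_remainder=True)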
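
The snippet above stops at a dataset of windows, so here is a minimal sketch of how it might be turned into (window, target) pairs that a model can consume. It assumes the same pairing as the function in the question (window x[i:i+n] with the y row that follows it) and a step of one row between windows. The helper name make_window_dataset, the default batch_size of 256, and the small demo arrays are illustrative assumptions, not part of the original answer.

```python
import numpy as np
import tensorflow as tf


def make_window_dataset(x, y, window_size=100, batch_size=256):
    """Pair each window x[i:i+window_size] with the target y[i+window_size].

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    # Windows of inputs, advancing one row at a time (shift=1).
    x_windows = tf.data.Dataset.from_tensor_slices(x).window(
        window_size, shift=1, drop_remainder=True
    )
    # window() yields a dataset of small datasets; batch each one into a
    # single (window_size, n_features) tensor.
    x_windows = x_windows.flat_map(
        lambda w: w.batch(window_size, drop_remainder=True)
    )
    # Targets start at row window_size, so target k belongs to window k.
    targets = tf.data.Dataset.from_tensor_slices(y[window_size:])
    # zip stops at the shorter dataset, which drops the final window that
    # has no following target row.
    ds = tf.data.Dataset.zip((x_windows, targets))
    return ds.batch(batch_size).prefetch(tf.data.AUTOTUNE)


# Tiny stand-in arrays with the same column counts as the question (7 and 2).
x_demo = np.arange(20 * 7, dtype=np.float32).reshape(20, 7)
y_demo = np.arange(20 * 2, dtype=np.float32).reshape(20, 2)
demo_ds = make_window_dataset(x_demo, y_demo, window_size=5, batch_size=4)
xb, yb = next(iter(demo_ds))
print(xb.shape, yb.shape)
```

Because the windows are assembled lazily as the dataset is iterated, only the original arrays and the batches currently in flight need to be held in memory. Each input batch is three-dimensional (batch, window, features), so a plain dense network would probably need the window flattened first, for example with a Flatten layer.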

Solution 1
1 accepted, 2022-11-14 20:09:30