Combine two TensorFlow Datasets

I have two TensorFlow datasets that I process separately to get different windows for the features and the targets:

import numpy as np
import tensorflow as tf

window_size_x = 3
window_size_y = 2
shift_size = 1

x = np.arange(10)
y = x * 10

x = x[:-window_size_y]
y = y[window_size_x:]

ds_x = tf.data.Dataset.from_tensor_slices(x).window(window_size_x, shift=shift_size, drop_remainder=True)
ds_y = tf.data.Dataset.from_tensor_slices(y).window(window_size_y, shift=shift_size, drop_remainder=True)

for i, j in zip(ds_x, ds_y):
  print(list(i.as_numpy_iterator()), list(j.as_numpy_iterator()))

Output:

[0, 1, 2] [30, 40]
[1, 2, 3] [40, 50]
[2, 3, 4] [50, 60]
[3, 4, 5] [60, 70]
[4, 5, 6] [70, 80]
[5, 6, 7] [80, 90]

When I finally feed these two datasets to the model with model.fit(ds_x, ds_y), I get the following error:

ValueError: `y` argument is not supported when using dataset as input.

When I try to combine the two datasets as in this answer, I get a different error:

ds_all = tf.data.Dataset.from_tensor_slices((ds_x, ds_y))

Error:

ValueError: Slicing dataset elements is not supported for rank 0.

What is the correct way to combine the two datasets?

Use tf.data.Dataset.zip to combine the features and the labels.

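# Each window is itself a small dataset, so batch it into one tensor first,
# then zip features with labels and batch the resulting pairs.
# BATCH_SIZE is not defined in the question; pick a value that suits your model.
ds_all = tf.data.Dataset.zip((
    ds_x.flat_map(lambda w: w.batch(window_size_x)),
    ds_y.flat_map(lambda w: w.batch(window_size_y)),
)).batch(BATCH_SIZE)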
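To make the effect of tf.data.Dataset.zip concrete, here is a minimal sketch on toy data. The names features, labels and pairs are illustrative only, and the values are hand-written stand-ins for already flattened windows:

```python
import tensorflow as tf

# Toy stand-ins for flattened feature and target windows (illustrative values).
features = tf.data.Dataset.from_tensor_slices([[0, 1, 2], [1, 2, 3], [2, 3, 4]])
labels = tf.data.Dataset.from_tensor_slices([[30, 40], [40, 50], [50, 60]])

# zip pairs the i-th element of each dataset into one (feature, label) tuple.
pairs = tf.data.Dataset.zip((features, labels))

# Collect the pairs as plain Python lists for inspection.
pairs_list = [(f.tolist(), l.tolist()) for f, l in pairs.as_numpy_iterator()]
print(pairs_list)
```

Each element of pairs is a tuple of two tensors, which is the structure Keras expects when a single dataset carries both inputs and targets.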

Maybe try something like this:

import tensorflow as tf
import numpy as np

window_size_x = 3
window_size_y = 2
shift_size = 1

x = np.arange(10)
y = x * 10

x = x[:-window_size_y]
y = y[window_size_x:]

# Batch each window so it becomes a single tensor instead of a nested dataset.
ds_x = (tf.data.Dataset.from_tensor_slices(x)
        .window(window_size_x, shift=shift_size, drop_remainder=True)
        .flat_map(lambda w: w.batch(window_size_x)))
ds_y = (tf.data.Dataset.from_tensor_slices(y)
        .window(window_size_y, shift=shift_size, drop_remainder=True)
        .flat_map(lambda w: w.batch(window_size_y)))
# Pair each feature window with its target window.
dataset = tf.data.Dataset.zip((ds_x, ds_y))
for i, j in dataset:
  print(i, j)

You can then pass dataset directly to model.fit(*)
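As a rough end-to-end sketch of that last step, the snippet below feeds a zipped dataset to a tiny Keras model. Several parts are assumptions that do not appear in the original answer: the helper make_windows, the batch size of 2, the float32 cast, and the single Dense layer are only there to make the example run. Keras treats each dataset element as a batch, so the zipped dataset is batched before training.

```python
import numpy as np
import tensorflow as tf

window_size_x, window_size_y, shift_size = 3, 2, 1

# Same data as in the question, cast to float32 so a Dense layer can consume it.
x = np.arange(10, dtype=np.float32)
y = x * 10
x = x[:-window_size_y]
y = y[window_size_x:]

def make_windows(values, size):
    """Hypothetical helper: sliding windows, each flattened to one tensor."""
    ds = tf.data.Dataset.from_tensor_slices(values)
    ds = ds.window(size, shift=shift_size, drop_remainder=True)
    # Every window already holds exactly `size` items, so drop_remainder=True
    # discards nothing; it only gives the tensors a static shape.
    return ds.flat_map(lambda w: w.batch(size, drop_remainder=True))

dataset = tf.data.Dataset.zip(
    (make_windows(x, window_size_x), make_windows(y, window_size_y))
)

# Batch size of 2 is an arbitrary choice for this sketch.
batched = dataset.batch(2)

# Placeholder model: maps a window of 3 inputs to a window of 2 targets.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(window_size_x,)),
    tf.keras.layers.Dense(window_size_y),
])
model.compile(optimizer="adam", loss="mse")

# Note: no separate `y` argument; the dataset yields (features, targets) pairs.
history = model.fit(batched, epochs=2, verbose=0)
```

Because the targets travel inside the dataset, model.fit receives only one positional data argument, which avoids the "`y` argument is not supported" error from the question.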

