
How to pass validation_data to Model.fit + Dataset?

I am trying to train a simple Sequential network on generated data. I have a precomputed validation dataset.

To feed the inputs, I am using the tf.data.Dataset API, as suggested here: https://stackoverflow.com/a/48134765/231238

var train = Dataset.from_tensor_slices(ValueTuple.Create(trainInputs, trainOutputs));
train = train
    .repeat(2000000)
    .shuffle(buffer_size: 1024 * 8 * InterpolateBy)
    .batch(1024);
model.fit_dyn(train,
    epochs: 6*1024,
    steps_per_epoch: 4
    // line below does not work:
    , validation_data: (testInputs, testOutputs)
);

It works fine without validation_data.

If I pass validation_data as a tuple of tensors, as in the example above, e.g. (testInputs, testOutputs), it throws TypeError: float() argument must be a string or a number, not 'NoneType'. (This is how I used to pass the training data too, before switching to Dataset, and validation worked then.)

If I wrap testInputs and testOutputs into a Dataset, similarly to the training data, e.g. Dataset.from_tensor_slices(ValueTuple.Create(testInputs, testOutputs)),

I get a different error: ValueError: Error when checking input: expected sequential_input to have 2 dimensions, but got array with shape (347,).

Here 347 is the size of the feature vector, so testInputs.shape is (221, 347) and testOutputs.shape is (221, 1).

From our discussion, we can clarify a few things.

First off, I am not quite sure about the error when feeding the validation data directly as a tuple; more information about the data would be needed to diagnose it.

As far as feeding validation data through tf.data goes: when we use from_tensor_slices, "we create a dataset whose elements are slices of the given tensors". In this example, the input we are feeding is a tuple of arrays with shapes (221, 347) and (221, 1). What from_tensor_slices does is slice the respective numpy arrays along the 0th dimension (which is of size 221 here). The method thus creates a dataset in which each element is a tuple with shapes (347,) and (1,) respectively, and there will be 221 such elements in the dataset.
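
That is exactly why Keras complains that it "expected sequential_input to have 2 dimensions, but got array with shape (347,)": each element of the sliced dataset is a single sample rather than a batch. Here is a minimal sketch that just inspects the shapes (assuming TF 1.x, as in the example further down, and placeholder arrays named test_inputs/test_outputs with the shapes from the question):

import numpy as np
import tensorflow as tf

# placeholder arrays with the question's shapes
test_inputs = np.zeros((221, 347))
test_outputs = np.zeros((221, 1))

sliced = tf.data.Dataset.from_tensor_slices((test_inputs, test_outputs))
print(sliced.output_shapes)   # element shapes are (347,) and (1,): one sample per element

batched = sliced.batch(221)
print(batched.output_shapes)  # now (?, 347) and (?, 1): two dimensions again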

If, on the other hand, we use the from_tensors method, it creates a dataset with a single element comprising the given tensors. So it is equivalent to feeding the numpy data directly, just wrapped in a dataset object.

Here is a brief example of how this works with much smaller dimensions:

import numpy as np
import tensorflow as tf
np.random.seed(42)
example_train = np.random.randn(4, 4)
example_test = np.random.randn(4, 1)

print("Example Train:", example_train)
print("Example Test:", example_test)

# from_tensor_slices: 4 elements, one (row, label) pair each
dataset1 = tf.data.Dataset.from_tensor_slices((example_train, example_test))
# from_tensors: a single element holding both full arrays
dataset2 = tf.data.Dataset.from_tensors((example_train, example_test))

it1 = dataset1.make_one_shot_iterator().get_next()
it2 = dataset2.make_one_shot_iterator().get_next()

with tf.Session() as sess:
    for i in range(4):
        print("Element {} of dataset1: {}".format(i,sess.run([it1])))
    print ("Element 0 of dataset2: ", sess.run([it2]))

Result:

Example Train: [[ 0.49671415 -0.1382643   0.64768854  1.52302986]
 [-0.23415337 -0.23413696  1.57921282  0.76743473]
 [-0.46947439  0.54256004 -0.46341769 -0.46572975]
 [ 0.24196227 -1.91328024 -1.72491783 -0.56228753]]
Example Test: [[-1.01283112]
 [ 0.31424733]
 [-0.90802408]
 [-1.4123037 ]]
Element 0 of dataset1: [(array([ 0.49671415, -0.1382643 ,  0.64768854,  1.52302986]), array([-1.01283112]))]
Element 1 of dataset1: [(array([-0.23415337, -0.23413696,  1.57921282,  0.76743473]), array([0.31424733]))]
Element 2 of dataset1: [(array([-0.46947439,  0.54256004, -0.46341769, -0.46572975]), array([-0.90802408]))]
Element 3 of dataset1: [(array([ 0.24196227, -1.91328024, -1.72491783, -0.56228753]), array([-1.4123037]))]
Element 0 of dataset2:  [(array([[ 0.49671415, -0.1382643 ,  0.64768854,  1.52302986],
       [-0.23415337, -0.23413696,  1.57921282,  0.76743473],
       [-0.46947439,  0.54256004, -0.46341769, -0.46572975],
       [ 0.24196227, -1.91328024, -1.72491783, -0.56228753]]), array([[-1.01283112],
       [ 0.31424733],
       [-0.90802408],
       [-1.4123037 ]]))]

Regarding my comment about the batch method: by setting batch_size to 221, we can put the slices back together. For example, if we change the dataset1 code and the printing to something like this:

dataset1 = tf.data.Dataset.from_tensor_slices((example_train, example_test)).batch(4)
it1 = dataset1.make_one_shot_iterator().get_next()  # re-create the iterator from the batched dataset

with tf.Session() as sess:
    print("Element 0 of dataset1: ", sess.run([it1]))

Our result:

Element 0 of dataset1:  [(array([[ 0.49671415, -0.1382643 ,  0.64768854,  1.52302986],
       [-0.23415337, -0.23413696,  1.57921282,  0.76743473],
       [-0.46947439,  0.54256004, -0.46341769, -0.46572975],
       [ 0.24196227, -1.91328024, -1.72491783, -0.56228753]]), array([[-1.01283112],
       [ 0.31424733],
       [-0.90802408],
       [-1.4123037 ]]))]

which, as you can see, is the same as using from_tensors.
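
Applying this back to the original question: wrap the precomputed validation arrays in a Dataset, batch it, and pass it to model.fit together with validation_steps. Below is a minimal sketch assuming the Python tf.keras API (TF 1.10+ or 2.x); the array names, the tiny model, the shuffle buffer size and the epoch count are placeholders standing in for the values from the question, and the same idea should carry over to the .NET binding used there:

import numpy as np
import tensorflow as tf
from tensorflow import keras

# placeholder data with the shapes from the question (replace with the real arrays)
train_inputs, train_outputs = np.random.randn(10000, 347), np.random.randn(10000, 1)
test_inputs, test_outputs = np.random.randn(221, 347), np.random.randn(221, 1)

model = keras.Sequential([
    keras.layers.Dense(64, activation='relu', input_shape=(347,)),
    keras.layers.Dense(1),
])
model.compile(optimizer='adam', loss='mse')

train_ds = (tf.data.Dataset.from_tensor_slices((train_inputs, train_outputs))
            .repeat()
            .shuffle(buffer_size=8192)   # stand-in for 1024 * 8 * InterpolateBy
            .batch(1024))

# batch the validation set too, so each element has 2 dimensions: (batch, 347) and (batch, 1)
val_ds = tf.data.Dataset.from_tensor_slices((test_inputs, test_outputs)).batch(221)

model.fit(train_ds,
          epochs=5,              # reduced from 6 * 1024 just for the sketch
          steps_per_epoch=4,
          validation_data=val_ds,
          validation_steps=1)    # one step covers all 221 validation samples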
