Use of tf.data.Dataset with Keras input layer on TensorFlow 2.0
I'm experimenting with TensorFlow 2.0 alpha and I've found that it works as expected when using NumPy arrays, but when a tf.data.Dataset is used, an input dimension error appears. I'm using the iris dataset as the simplest example to demonstrate this:
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, OneHotEncoder
import tensorflow as tf
from tensorflow.python import keras
iris = datasets.load_iris()
scl = StandardScaler()
ohe = OneHotEncoder(categories='auto')
data_norm = scl.fit_transform(iris.data)
data_target = ohe.fit_transform(iris.target.reshape(-1,1)).toarray()
train_data, val_data, train_target, val_target = train_test_split(data_norm, data_target, test_size=0.1)
train_data, test_data, train_target, test_target = train_test_split(train_data, train_target, test_size=0.2)
train_dataset = tf.data.Dataset.from_tensor_slices((train_data, train_target))
train_dataset.batch(32)
test_dataset = tf.data.Dataset.from_tensor_slices((test_data, test_target))
test_dataset.batch(32)
val_dataset = tf.data.Dataset.from_tensor_slices((val_data, val_target))
val_dataset.batch(32)
mdl = keras.Sequential([
keras.layers.Dense(16, input_dim=4, activation='relu'),
keras.layers.Dense(8, activation='relu'),
keras.layers.Dense(8, activation='relu'),
keras.layers.Dense(3, activation='sigmoid')]
)
mdl.compile(
optimizer=keras.optimizers.Adam(0.01),
loss=keras.losses.categorical_crossentropy,
metrics=[keras.metrics.categorical_accuracy]
)
history = mdl.fit(train_dataset, epochs=10, steps_per_epoch=15, validation_data=val_dataset)
and I get the following error:
ValueError: Error when checking input: expected dense_16_input to have shape (4,) but got array with shape (1,)
assuming that the dataset has only one dimension. If I pass input_dim=1 I get a different error:
InvalidArgumentError: Incompatible shapes: [3] vs. [4]
[[{{node metrics_5/categorical_accuracy/Equal}}]] [Op:__inference_keras_scratch_graph_8223]
What is the proper way to use tf.data.Dataset with a Keras model in TensorFlow 2.0?
A few changes should fix your code. The batch() dataset transformation does not occur in place, so you need to assign the returned dataset back to a variable. Secondly, you should also add a repeat() transformation, so that the dataset continues to output examples after all of the data has been seen.
...
train_dataset = tf.data.Dataset.from_tensor_slices((train_data, train_target))
train_dataset = train_dataset.batch(32)
train_dataset = train_dataset.repeat()
val_dataset = tf.data.Dataset.from_tensor_slices((val_data, val_target))
val_dataset = val_dataset.batch(32)
val_dataset = val_dataset.repeat()
...
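Equivalently, since each transformation returns a new Dataset, the calls can be chained in a single expression. A minimal sketch using a small random array as a stand-in for the iris splits:

```python
import numpy as np
import tensorflow as tf

# Toy stand-ins for the iris splits: 12 samples, 4 features, 3 one-hot classes
data = np.random.rand(12, 4).astype("float32")
target = np.eye(3, dtype="float32")[np.random.randint(0, 3, size=12)]

# Each transformation returns a new Dataset, so they can be chained directly
train_dataset = (tf.data.Dataset.from_tensor_slices((data, target))
                 .batch(32)
                 .repeat())

# After batch(), each element carries a leading batch dimension,
# which is what the Dense layer with input_dim=4 expects
features, labels = next(iter(train_dataset))
print(features.shape)  # (12, 4) — only 12 samples, so the batch is smaller than 32
```

Without the batch() call, the dataset yields individual examples of shape (4,), which Keras misreads as the wrong input dimension, producing the error above.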
You also need to add the validation_steps argument to the model.fit() call:
history = mdl.fit(train_dataset, epochs=10, steps_per_epoch=15, validation_data=val_dataset, validation_steps=1)
For your own data, you may need to adjust the batch_size of the validation dataset and validation_steps so that the full validation set is cycled through exactly once during each validation pass.
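One way to pick that value is to derive validation_steps from the validation set size and the batch size. A sketch, using math.ceil so a final partial batch is still counted:

```python
import math

batch_size = 32
num_val_examples = 15  # iris: 150 samples with test_size=0.1

# Number of batches needed to see every validation example exactly once
validation_steps = math.ceil(num_val_examples / batch_size)
print(validation_steps)  # 1 — all 15 examples fit in a single batch of 32
```

This keeps validation_steps consistent with the dataset if you later change the split ratio or the batch size.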