Tensorflow keras Matrix size-incompatible with extremely simple model
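I am trying to write linear and logistic models with keras and train them on the same data, but I ran into this confusing error. The code and the error message are below.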
import tensorflow as tf
from tensorflow import keras as tfk
import pandas as pd


def build_model(n_features, **kwargs):
    model = tfk.models.Sequential([
        tfk.layers.Dense(1, input_shape=[n_features, ], **kwargs)
    ])
    optimizer = tfk.optimizers.SGD()
    model.compile(loss=model, optimizer=optimizer, metrics=[tfk.metrics.binary_accuracy])
    return model


if __name__ == '__main__':
    data = get_data()
    train_x, train_y, test_x, test_y = process_data(data)

    class PrintDot(tfk.callbacks.Callback):
        def on_epoch_end(self, epoch, logs):
            if epoch % 100 == 0:
                print('')
            print('.', end='')

    EPOCHS = 1000
    BATCH_SIZE = None
    d = len(train_x.keys())
    linear = build_model(d)
    sigmoid = build_model(d, activation=tfk.activations.sigmoid)
    print(train_x.shape)
    print(train_y.shape)
    print(test_x.shape)
    print(test_y.shape)
    print(linear.summary())
    print(sigmoid.summary())
    linear_res = linear.fit(
        train_x, train_y, batch_size=BATCH_SIZE,
        epochs=EPOCHS, validation_split=0.2, verbose=0,
        callbacks=[PrintDot()])
    sigmoid_res = sigmoid.fit(
        train_x, train_y, batch_size=BATCH_SIZE,
        epochs=EPOCHS, validation_split=0.2, verbose=0,
        callbacks=[PrintDot()])
    loss_linear, acc_linear = linear.evaluate(test_x, test_y, verbose=0)
    loss_sigmoid, acc_sigmoid = sigmoid.evaluate(test_x, test_y, verbose=0)
    print("""
    Linear: loss = {:.2f} accuracy = {:.2f}
    Logistic: loss = {:.2f} accuracy = {:.2f}
    """.format(loss_linear, acc_linear, loss_sigmoid, acc_sigmoid))
Here are the shapes of the data and the model summaries, which do not look wrong at all.
(736, 15)
(736,)
(184, 15)
(184,)
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense (Dense)                (None, 1)                 16
=================================================================
Total params: 16
Trainable params: 16
Non-trainable params: 0
_________________________________________________________________
None
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense_1 (Dense)              (None, 1)                 16
=================================================================
Total params: 16
Trainable params: 16
Non-trainable params: 0
_________________________________________________________________
None
This produces an error:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Matrix size-incompatible: In[0]: [32,1], In[1]: [15,1]
[[{{node loss/dense_loss/sequential/dense/MatMul}}]]
[[{{node ConstantFoldingCtrl/loss/dense_loss/broadcast_weights/assert_broadcastable/AssertGuard/Switch_0}}]]
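I believe 32 is the default batch size and 15 is the dimensionality (number of columns) of my data, but why is there an array of shape [15, 1]?
Here is the detailed error message from tensorflow: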
2019-07-09 14:47:57.381250: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2019-07-09 14:47:57.636045: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: GeForce GTX 1060 major: 6 minor: 1 memoryClockRate(GHz): 1.6705
pciBusID: 0000:01:00.0
totalMemory: 6.00GiB freeMemory: 4.97GiB
2019-07-09 14:47:57.636491: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-07-09 14:47:58.354913: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-07-09 14:47:58.355175: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-07-09 14:47:58.355332: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-07-09 14:47:58.355663: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4716 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060, pci bus id: 0000:01:00.0, compute capability: 6.1)
2019-07-09 14:47:58.953351: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library cublas64_100.dll locally
2019-07-09 14:47:59.396889: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2019-07-09 14:47:59.397449: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2019-07-09 14:47:59.399714: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2019-07-09 14:47:59.402435: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2019-07-09 14:47:59.402714: W tensorflow/stream_executor/stream.cc:2130] attempting to perform BLAS operation using StreamExecutor without BLAS support
Traceback (most recent call last):
File "C:/Users/charl/PycharmProjects/cs229_models/keras_logistic_regression.py", line 168, in <module>
callbacks=[PrintDot()])
File "C:\Users\charl\Anaconda3\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\keras\engine\training.py", line 880, in fit
validation_steps=validation_steps)
File "C:\Users\charl\Anaconda3\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\keras\engine\training_arrays.py", line 329, in model_iteration
batch_outs = f(ins_batch)
File "C:\Users\charl\Anaconda3\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\keras\backend.py", line 3076, in __call__
run_metadata=self.run_metadata)
File "C:\Users\charl\Anaconda3\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\client\session.py", line 1439, in __call__
run_metadata_ptr)
File "C:\Users\charl\Anaconda3\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 528, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Matrix size-incompatible: In[0]: [32,1], In[1]: [15,1]
[[{{node loss/dense_loss/sequential/dense/MatMul}}]]
[[{{node ConstantFoldingCtrl/loss/dense_loss/broadcast_weights/assert_broadcastable/AssertGuard/Switch_0}}]]
Maybe I am misunderstanding something, but why do you write model.compile(loss=model, ...) when compiling? As I understand it, a loss function in keras always takes inputs of the form loss_function(y_true, y_pred).
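To make the loss_function(y_true, y_pred) form concrete, here is a minimal plain-Python sketch of two such functions. They are illustrative stand-ins that operate on lists of floats, not Keras's own implementations, and the eps clipping value is an assumption chosen for this example. In Keras itself one would typically pass a built-in name such as 'mean_squared_error' or 'binary_crossentropy', or a callable with this same two-argument signature.

```python
import math


def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Average negative log-likelihood for 0/1 targets.

    Predictions are clipped to [eps, 1 - eps] so log() never receives 0.
    The eps default is an assumption made for this sketch.
    """
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(y_true)


# Both functions take (y_true, y_pred) and return a single number.
y_true = [1.0, 0.0, 1.0, 0.0]
y_pred = [0.9, 0.2, 0.6, 0.4]
print(mean_squared_error(y_true, y_pred))
print(binary_crossentropy(y_true, y_pred))
```

A model object does not fit this role: it maps a batch of features to predictions, rather than mapping a pair (y_true, y_pred) to a number.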
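As you say, I would expect [32,1] to be the shape of a single batch of train_y data, while [15,1] corresponds to the input the model (which you are using as the loss function) expects: its Dense layer holds a kernel that maps 15 features to 1 output. Hence the incompatibility error.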
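The mismatch can be illustrated with the shape rule for matrix multiplication alone. The helper below is a hypothetical plain-Python sketch (the name matmul_shape is not part of TensorFlow) that only compares shapes: a Dense layer with 15 inputs and 1 output holds a [15, 1] kernel, so it can multiply a [32, 15] batch of features but not a [32, 1] batch such as the labels.

```python
def matmul_shape(a_shape, b_shape):
    """Return the shape of a 2-D matrix product, or raise if the shapes are incompatible."""
    (a_rows, a_cols), (b_rows, b_cols) = a_shape, b_shape
    if a_cols != b_rows:
        raise ValueError(
            "Matrix size-incompatible: In[0]: [{},{}], In[1]: [{},{}]".format(
                a_rows, a_cols, b_rows, b_cols))
    return (a_rows, b_cols)


kernel = (15, 1)  # kernel shape of Dense(1) applied to 15 features

# A batch of 32 feature rows multiplies cleanly with the kernel.
print(matmul_shape((32, 15), kernel))  # (32, 1)

# A [32, 1] batch (for example the labels) cannot be multiplied by the same kernel.
try:
    matmul_shape((32, 1), kernel)
except ValueError as err:
    print(err)
```

The inner dimensions must agree, which is why the message reports In[0] as [32,1] against In[1] as [15,1].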
It would probably also help to specify what process_data(data) does.
I could not run the code with loss=model, but I tried to reproduce your problem with similar code, which you can view in this colab: https://drive.google.com/open?id=1MWLMpPUBKorRdMCa3ekK50AnEVH9Vtyc
!pip install tensorflow-gpu==1.14.0

import numpy as np
import tensorflow as tf
from tensorflow import keras as tfk
import pandas as pd

print(tf.__version__)


def build_model(n_features, **kwargs):
    model = tfk.models.Sequential([
        tfk.layers.Dense(1, input_shape=[n_features, ], **kwargs)
    ])
    optimizer = tfk.optimizers.SGD()
    # A named loss is passed here instead of loss=model.
    model.compile(loss='mean_squared_error', optimizer=optimizer, metrics=[tfk.metrics.binary_accuracy])
    return model


# Dummy data with the same shapes as in the question.
train_x = np.random.rand(736, 15)
train_y = np.random.rand(736,)


class PrintDot(tfk.callbacks.Callback):
    # Print one dot per epoch and a line break every 100 epochs.
    def on_epoch_end(self, epoch, logs):
        if epoch % 100 == 0:
            print('')
        print('.', end='')


EPOCHS = 1000
BATCH_SIZE = None
d = train_x.shape[1]
linear = build_model(d)
sigmoid = build_model(d, activation=tfk.activations.sigmoid)
print(train_x.shape)
print(train_y.shape)
print(linear.summary())
print(sigmoid.summary())
linear_res = linear.fit(
    train_x, train_y, batch_size=BATCH_SIZE,
    epochs=EPOCHS, validation_split=0.2, verbose=0,
    callbacks=[PrintDot()])
sigmoid_res = sigmoid.fit(
    train_x, train_y, batch_size=BATCH_SIZE,
    epochs=EPOCHS, validation_split=0.2, verbose=0,
    callbacks=[PrintDot()])
This works as expected, and training runs without errors. The main differences from your code are that I used loss='mean_squared_error' and created dummy data:
train_x = np.random.rand(736, 15)
train_y = np.random.rand(736,)