TensorFlow Keras: "Matrix size-incompatible" error with an extremely simple model
I'm trying to build linear and logistic regression models with Keras and train them on the same data, but I ran into this confusing error. Here are the code and the error messages.
import tensorflow as tf
from tensorflow import keras as tfk
import pandas as pd


def build_model(n_features, **kwargs):
    # Single Dense unit: linear by default, logistic with a sigmoid activation
    model = tfk.models.Sequential([
        tfk.layers.Dense(1, input_shape=[n_features, ], **kwargs)
    ])
    optimizer = tfk.optimizers.SGD()
    model.compile(loss=model, optimizer=optimizer, metrics=[tfk.metrics.binary_accuracy])
    return model


if __name__ == '__main__':
    # get_data() and process_data() are helpers that are not shown in this post
    data = get_data()
    train_x, train_y, test_x, test_y = process_data(data)

    class PrintDot(tfk.callbacks.Callback):
        # Print one dot per epoch and a newline every 100 epochs
        def on_epoch_end(self, epoch, logs):
            if epoch % 100 == 0:
                print('')
            print('.', end='')

    EPOCHS = 1000
    BATCH_SIZE = None
    d = len(train_x.keys())

    linear = build_model(d)
    sigmoid = build_model(d, activation=tfk.activations.sigmoid)

    print(train_x.shape)
    print(train_y.shape)
    print(test_x.shape)
    print(test_y.shape)
    print(linear.summary())
    print(sigmoid.summary())

    linear_res = linear.fit(
        train_x, train_y, batch_size=BATCH_SIZE,
        epochs=EPOCHS, validation_split=0.2, verbose=0,
        callbacks=[PrintDot()])
    sigmoid_res = sigmoid.fit(
        train_x, train_y, batch_size=BATCH_SIZE,
        epochs=EPOCHS, validation_split=0.2, verbose=0,
        callbacks=[PrintDot()])

    loss_linear, acc_linear = linear.evaluate(test_x, test_y, verbose=0)
    loss_sigmoid, acc_sigmoid = sigmoid.evaluate(test_x, test_y, verbose=0)
    print("""
    Linear: loss = {:.2f} accuracy = {:.2f}
    Logistic: loss = {:.2f} accuracy = {:.2f}
    """.format(loss_linear, acc_linear, loss_sigmoid, acc_sigmoid))
And here are the shapes of the data and the model summaries, which don't look wrong at all.
(736, 15)
(736,)
(184, 15)
(184,)
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense (Dense) (None, 1) 16
=================================================================
Total params: 16
Trainable params: 16
Non-trainable params: 0
_________________________________________________________________
None
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_1 (Dense) (None, 1) 16
=================================================================
Total params: 16
Trainable params: 16
Non-trainable params: 0
_________________________________________________________________
None
This produced an error:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Matrix size-incompatible: In[0]: [32,1], In[1]: [15,1]
[[{{node loss/dense_loss/sequential/dense/MatMul}}]]
[[{{node ConstantFoldingCtrl/loss/dense_loss/broadcast_weights/assert_broadcastable/AssertGuard/Switch_0}}]]
I think 32 is the default batch size and 15 is the dimension (number of columns) of my data, but why would there even be an array of shape [15, 1]?
Here are the detailed error messages from TensorFlow:
2019-07-09 14:47:57.381250: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2019-07-09 14:47:57.636045: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: GeForce GTX 1060 major: 6 minor: 1 memoryClockRate(GHz): 1.6705
pciBusID: 0000:01:00.0
totalMemory: 6.00GiB freeMemory: 4.97GiB
2019-07-09 14:47:57.636491: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-07-09 14:47:58.354913: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-07-09 14:47:58.355175: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-07-09 14:47:58.355332: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-07-09 14:47:58.355663: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4716 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060, pci bus id: 0000:01:00.0, compute capability: 6.1)
2019-07-09 14:47:58.953351: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library cublas64_100.dll locally
2019-07-09 14:47:59.396889: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2019-07-09 14:47:59.397449: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2019-07-09 14:47:59.399714: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2019-07-09 14:47:59.402435: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2019-07-09 14:47:59.402714: W tensorflow/stream_executor/stream.cc:2130] attempting to perform BLAS operation using StreamExecutor without BLAS support
Traceback (most recent call last):
File "C:/Users/charl/PycharmProjects/cs229_models/keras_logistic_regression.py", line 168, in <module>
callbacks=[PrintDot()])
File "C:\Users\charl\Anaconda3\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\keras\engine\training.py", line 880, in fit
validation_steps=validation_steps)
File "C:\Users\charl\Anaconda3\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\keras\engine\training_arrays.py", line 329, in model_iteration
batch_outs = f(ins_batch)
File "C:\Users\charl\Anaconda3\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\keras\backend.py", line 3076, in __call__
run_metadata=self.run_metadata)
File "C:\Users\charl\Anaconda3\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\client\session.py", line 1439, in __call__
run_metadata_ptr)
File "C:\Users\charl\Anaconda3\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 528, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Matrix size-incompatible: In[0]: [32,1], In[1]: [15,1]
[[{{node loss/dense_loss/sequential/dense/MatMul}}]]
[[{{node ConstantFoldingCtrl/loss/dense_loss/broadcast_weights/assert_broadcastable/AssertGuard/Switch_0}}]]
Maybe I am misunderstanding something, but why do you write model.compile(loss=model, ...) when compiling? My understanding is that a loss function in Keras always expects input of the form loss_function(y_true, y_pred).
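To make that signature concrete, here is a minimal NumPy sketch of two functions with the loss_function(y_true, y_pred) form. They are illustrative re-implementations, not Keras' own code, and the eps clipping constant is an assumption added to avoid log(0). In model.compile you would normally pass a built-in loss by name instead, for example loss='mean_squared_error' for the linear model or loss='binary_crossentropy' for the sigmoid one.

```python
import numpy as np


def mean_squared_error(y_true, y_pred):
    """Average squared difference; a common loss for linear regression."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Log loss for 0/1 labels and predicted probabilities (sigmoid outputs)."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip predictions so that log(0) never occurs (eps is an assumed constant).
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1.0 - eps)
    per_sample = y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred)
    return float(-np.mean(per_sample))


# Both functions take (y_true, y_pred) and return a single number.
labels = [0.0, 1.0, 1.0, 0.0]
predictions = [0.1, 0.9, 0.8, 0.3]
print(mean_squared_error(labels, predictions))
print(binary_crossentropy(labels, predictions))
```

A Sequential model does not have this form: it takes a batch of features and returns predictions, so it cannot stand in for the loss.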
I would expect that [32,1] is, as you said, the shape of a single batch of your train_y data, and that [15,1] is the shape of the weight matrix of the Dense layer (15 input features, 1 unit) inside the model you pass as the loss function. Keras calls the loss on the labels, so the model ends up multiplying a [32,1] batch by a [15,1] matrix, hence the incompatibility error.
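The shape argument can be checked outside TensorFlow with a small NumPy sketch. The array names are hypothetical and the zero-filled arrays only stand in for real data and weights; the point is the shapes, taken from the error message and the model summary.

```python
import numpy as np

n_features = 15   # columns of train_x
batch_size = 32   # Keras default when batch_size=None

kernel = np.zeros((n_features, 1))                  # Dense(1) weights: [15, 1]
feature_batch = np.zeros((batch_size, n_features))  # a batch of train_x: [32, 15]
label_batch = np.zeros((batch_size, 1))             # a batch of train_y: [32, 1]

# What the model is built for: [32, 15] x [15, 1] -> [32, 1]
predictions = feature_batch @ kernel
print(predictions.shape)

# What happens when the model is used as the loss: [32, 1] x [15, 1]
try:
    label_batch @ kernel
    labels_fit = True
except ValueError as err:
    labels_fit = False
    print("shape mismatch:", err)
```

The inner dimensions (1 and 15) do not match in the second product, which is the same situation the MatMul node reports with In[0]: [32,1] and In[1]: [15,1].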
It would probably also help to specify what process_data(data) does.
I could not get the code with loss=model running, but I tried to reproduce your problem with similar code. You can check it out in Colab here: https://drive.google.com/open?id=1MWLMpPUBKorRdMCa3ekK50AnEVH9Vtyc
# Notebook (Colab) shell command to pin the TensorFlow version
!pip install tensorflow-gpu==1.14.0

import numpy as np
import tensorflow as tf
from tensorflow import keras as tfk
import pandas as pd

print(tf.__version__)


def build_model(n_features, **kwargs):
    model = tfk.models.Sequential([
        tfk.layers.Dense(1, input_shape=[n_features, ], **kwargs)
    ])
    optimizer = tfk.optimizers.SGD()
    # A built-in loss passed by name instead of loss=model
    model.compile(loss='mean_squared_error', optimizer=optimizer, metrics=[tfk.metrics.binary_accuracy])
    return model


# Dummy data with the same shapes as in the question
train_x = np.random.rand(736, 15)
train_y = np.random.rand(736,)


class PrintDot(tfk.callbacks.Callback):
    # Print one dot per epoch and a newline every 100 epochs
    def on_epoch_end(self, epoch, logs):
        if epoch % 100 == 0:
            print('')
        print('.', end='')


EPOCHS = 1000
BATCH_SIZE = None
d = train_x.shape[1]

linear = build_model(d)
sigmoid = build_model(d, activation=tfk.activations.sigmoid)

print(train_x.shape)
print(train_y.shape)
print(linear.summary())
print(sigmoid.summary())

linear_res = linear.fit(
    train_x, train_y, batch_size=BATCH_SIZE,
    epochs=EPOCHS, validation_split=0.2, verbose=0,
    callbacks=[PrintDot()])
sigmoid_res = sigmoid.fit(
    train_x, train_y, batch_size=BATCH_SIZE,
    epochs=EPOCHS, validation_split=0.2, verbose=0,
    callbacks=[PrintDot()])
This works as expected, and the training runs without errors. The main differences from your code are that I used loss='mean_squared_error' and created dummy data with:
train_x = np.random.rand(736, 15)
train_y = np.random.rand(736,)