ValueError: Data cardinality is ambiguous with tf.keras
I have a dataframe with two columns; the first contains a sentence and the second a target label (9 labels in total; a sentence can be classified under more than one label).
I have used word2vec to vectorise the text, which resulted in an array of length 64.
The initial problem I had was:
Tensorflow - ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type float)
To overcome this I converted the np.array with:
train_inputs = tf.convert_to_tensor([df_train_title_train])
But now I am getting a new problem - see below.
I have been researching Stack Overflow and other sources for days and am struggling to get my simple neural network to work.
print(train_inputs.shape)
print(train_targets.shape)
print(validation_inputs.shape)
print(validation_targets.shape)
print(train_inputs[0].shape)
print(train_targets[0].shape)
(1, 63586, 64)
(63586, 9)
(1, 7066, 64)
(7066, 9)
(63586, 64)
(9,)
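The leading 1 in the input shapes comes from wrapping the array in an extra list before calling `convert_to_tensor`: Keras then sees 1 sample on the x side but 63586 on the y side, which is exactly the cardinality mismatch in the error. A minimal NumPy sketch of the effect (small illustrative shapes, not the real data):

```python
import numpy as np

# stand-in for the word2vec features: 5 sentences, 64-dim vectors each
features = np.random.rand(5, 64).astype(np.float32)

# wrapping the array in a list adds a leading axis of size 1 ...
wrapped = np.array([features])
print(wrapped.shape)   # (1, 5, 64) -> Keras sees x size 1

# ... while passing the array directly keeps samples on axis 0
direct = np.asarray(features)
print(direct.shape)    # (5, 64)    -> x size 5 matches the 5 targets
```

Dropping the extra brackets (or squeezing the leading axis) makes the x and y sample counts agree again.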
# Set the input and output sizes
input_size = 64
output_size = 9
# Use same hidden layer size for both hidden layers. Not a necessity.
hidden_layer_size = 64
# define how the model will look like
model = tf.keras.Sequential([
tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # 1st hidden layer
tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # 2nd hidden layer
tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # 3rd hidden layer
tf.keras.layers.Dense(output_size, activation='softmax') # output layer
])
# model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
### Training
# That's where we train the model we have built.
# set the batch size
batch_size = 10
# set a maximum number of training epochs
max_epochs = 10
# fit the model
# note that this time the train, validation and test data are not iterable
model.fit(train_inputs, # train inputs
train_targets, # train targets
batch_size=batch_size, # batch size
epochs=max_epochs, # epochs that we will train for (assuming early stopping doesn't kick in)
validation_data=(validation_inputs, validation_targets), # validation data
verbose = 2 # making sure we get enough information about the training process
)
Error Message
/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/data_adapter.py in _check_data_cardinality(data)
1527 label, ", ".join(str(i.shape[0]) for i in nest.flatten(single_data)))
1528 msg += "Make sure all arrays contain the same number of samples."
-> 1529 raise ValueError(msg)
1530
1531
ValueError: Data cardinality is ambiguous:
x sizes: 1
y sizes: 63586
Make sure all arrays contain the same number of samples.
You do not set the shape of your input anywhere; you should do this either with an explicit Input layer at the beginning of your model (see the example in the docs):
# before the first Dense layer:
tf.keras.Input(shape=(64,))
or by including an input_shape argument in your first layer:
tf.keras.layers.Dense(hidden_layer_size, activation='relu', input_shape=(64,)), # 1st hidden layer
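What `input_shape=(64,)` promises, in effect, is that each sample is a 64-vector and that samples are stacked along axis 0. A hand-rolled sketch of one Dense layer's computation in NumPy (random weights, purely illustrative) shows why the batch axis must carry the samples:

```python
import numpy as np

rng = np.random.default_rng(0)
batch = rng.random((10, 64))   # 10 samples, 64 features each
W = rng.random((64, 9))        # weight matrix of a Dense(9) layer
b = np.zeros(9)                # bias vector

# a Dense layer applies the same affine map to every sample;
# the sample axis (axis 0) passes through unchanged
logits = batch @ W + b
print(logits.shape)            # (10, 9) -- one 9-way output per sample
```

With the extra leading axis of size 1 from the list-wrapping, Keras would instead treat the whole dataset as a single sample, which is what triggers the cardinality error.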
Most probably, you will not even need convert_to_tensor (not quite sure though).
Also, irrelevant to your issue, but since you are in a multi-class setting, you should use loss='categorical_crossentropy', and not binary_crossentropy; see Why binary_crossentropy and categorical_crossentropy give different performances for the same problem?
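To see concretely how the two losses differ, here is a small NumPy sketch using the textbook formulas (hand-rolled, not the Keras implementations, and on a toy 3-class example):

```python
import numpy as np

y_true = np.array([0., 0., 1.])     # one-hot target over 3 classes
y_pred = np.array([0.2, 0.2, 0.6])  # softmax-style prediction

# categorical cross-entropy: a single term, for the true class only
cce = -np.sum(y_true * np.log(y_pred))

# binary cross-entropy: averages an independent yes/no loss per class
bce = -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

print(round(cce, 4))  # 0.5108
print(round(bce, 4))  # 0.319
```

The two quantities are genuinely different objectives, which is why swapping one for the other changes both the reported accuracy and the training dynamics.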