ValueError: 'logits' and 'labels' must have the same shape for NLP sentiment multi-class classifier
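I'm trying to build an NLP multi-class sentiment classifier that takes sentences as input and classifies them into three classes (negative, neutral and positive). However, when I train the model I get an error saying my logits (None, 3) are not the same size as my labels (None, 1), and the model can't start training.

My model is a multi-class classifier, not a multi-label classifier, because it predicts only one label per object. I made sure my last layer has an output of 3 and activation = 'softmax'. From what I found online this should be correct, so I think the problem lies with my labels.

Currently my labels have dimension (None, 1), because I map each class to a unique integer and pass that as my train and test y values (in the form of a one-dimensional numpy array).

Now I'm confused about whether I should change the dimensions of this array to match the output dimensions, and if so, how to go about doing that.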
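To make the mismatch concrete, here is a minimal NumPy-only sketch. The sample labels are made up for illustration. `np.unique(..., return_inverse=True)` sorts the class names and returns one integer code per sample. That is the same alphabetical mapping pandas uses when `astype('category')` infers the categories: negative is 0, neutral is 1, positive is 2. The result holds one integer per sample, and this 1-D array is what the error reports as `(None, 1)`. The final `Dense(3, activation='softmax')` layer, by contrast, emits three probabilities per sample.

```python
import numpy as np

# Hypothetical sample labels, standing in for train_df['sentiment']
labels = np.array(["neutral", "negative", "positive", "neutral"])

# Sorted class names plus one integer code per sample
# (same alphabetical mapping as pandas' inferred .cat.codes)
classes, codes = np.unique(labels, return_inverse=True)
print(classes)      # ['negative' 'neutral' 'positive']
print(codes)        # [1 0 2 1]
print(codes.shape)  # (4,): one integer per sample

# The softmax layer instead produces one probability per class per sample
num_classes = len(classes)
fake_softmax_output = np.full((len(labels), num_classes), 1.0 / num_classes)
print(fake_softmax_output.shape)  # (4, 3)
```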
```python
import os
import sys
import tensorflow as tf
import numpy as np
import pandas as pd
from tensorflow import keras
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from keras.optimizers import SGD

device_name = tf.test.gpu_device_name()
if len(device_name) > 0:
    print("Found GPU at: {}".format(device_name))
else:
    device_name = "/device:CPU:0"
    print("No GPU, using {}.".format(device_name))

# Load dataset into a dataframe
train_data_path = "/content/drive/MyDrive/ML Datasets/tweet_sentiment_analysis/train.csv"
test_data_path = "/content/drive/MyDrive/ML Datasets/tweet_sentiment_analysis/test.csv"

train_df = pd.read_csv(train_data_path, encoding='unicode_escape')
test_df = pd.read_csv(test_data_path, encoding='unicode_escape').dropna()

sentiment_types = ('neutral', 'negative', 'positive')

train_df['sentiment'] = train_df['sentiment'].astype('category')
test_df['sentiment'] = test_df['sentiment'].astype('category')
train_df['sentiment_cat'] = train_df['sentiment'].cat.codes
test_df['sentiment_cat'] = test_df['sentiment'].cat.codes
train_y = np.array(train_df['sentiment_cat'])
test_y = np.array(test_df['sentiment_cat'])

# Function to convert df into a list of strings
def convert_to_list(df, x):
    selected_text_list = []
    labels = []
    for index, row in df.iterrows():
        selected_text_list.append(str(row[x]))
        labels.append(str(row['sentiment']))
    return np.array(selected_text_list), np.array(labels)

train_sentences, train_labels = convert_to_list(train_df, 'selected_text')
test_sentences, test_labels = convert_to_list(test_df, 'text')

# Instantiate tokenizer and create word_index
tokenizer = Tokenizer(num_words=1000, oov_token='<oov>')
tokenizer.fit_on_texts(train_sentences)
word_index = tokenizer.word_index

# Convert sentences into a sequence
train_sequence = tokenizer.texts_to_sequences(train_sentences)
test_sequence = tokenizer.texts_to_sequences(test_sentences)

# Padding sequences
pad_test_seq = pad_sequences(test_sequence, padding='post')
max_len = pad_test_seq[0].size
pad_train_seq = pad_sequences(train_sequence, padding='post', maxlen=max_len)

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(10000, 64, input_length=max_len),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64, return_sequences=True)),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(3, activation='softmax')
])

with tf.device(device_name):
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

num_epochs = 10
with tf.device(device_name):
    history = model.fit(pad_train_seq, train_y, epochs=num_epochs, validation_data=(pad_test_seq, test_y), verbose=2)
```
Here is the error:
```
ValueError                                Traceback (most recent call last)
<ipython-input-28-62f3c6445887> in <module>
      2
      3 with tf.device(device_name):
----> 4     history = model.fit(pad_train_seq, train_y, epochs=num_epochs, validation_data=(pad_test_seq, test_y), verbose=2)

1 frames
/usr/local/lib/python3.8/dist-packages/keras/engine/training.py in tf__train_function(iterator)
     13                 try:
     14                     do_return = True
---> 15                     retval_ = ag__.converted_call(ag__.ld(step_function), (ag__.ld(self), ag__.ld(iterator)), None, fscope)
     16                 except:
     17                     do_return = False

ValueError: in user code:

    File "/usr/local/lib/python3.8/dist-packages/keras/engine/training.py", line 1051, in train_function  *
        return step_function(self, iterator)
    File "/usr/local/lib/python3.8/dist-packages/keras/engine/training.py", line 1040, in step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/usr/local/lib/python3.8/dist-packages/keras/engine/training.py", line 1030, in run_step  **
        outputs = model.train_step(data)
    File "/usr/local/lib/python3.8/dist-packages/keras/engine/training.py", line 890, in train_step
        loss = self.compute_loss(x, y, y_pred, sample_weight)
    File "/usr/local/lib/python3.8/dist-packages/keras/engine/training.py", line 948, in compute_loss
        return self.compiled_loss(
    File "/usr/local/lib/python3.8/dist-packages/keras/engine/compile_utils.py", line 201, in __call__
        loss_value = loss_obj(y_t, y_p, sample_weight=sw)
    File "/usr/local/lib/python3.8/dist-packages/keras/losses.py", line 139, in __call__
        losses = call_fn(y_true, y_pred)
    File "/usr/local/lib/python3.8/dist-packages/keras/losses.py", line 243, in call  **
        return ag_fn(y_true, y_pred, **self._fn_kwargs)
    File "/usr/local/lib/python3.8/dist-packages/keras/losses.py", line 1930, in binary_crossentropy
        backend.binary_crossentropy(y_true, y_pred, from_logits=from_logits),
    File "/usr/local/lib/python3.8/dist-packages/keras/backend.py", line 5283, in binary_crossentropy
        return tf.nn.sigmoid_cross_entropy_with_logits(labels=target, logits=output)

    ValueError: `logits` and `labels` must have the same shape, received ((None, 3) vs (None, 1)).
```
The key concept you are missing is that you need to one-hot encode your labels (after first assigning integers to the labels; see below).
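So after the softmax, your model outputs three values: how likely it thinks each label is. For example, it might say A is 0.6, B is 0.1 and C is 0.3. If the correct answer is C, then it needs to see that correct answer as `0, 0, 1`. It can then say its prediction for A is `0.6 - 0 = +0.6` wrong, B is `0.1 - 0 = +0.1` wrong, and C is `0.3 - 1 = -0.7` wrong.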
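The arithmetic above can be checked with a short NumPy sketch. The values A = 0.6, B = 0.1 and C = 0.3 come from the example. Two facts used here are not in the answer itself. First, the categorical cross-entropy for this sample is -log(0.3). Second, for softmax followed by cross-entropy, the gradient of the loss with respect to the pre-softmax logits is exactly `probs - one_hot`, which is where the +0.6 / +0.1 / -0.7 "errors" come from.

```python
import numpy as np

# Model output after softmax for classes A, B, C (values from the example)
probs = np.array([0.6, 0.1, 0.3])

# The correct answer is C, so the label must be one-hot encoded as [0, 0, 1]
one_hot = np.array([0.0, 0.0, 1.0])

# Per-class "error": prediction minus target.
# For softmax + categorical cross-entropy this equals the gradient of the
# loss with respect to the logits (the inputs to the softmax).
error = probs - one_hot
print(error)  # approximately [ 0.6  0.1 -0.7]

# Categorical cross-entropy only looks at the probability of the true class
loss = -np.sum(one_hot * np.log(probs))
print(round(loss, 4))  # 1.204
```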
In theory, you could go directly from the string label to a one-hot encoding. But TensorFlow seems to want the labels encoded as integers first, and only then one-hot encoded.
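As a minimal NumPy illustration (my own sketch, not what TensorFlow does internally), both routes produce the same matrix. Comparing each string label against the sorted class list gives the one-hot rows directly. Indexing an identity matrix with the integer codes gives them in two steps. The two-step route is the one that `CategoryEncoding(output_mode="one_hot")` and `to_categorical` fit, since both take integer inputs.

```python
import numpy as np

# Hypothetical string labels and the sorted list of classes
labels = np.array(["positive", "negative", "neutral", "negative"])
classes = np.array(["negative", "neutral", "positive"])

# Route 1: string label -> one-hot directly, by comparing against every class
direct = (labels[:, None] == classes[None, :]).astype(np.float32)

# Route 2: string label -> integer code -> one-hot
codes = np.searchsorted(classes, labels)  # requires `classes` to be sorted
two_step = np.eye(len(classes), dtype=np.float32)[codes]

print(codes)                             # [2 0 1 0]
print(direct)
print(np.array_equal(direct, two_step))  # True
```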
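The examples at https://www.tensorflow.org/api_docs/python/tf/keras/layers/CategoryEncoding#examples suggest using:

```python
layer = tf.keras.layers.CategoryEncoding(num_tokens=3, output_mode="one_hot")
one_hot = layer([1, 0, 2])  # integer codes in, one row of 3 values per code out
```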
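See also https://stackoverflow.com/a/69791457/841830 (the more highly voted answers on that question are from 2019, so I assume they apply to TensorFlow v1). Searching for "tensorflow one-hot encoding" also turns up plenty of tutorials and examples.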
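The problem here was indeed that my labels had a different shape from the logits. The logits have shape (3) because they contain one float per class, giving the probability of each of the three classes I want to predict. The labels originally had shape (1) because each one contained only a single integer.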
To fix this, I used one-hot encoding to turn every label into shape (3), which solved the problem. I used the `keras.utils.to_categorical()` function to do this.
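A rough NumPy re-implementation may make the shape change clearer. The function name `to_categorical_sketch` is hypothetical. It mirrors the behaviour of `keras.utils.to_categorical(y, num_classes=None)`, which infers `num_classes` as `max(y) + 1` when it is not given. That inference is worth keeping in mind. If a split happens to contain no sample of the highest class, the encoded labels end up with fewer columns than the model's 3 outputs, so passing `num_classes=3` explicitly is the safer choice.

```python
import numpy as np

def to_categorical_sketch(y, num_classes=None):
    """Hypothetical stand-in for keras.utils.to_categorical on integer labels."""
    y = np.asarray(y, dtype=int).ravel()  # accepts shape (n,) or (n, 1)
    if num_classes is None:
        # Same inference rule Keras uses: highest label + 1
        num_classes = int(y.max()) + 1
    one_hot = np.zeros((y.shape[0], num_classes), dtype=np.float32)
    one_hot[np.arange(y.shape[0]), y] = 1.0  # one 1.0 per row, at the label's index
    return one_hot

train_codes = np.array([0, 2, 1, 1])
print(to_categorical_sketch(train_codes).shape)  # (4, 3), matching Dense(3)

# Pitfall: a split with no "positive" (code 2) sample gets only 2 columns
test_codes = np.array([0, 1, 1])
print(to_categorical_sketch(test_codes).shape)                 # (3, 2)
print(to_categorical_sketch(test_codes, num_classes=3).shape)  # (3, 3)
```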
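```python
from tensorflow.keras.utils import to_categorical

# Alphabetical order, matching the codes pandas assigns to inferred categories
sentiment_types = ('negative', 'neutral', 'positive')
train_df['sentiment'] = train_df['sentiment'].astype('category')
test_df['sentiment'] = test_df['sentiment'].astype('category')

# Turning labels from strings to int
train_sentiment_cat = train_df['sentiment'].cat.codes
test_sentiment_cat = test_df['sentiment'].cat.codes

# One-hot encoding: shape (n,) -> (n, 3), matching the Dense(3) output
train_y = to_categorical(train_sentiment_cat, num_classes=len(sentiment_types))
test_y = to_categorical(test_sentiment_cat, num_classes=len(sentiment_types))
```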
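Neither answer mentions an alternative: Keras also provides a `sparse_categorical_crossentropy` loss that takes the integer codes directly. As far as I know, compiling with `loss='sparse_categorical_crossentropy'` and keeping the original integer `train_y`/`test_y` should avoid the one-hot step entirely. The NumPy sketch below uses made-up probabilities and only checks the underlying equivalence: the sparse loss on integer labels equals the categorical loss on the matching one-hot labels.

```python
import numpy as np

# Made-up softmax outputs for 3 samples over (negative, neutral, positive)
probs = np.array([
    [0.7, 0.2, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.3, 0.6],
])
codes = np.array([0, 1, 2])   # integer labels, shape (3,)
one_hot = np.eye(3)[codes]    # one-hot labels, shape (3, 3)

# Categorical cross-entropy: needs one-hot labels with the same shape as probs
categorical = -np.sum(one_hot * np.log(probs), axis=1)

# Sparse categorical cross-entropy: picks the true-class probability by index
sparse = -np.log(probs[np.arange(len(codes)), codes])

print(np.allclose(categorical, sparse))  # True
```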
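Disclaimer: the technical posts on this site are distributed under the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.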