ValueError: `logits` and `labels` must have the same shape for an NLP multi-class sentiment classifier

Question

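I am trying to build an NLP multi-class sentiment classifier that takes sentences as input and sorts them into three classes (negative, neutral and positive). However, when training the model I run into an error: my logits (None, 3) are not the same size as my labels (None, 1), and the model cannot start training.

My model is a multi-class classifier, not a multi-label classifier, because it predicts only one label per object. I made sure my last layer has an output of 3 and activation = 'softmax'. From what I found searching online this should be correct, so I think the problem lies with my labels.

Currently my labels have a dimension of (None, 1), because I map each class to a unique integer and pass that as my test and train y values (which are in the form of a one-dimensional numpy array).

Now I am confused about whether I should change the dimensions of this array to match the output dimensions, and how to go about doing that.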

import os
import sys
import tensorflow as tf
import numpy as np
import pandas as pd
from tensorflow import keras
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from keras.optimizers import SGD

device_name = tf.test.gpu_device_name()
if len(device_name) > 0:
    print("Found GPU at: {}".format(device_name))
else:
    device_name = "/device:CPU:0"
    print("No GPU, using {}.".format(device_name))

# Load dataset into a dataframe
train_data_path = "/content/drive/MyDrive/ML Datasets/tweet_sentiment_analysis/train.csv"
test_data_path = "/content/drive/MyDrive/ML Datasets/tweet_sentiment_analysis/test.csv"

train_df = pd.read_csv(train_data_path, encoding='unicode_escape')
test_df = pd.read_csv(test_data_path, encoding='unicode_escape').dropna()

sentiment_types = ('neutral', 'negative', 'positive')

train_df['sentiment'] = train_df['sentiment'].astype('category')
test_df['sentiment'] = test_df['sentiment'].astype('category')

train_df['sentiment_cat'] = train_df['sentiment'].cat.codes
test_df['sentiment_cat'] = test_df['sentiment'].cat.codes

train_y = np.array(train_df['sentiment_cat'])
test_y = np.array(test_df['sentiment_cat'])

# Function to convert df into a list of strings
def convert_to_list(df, x):
  selected_text_list = []
  labels = []

  for index, row in df.iterrows():
    selected_text_list.append(str(row[x]))
    labels.append(str(row['sentiment']))
  
  return np.array(selected_text_list), np.array(labels)


train_sentences, train_labels = convert_to_list(train_df, 'selected_text')
test_sentences, test_labels = convert_to_list(test_df, 'text')

# Instantiate tokenizer and create word_index
tokenizer = Tokenizer(num_words=1000, oov_token='<oov>')
tokenizer.fit_on_texts(train_sentences)
word_index = tokenizer.word_index

# Convert sentences into a sequence 
train_sequence = tokenizer.texts_to_sequences(train_sentences)
test_sequence = tokenizer.texts_to_sequences(test_sentences)

# Padding sequences 
pad_test_seq = pad_sequences(test_sequence, padding='post')
max_len = pad_test_seq[0].size
pad_train_seq = pad_sequences(train_sequence, padding='post', maxlen=max_len)

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(10000, 64, input_length=max_len),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64, return_sequences=True)),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(3, activation='softmax')
])

with tf.device(device_name):
  model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

num_epochs = 10

with tf.device(device_name):
  history = model.fit(pad_train_seq, train_y, epochs=num_epochs, validation_data=(pad_test_seq, test_y), verbose=2)

This is the error:

ValueError                                Traceback (most recent call last)
<ipython-input-28-62f3c6445887> in <module>
      2 
      3 with tf.device(device_name):
----> 4   history = model.fit(pad_train_seq, train_y, epochs=num_epochs, validation_data=(pad_test_seq, test_y), verbose=2)

1 frames
/usr/local/lib/python3.8/dist-packages/keras/engine/training.py in tf__train_function(iterator)
     13                 try:
     14                     do_return = True
---> 15                     retval_ = ag__.converted_call(ag__.ld(step_function), (ag__.ld(self), ag__.ld(iterator)), None, fscope)
     16                 except:
     17                     do_return = False

ValueError: in user code:

    File "/usr/local/lib/python3.8/dist-packages/keras/engine/training.py", line 1051, in train_function  *
        return step_function(self, iterator)
    File "/usr/local/lib/python3.8/dist-packages/keras/engine/training.py", line 1040, in step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/usr/local/lib/python3.8/dist-packages/keras/engine/training.py", line 1030, in run_step  **
        outputs = model.train_step(data)
    File "/usr/local/lib/python3.8/dist-packages/keras/engine/training.py", line 890, in train_step
        loss = self.compute_loss(x, y, y_pred, sample_weight)
    File "/usr/local/lib/python3.8/dist-packages/keras/engine/training.py", line 948, in compute_loss
        return self.compiled_loss(
    File "/usr/local/lib/python3.8/dist-packages/keras/engine/compile_utils.py", line 201, in __call__
        loss_value = loss_obj(y_t, y_p, sample_weight=sw)
    File "/usr/local/lib/python3.8/dist-packages/keras/losses.py", line 139, in __call__
        losses = call_fn(y_true, y_pred)
    File "/usr/local/lib/python3.8/dist-packages/keras/losses.py", line 243, in call  **
        return ag_fn(y_true, y_pred, **self._fn_kwargs)
    File "/usr/local/lib/python3.8/dist-packages/keras/losses.py", line 1930, in binary_crossentropy
        backend.binary_crossentropy(y_true, y_pred, from_logits=from_logits),
    File "/usr/local/lib/python3.8/dist-packages/keras/backend.py", line 5283, in binary_crossentropy
        return tf.nn.sigmoid_cross_entropy_with_logits(labels=target, logits=output)

    ValueError: `logits` and `labels` must have the same shape, received ((None, 3) vs (None, 1)).

Answer 1

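"My logits (None, 3) are not the same size as my labels (None, 1)" ... "I made sure my last layer has an output of 3 and activation = 'softmax'" ... "My labels have dimensions of (None, 1) because I map each class to a unique integer"

The key concept you are missing is that you need to one-hot encode your labels (after assigning integers to the labels - see below).

So, after the softmax, your model spits out three values: how likely each label is. For example, it might say A is 0.6, B is 0.1 and C is 0.3. If the correct answer is C, it needs to see that correct answer as 0, 0, 1. It can then say its prediction for A is off by 0.6 - 0 = +0.6, for B by 0.1 - 0 = +0.1, and for C by 0.3 - 1 = -0.7.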
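The arithmetic above can be reproduced in a few lines of plain Python. This is only an illustrative sketch: the helper `one_hot` and the variable names are made up for this example, and the cross-entropy line shows the usual categorical cross-entropy formula rather than Keras's internal code.

```python
import math

def one_hot(index, num_classes):
    """Return a list of zeros with a single 1.0 at position `index`."""
    if not 0 <= index < num_classes:
        raise ValueError("index out of range")
    return [1.0 if i == index else 0.0 for i in range(num_classes)]

# Softmax output for classes A, B, C (the numbers used in the answer)
probs = [0.6, 0.1, 0.3]

# The correct class is C, i.e. integer label 2
target = one_hot(2, 3)

# Per-class difference between prediction and target
errors = [round(p - t, 2) for p, t in zip(probs, target)]

# Categorical cross-entropy: only the true class contributes, so this is -log(0.3)
loss = -sum(t * math.log(p) for t, p in zip(target, probs))

print(target)
print(errors)
print(round(loss, 4))
```

Both `probs` and `target` have three entries, which is the shape match that the loss function in the traceback is complaining about.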

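In theory you could go directly from the string labels to a one-hot encoding. But it seems TensorFlow needs the labels to be encoded as integers first, and then one-hot encoded.

https://www.tensorflow.org/api_docs/python/tf/keras/layers/CategoryEncoding#examples says to use: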

tf.keras.layers.CategoryEncoding(num_tokens=3, output_mode="one_hot")
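As a rough sketch of how that layer could be applied to integer labels like the ones in the question, assuming TensorFlow 2.x is installed (the `labels` array below is made-up sample data, not the asker's dataset):

```python
import numpy as np
import tensorflow as tf

# Hypothetical integer labels, one per sentence (0, 1 or 2)
labels = np.array([0, 1, 2, 1])

# One-hot encode: each integer becomes a vector of length num_tokens
encoder = tf.keras.layers.CategoryEncoding(num_tokens=3, output_mode="one_hot")
one_hot = encoder(labels)

print(one_hot.shape)  # one row per label, one column per class
```

The result has one column per class, so its last dimension lines up with the 3-unit softmax output.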

See also https://stackoverflow.com/a/69791457/841830 (the higher-voted answers there date from 2019, so I assume they apply to TensorFlow v1). Searching for "tensorflow one-hot encoding" turns up plenty of tutorials and examples.

Answer 2

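The problem here was indeed that my labels had a different shape from the logits. The logits have shape (3) because they contain one float for each of the three classes I want to predict, representing its probability. The labels originally had shape (1) because each contains only a single integer.

To fix this, I one-hot encoded the labels so that each one has shape (3), which solved the problem. I used the keras.utils.to_categorical() function to do so.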

from tensorflow.keras.utils import to_categorical

sentiment_types = ('negative', 'neutral', 'positive')

train_df['sentiment'] = train_df['sentiment'].astype('category')
test_df['sentiment'] = test_df['sentiment'].astype('category')

# Turning labels from strings to int
train_sentiment_cat = train_df['sentiment'].cat.codes
test_sentiment_cat = test_df['sentiment'].cat.codes

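# One-hot encoding; num_classes is fixed so both arrays get 3 columns
# even if one split happens to be missing a class
train_y = to_categorical(train_sentiment_cat, num_classes=len(sentiment_types))
test_y = to_categorical(test_sentiment_cat, num_classes=len(sentiment_types))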
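To make the shape change concrete, here is a NumPy-only sketch of what the one-hot step does to a label array. It is an equivalent illustration, not the Keras implementation, and `codes` is made-up sample data standing in for the `cat.codes` output.

```python
import numpy as np

# Hypothetical integer codes: 0 = negative, 1 = neutral, 2 = positive
codes = np.array([2, 0, 1, 1, 2])
num_classes = 3

# Row i of the identity matrix is the one-hot vector for class i,
# so indexing with the codes picks one row per label
one_hot_labels = np.eye(num_classes, dtype="float32")[codes]

print(codes.shape)           # one integer per sample
print(one_hot_labels.shape)  # one 3-element vector per sample
```

After this step the labels have a last dimension of 3, matching the (None, 3) output of the final Dense layer.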

2 answers

Answer 1
1 2023-02-01 08:37:32

Answer 2
0 2023-02-02 02:19:19
