Incompatible shapes in the output layer - Tensorflow

Question

I am trying to build a bidirectional LSTM model with TensorFlow on Google Colab. Training fails because the last layer reports incompatible shapes. Is there a way to reshape x_train and y_train to fix this?

Traceback

ValueError: Shapes (16, 11) and (16, 10) are incompatible

If I change the number of units in my output layer from 10 to 11, no error is raised and the model trains. However, I want the output to have 10 units, not 11.

# current output layer (runs without error)
tf.keras.layers.Dense(11, activation='softmax')

# expected output layer (raises the shape incompatibility)
tf.keras.layers.Dense(10, activation='softmax')

BiLSTM model

def build_model(vocab_size, embedding_dim=64, input_length=30):
    print('\nbuilding the model...\n')

    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(input_dim=(vocab_size + 1), output_dim=embedding_dim, input_length=input_length),
        
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(rnn_units,return_sequences=True, dropout=0.2)),
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(rnn_units,return_sequences=True, dropout=0.2)),
        tf.keras.layers.GlobalMaxPool1D(),
        tf.keras.layers.Dropout(0.1),
        tf.keras.layers.Dense(64, activation='tanh'),
        
        # softmax output layer
        tf.keras.layers.Dense(10, activation='softmax')
    ])

    # optimizer & loss
    opt = 'RMSprop' #tf.optimizers.Adam(learning_rate=1e-4)
    loss = 'categorical_crossentropy'

    # Metrics
    metrics = ['accuracy', 'AUC','Precision', 'Recall']

    # compile model
    model.compile(optimizer=opt,
                  loss=loss,
                  metrics=metrics)
    
    model.summary()

    return model

BATCH_SIZE is set to 16. The shapes of x_train and y_train are:

x_train.shape
(800, 30)

y_train.shape
(800,)

Training

def train(model, x_train, y_train, x_validation, y_validation,
          epochs, batch_size=32, patience=5, 
          verbose=2, monitor_es='accuracy', mode_es='auto', restore=True,
          monitor_mc='val_accuracy', mode_mc='max'):
    
    print('\ntraining...\n')

    # callback
    early_stopping = tf.keras.callbacks.EarlyStopping(monitor=monitor_es,
                                                      verbose=1, mode=mode_es, restore_best_weights=restore,
                                                      min_delta=1e-3, patience=patience)
    
    model_checkpoint = tf.keras.callbacks.ModelCheckpoint('tfjsmode.h5', monitor=monitor_mc, mode=mode_mc,      
                                                          verbose=1, save_best_only=True)

    # Define Tensorboard as a Keras callback
    tensorboard = TensorBoard(
        log_dir='./logs',
        histogram_freq=1,
        write_images=True
    )

    keras_callbacks = [tensorboard, early_stopping, model_checkpoint]

    # train model
    history = model.fit(x_train, y_train,
                        batch_size=batch_size, epochs=epochs, verbose=verbose,
                        validation_data=(x_validation, y_validation),
                        callbacks=keras_callbacks)
    return history

Preprocessing

def preprocess(x, padding_shape=30):
    return np.array([ord(i.lower()) - ord('a')+1 if not i.isdigit() and i != ' ' else 0 for i in list(x)] + ([0] * (padding_shape - len(x))), dtype=int)

def prepare_dataset(labeldict : dict, test_size=.3, validation_size=.1): 
    print('preparing the dataset...\n')
    
    from sklearn import preprocessing

    # load dataset
    # split dataset (strings held in pandas.core.series.Series objects)
    x, y = load_clean_dataset()

    x = np.array(list(map(preprocess, x)))
    y = np.array(list(map(lambda x: labeldict[x.replace(' ', '_')], y)))
    print(('y: {}').format(y))

    # create/split train, validation and test and shuffle the data
    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=test_size, shuffle=True)
    print(x.max(), x.min())

    x_train_val, x_validation, y_train_val, y_validation = train_test_split(x_train, y_train, test_size=test_size, shuffle=True)

    # pandas.core.series.Series to numpy array
    x_train, y_train = np.array(x_train), np.array(y_train)
    x_validation, y_validation =  np.array(x_validation), np.array(y_validation)
    x_test, y_test = np.array(x_test), np.array(y_test)
    
    x_train_val, y_train_val = np.array(x_train_val), np.array(y_train_val)

    print(('\nx_train: \n{}\n\ny_train: \n{}').format(x_train_val, y_train_val))
    y_train = tf.keras.utils.to_categorical(y, num_classes=10)
    return (x_train, y_train), (x_validation, y_validation), (x_test, y_test), (x_train_val, y_train_val)

Answer 1

It looks like your labels are currently integers (i.e. not one-hot encoded vectors). For example, your y seems to look like this,

[0, 1, 8, 9, ....] # a vector of 800 elements
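To make the mismatch concrete, here is a small NumPy sketch. The arrays are made-up stand-ins (not the asker's data): integer labels have shape (n,), while a 10-unit softmax layer produces (n, 10), and categorical_crossentropy expects the targets to match the latter.

```python
import numpy as np

# Hypothetical integer labels for one batch of 16 samples, classes 0..9.
rng = np.random.default_rng(0)
y_int = rng.integers(low=0, high=10, size=16)

# Stand-in for the output of Dense(10, activation='softmax'):
# one row of class probabilities per sample.
logits = rng.normal(size=(16, 10))
y_pred = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

print(y_int.shape)   # (16,)
print(y_pred.shape)  # (16, 10)
```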

There are two ways to train a model on data like this.

Alternative 1 (the easiest, I guess)

Use sparse_categorical_crossentropy as the model's loss function

model.compile(optimizer=opt, loss='sparse_categorical_crossentropy', metrics=metrics)
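Why this works can be sketched in plain NumPy. The two functions below are simplified re-implementations written for illustration, not TensorFlow's actual code: the sparse variant indexes the predicted probability of the true class directly, which gives the same value as multiplying by a one-hot vector.

```python
import numpy as np

def categorical_crossentropy(y_onehot, y_pred, eps=1e-7):
    """Mean cross-entropy for one-hot targets of shape (n, num_classes)."""
    y_pred = np.clip(y_pred, eps, 1.0)
    return float(-(y_onehot * np.log(y_pred)).sum(axis=1).mean())

def sparse_categorical_crossentropy(y_int, y_pred, eps=1e-7):
    """Mean cross-entropy for integer targets of shape (n,)."""
    y_pred = np.clip(y_pred, eps, 1.0)
    # Pick the predicted probability of the true class in each row.
    picked = y_pred[np.arange(len(y_int)), y_int]
    return float(-np.log(picked).mean())

# Toy example with 3 samples and 3 classes (values are made up).
y_int = np.array([0, 2, 1])
y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.3, 0.6],
                   [0.2, 0.5, 0.3]])
y_onehot = np.eye(3)[y_int]

loss_sparse = sparse_categorical_crossentropy(y_int, y_pred)
loss_onehot = categorical_crossentropy(y_onehot, y_pred)
print(np.isclose(loss_sparse, loss_onehot))  # True
```

This sketch only covers the loss. The question also requests the 'AUC', 'Precision' and 'Recall' metrics, which, as far as I am aware, expect targets shaped like the predictions, so those may still call for one-hot labels.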

Alternative 2

Convert your labels to one-hot encoding with the following,

y_onehot = tf.keras.utils.to_categorical(y, num_classes=10)

Then keep the model's loss as categorical_crossentropy
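What the conversion produces can be sketched in NumPy. The helper name `one_hot` and the label values are hypothetical; in the Keras workflow `tf.keras.utils.to_categorical` plays this role. The sketch assumes labels are integers in 0..9 and rejects anything outside that range, since such labels cannot be encoded in 10 columns.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return an (n, num_classes) float array with a single 1.0 per row."""
    labels = np.asarray(labels, dtype=int)
    if labels.min() < 0 or labels.max() >= num_classes:
        raise ValueError(
            f"labels must lie in [0, {num_classes - 1}], "
            f"got range [{labels.min()}, {labels.max()}]"
        )
    encoded = np.zeros((labels.size, num_classes), dtype=np.float32)
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

# Hypothetical label vectors standing in for y_train and y_validation.
y_train = np.array([0, 1, 8, 9, 3])
y_validation = np.array([2, 9])

y_train_onehot = one_hot(y_train, num_classes=10)
y_validation_onehot = one_hot(y_validation, num_classes=10)

print(y_train_onehot.shape)       # (5, 10)
print(y_validation_onehot.shape)  # (2, 10)
```

Every label array passed to fit needs the same encoding. In the question's prepare_dataset only one array is converted (and it is built from the full y rather than the training split), so the validation and test labels would presumably need the same treatment.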

Answer 1 | 2 | 2020-10-12 00:37:55
