Incompatible shapes in output layer - Tensorflow

I am trying to build a bi-LSTM model in TensorFlow, running on Google Colab. During training the model fails: the last layer reports a shape incompatibility. Is there a way to reshape x_train and y_train to fix this problem?

Traceback

ValueError: Shapes (16, 11) and (16, 10) are incompatible

If I change the number of units in my output layer from 10 to 11, there is no error and the model can be trained. However, I want the output to have 10 units, not 11.

# current output layer (runs without error)
tf.keras.layers.Dense(11, activation='softmax')

# expected output layer (shape incompatibility)
tf.keras.layers.Dense(10, activation='softmax')

BiLSTM Model

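import tensorflow as tf

# NOTE: rnn_units is not defined in this snippet; it is assumed to be set elsewhere in the notebook.
def build_model(vocab_size, embedding_dim=64, input_length=30):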
    print('\nbuilding the model...\n')

    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(input_dim=(vocab_size + 1), output_dim=embedding_dim, input_length=input_length),
        
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(rnn_units,return_sequences=True, dropout=0.2)),
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(rnn_units,return_sequences=True, dropout=0.2)),
        tf.keras.layers.GlobalMaxPool1D(),
        tf.keras.layers.Dropout(0.1),
        tf.keras.layers.Dense(64, activation='tanh'),
        
        # softmax output layer
        tf.keras.layers.Dense(10, activation='softmax')
    ])

    # optimizer & loss
    opt = 'RMSprop' #tf.optimizers.Adam(learning_rate=1e-4)
    loss = 'categorical_crossentropy'

    # Metrics
    metrics = ['accuracy', 'AUC','Precision', 'Recall']

    # compile model
    model.compile(optimizer=opt,
                  loss=loss,
                  metrics=metrics)
    
    model.summary()

    return model

BATCH_SIZE is set to 16. The shapes of x_train and y_train are:

x_train.shape
(800, 30)

y_train.shape
(800,)

Training

def train(model, x_train, y_train, x_validation, y_validation,
          epochs, batch_size=32, patience=5, 
          verbose=2, monitor_es='accuracy', mode_es='auto', restore=True,
          monitor_mc='val_accuracy', mode_mc='max'):
    
    print('\ntraining...\n')

    # callback
    early_stopping = tf.keras.callbacks.EarlyStopping(monitor=monitor_es,
                                                      verbose=1, mode=mode_es, restore_best_weights=restore,
                                                      min_delta=1e-3, patience=patience)
    
    model_checkpoint = tf.keras.callbacks.ModelCheckpoint('tfjsmode.h5', monitor=monitor_mc, mode=mode_mc,      
                                                          verbose=1, save_best_only=True)

    # Define Tensorboard as a Keras callback
    tensorboard = tf.keras.callbacks.TensorBoard(
        log_dir='./logs',
        histogram_freq=1,
        write_images=True
    )

    keras_callbacks = [tensorboard, early_stopping, model_checkpoint]

    # train model
    history = model.fit(x_train, y_train,
                        batch_size=batch_size, epochs=epochs, verbose=verbose,
                        validation_data=(x_validation, y_validation),
                        callbacks=keras_callbacks)
    return history

Preprocessing

import numpy as np
from sklearn.model_selection import train_test_split

def preprocess(x, padding_shape=30):
    return np.array([ord(i.lower()) - ord('a')+1 if not i.isdigit() and i != ' ' else 0 for i in list(x)] + ([0] * (padding_shape - len(x))), dtype=int)

def prepare_dataset(labeldict : dict, test_size=.3, validation_size=.1): 
    print('preparing the dataset...\n')
    
    from sklearn import preprocessing

    # load dataset
    # split dataset (as strings, in pandas.core.series.Series objects)
    x, y = load_clean_dataset()

    x = np.array(list(map(preprocess, x)))
    y = np.array(list(map(lambda x: labeldict[x.replace(' ', '_')], y)))
    print(('y: {}').format(y))

    # create/split train, validation and test and shuffle the data
    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=test_size, shuffle=True)
    print(x.max(), x.min())

    x_train_val, x_validation, y_train_val, y_validation = train_test_split(x_train, y_train, test_size=validation_size, shuffle=True)

    # pandas.core.series.Series to numpy array
    x_train, y_train = np.array(x_train), np.array(y_train)
    x_validation, y_validation =  np.array(x_validation), np.array(y_validation)
    x_test, y_test = np.array(x_test), np.array(y_test)
    
    x_train_val, y_train_val = np.array(x_train_val), np.array(y_train_val)

    print(('\nx_train: \n{}\n\ny_train: \n{}').format(x_train_val, y_train_val))
    y_train = tf.keras.utils.to_categorical(y, num_classes=10)
    return (x_train, y_train), (x_validation, y_validation), (x_test, y_test), (x_train_val, y_train_val)

It seems your labels are currently integers (i.e. not one-hot encoded vectors). For example, your y seems to look like:

[0, 1, 8, 9, ....] # a vector of 800 elements
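Before picking a loss, it may be worth checking which values the labels actually take. The reported shapes (16, 11) and (16, 10) differ by one, which would be consistent with labels whose largest value is 10, since an encoding indexed by the raw label then needs max + 1 = 11 columns. The sketch below is only a guess at that situation: the 1-based labels are made up (the question does not show the real values), and the remapping with np.unique is one possible fix, not something taken from the original code.

```python
import numpy as np

# Hypothetical labels: 800 integers in 1..10 (an assumption, not the asker's data).
rng = np.random.default_rng(0)
y = rng.integers(low=1, high=11, size=800)
y[:10] = np.arange(1, 11)  # make sure every class appears at least once

classes = np.unique(y)
n_distinct = classes.size            # how many different label values exist
n_columns_needed = int(y.max()) + 1  # columns needed when indexing by the raw label

print(classes.min(), classes.max(), n_distinct, n_columns_needed)  # 1 10 10 11

# Remap the labels to contiguous ids 0..9 so that 10 output units are enough.
_, y_zero_based = np.unique(y, return_inverse=True)
```

If the real labels turn out to be 0-based already, the remapping is a no-op and the mismatch has to come from somewhere else.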

There are two ways to train a model on such data.

Alternative 1 (easiest I guess)

Use sparse_categorical_crossentropy as the loss function of the model:

model.compile(optimizer=opt, loss='sparse_categorical_crossentropy', metrics=metrics)
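To see why the sparse loss accepts integer labels directly, here is a small NumPy sketch. It is a plain re-implementation for illustration, not the Keras code (Keras also clips probabilities, which is left out here), and the batch of 16 samples with 10 classes is random, made-up data. The sparse version simply picks the predicted probability of the true class, which gives the same per-sample loss as the one-hot version.

```python
import numpy as np

def softmax(logits):
    # numerically stable softmax over the class axis
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def sparse_categorical_crossentropy(y_int, probs):
    # integer labels: pick the probability assigned to the true class
    picked = probs[np.arange(len(y_int)), y_int]
    return -np.log(picked)

def categorical_crossentropy(y_onehot, probs):
    # one-hot labels: sum over classes, only the true class contributes
    return -(y_onehot * np.log(probs)).sum(axis=1)

rng = np.random.default_rng(0)
batch_size, num_classes = 16, 10
probs = softmax(rng.normal(size=(batch_size, num_classes)))  # stand-in for model output
y_int = rng.integers(0, num_classes, size=batch_size)        # labels of shape (16,)
y_onehot = np.eye(num_classes)[y_int]                        # labels of shape (16, 10)

loss_sparse = sparse_categorical_crossentropy(y_int, probs)
loss_dense = categorical_crossentropy(y_onehot, probs)
print(np.allclose(loss_sparse, loss_dense))
```

Note that the integer labels still have to lie in 0..9 for a 10-unit output layer, whichever loss is used.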

Alternative 2

Convert your labels to one-hot encoding using

y_onehot = tf.keras.utils.to_categorical(y, num_classes=10)

and then keep the loss of the model as categorical_crossentropy.
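Whichever encoding is used, it has to be applied to the same arrays that are passed to fit, i.e. to the training labels and the validation labels separately, not to the full y. A rough sketch of that, where the one_hot helper is a hypothetical NumPy stand-in for tf.keras.utils.to_categorical and the split sizes are made up:

```python
import numpy as np

NUM_CLASSES = 10

def one_hot(labels, num_classes=NUM_CLASSES):
    # Hypothetical helper standing in for tf.keras.utils.to_categorical.
    labels = np.asarray(labels, dtype=int)
    if labels.min() < 0 or labels.max() >= num_classes:
        raise ValueError("labels must lie in [0, num_classes)")
    return np.eye(num_classes, dtype="float32")[labels]

# Made-up integer labels for two splits (sizes are assumptions).
rng = np.random.default_rng(0)
y_train = rng.integers(0, NUM_CLASSES, size=800)
y_validation = rng.integers(0, NUM_CLASSES, size=100)

# Encode each split on its own so the row counts match x_train / x_validation.
y_train_onehot = one_hot(y_train)
y_validation_onehot = one_hot(y_validation)

print(y_train_onehot.shape, y_validation_onehot.shape)  # (800, 10) (100, 10)
```

The second dimension now matches Dense(10), and the range check in the helper fails early if a label such as 10 slips in.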
