
TensorFlow 2.x: Cannot load trained model in h5 format when using embedding columns (ValueError: Shapes (101, 15) and (57218, 15) are incompatible)

After a long back and forth, I managed to save my model (see my question TensorFlow 2.x: Cannot save trained model in h5 format (OSError: Unable to create link (name already exists))). But now I have problems loading the saved model. First, I got the following error when loading the model:

ValueError: You are trying to load a weight file containing 1 layers into a model with 0 layers.

After changing from the sequential to the functional API, I get the following error:

ValueError: Cannot assign to variable dense_features/NAME1W1_embedding/embedding_weights:0 due to variable shape (101, 15) and value shape (57218, 15) are incompatible

I tried different versions of TensorFlow. I got the error described above with tf-nightly. With version 2.1 I got a quite similar error:

ValueError: Shapes (101, 15) and (57218, 15) are incompatible.

In versions 2.2 and 2.3 I can't even save my model (as described in my previous question).

Here is the relevant code using the functional API:

import datetime

import tensorflow as tf

import preprocessing  # my own preprocessing module (see the question linked above)


def __loadModel(args):
    filepath = args.loadModel

    model = tf.keras.models.load_model(filepath)

    print("start preprocessing...")
    (_, _, test_ds) = preprocessing.getPreProcessedDatasets(args.data, args.batchSize)
    print("preprocessing completed")

    _, accuracy = model.evaluate(test_ds)
    print("Accuracy", accuracy)



def __trainModel(args):
    (train_ds, val_ds, test_ds) = preprocessing.getPreProcessedDatasets(args.data, args.batchSize)

    for bucketSizeGEO in args.bucketSizeGEO:
        print("start preprocessing...")
        feature_columns = preprocessing.getFutureColumns(args.data, args.zip, bucketSizeGEO, True)
        #Todo: compare trainable=False to trainable=True
        feature_layer = tf.keras.layers.DenseFeatures(feature_columns, trainable=False)
        print("preprocessing completed")


        feature_layer_inputs = preprocessing.getFeatureLayerInputs()
        feature_layer_outputs = feature_layer(feature_layer_inputs)
        output_layer = tf.keras.layers.Dense(1, activation=tf.nn.sigmoid)(feature_layer_outputs)

        model = tf.keras.Model(inputs=[v for v in feature_layer_inputs.values()], outputs=output_layer)

        model.compile(optimizer='sgd',
            loss='binary_crossentropy',
            metrics=['accuracy'])

        paramString = "Arg-e{}-b{}-z{}".format(args.epoch, args.batchSize, bucketSizeGEO)


        log_dir = "logs\\logR\\" + paramString + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
        tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1)


        model.fit(train_ds,
                validation_data=val_ds,
                epochs=args.epoch,
                callbacks=[tensorboard_callback])


        model.summary()

        loss, accuracy = model.evaluate(test_ds)
        print("Accuracy", accuracy)

        paramString = paramString + "-a{:.4f}".format(accuracy)

        outputName = "logReg" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S") + paramString

        

        if args.saveModel:
            # inspect the model's weight names before saving
            for i, w in enumerate(model.weights): print(i, w.name)

            path = './saved_models/' + outputName + '.h5'
            model.save(path, save_format='h5')

For the relevant preprocessing part, see the question mentioned at the beginning of this post. The line for i, w in enumerate(model.weights): print(i, w.name) returns the following:

0 dense_features/NAME1W1_embedding/embedding_weights:0
1 dense_features/NAME1W2_embedding/embedding_weights:0
2 dense_features/STREETW_embedding/embedding_weights:0
3 dense_features/ZIP_embedding/embedding_weights:0
4 dense/kernel:0
5 dense/bias:0

This problem is caused by an inconsistency between the dimensions of the embedding matrix at training time and at prediction time.

Usually, before we use the embedding matrix, we build a dictionary; let us call this dictionary word_index for now. If the author of the code is not careful, training and prediction end up with two different word_index dictionaries (because the data used in training and prediction differ), and the dimension of the embedding matrix changes with it.

You can see from your error that len(word_index) + 1 at training time was 57218, while len(word_index) + 1 at prediction time was 101.
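A minimal sketch of how that mismatch can arise (the word lists and the helper below are made up for illustration): if the dictionary is rebuilt from whatever data happens to be loaded, the resulting embedding matrices have different heights.

def build_word_index(words):
    # map each distinct word to an integer id; id 0 stays reserved,
    # which is why the embedding matrix needs len(word_index) + 1 rows
    return {w: i + 1 for i, w in enumerate(sorted(set(words)))}

train_index = build_word_index(["main st", "oak ave", "elm rd"])  # 3 distinct words
test_index = build_word_index(["main st", "oak ave"])             # 2 distinct words

# the first embedding dimension differs: 4 vs. 3 -- the same kind of
# mismatch as (57218, 15) vs. (101, 15) in the error above
print(len(train_index) + 1, len(test_index) + 1)  # prints: 4 3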

If we want the code to run correctly, we must not regenerate word_index at prediction time. So the simplest solution to this problem is to save the word_index you get during training and load it at prediction time, so that the weights obtained during training can be loaded correctly.
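A minimal sketch of that idea, assuming word_index is a plain dict mapping strings to integer ids (the file name is made up):

import json

word_index = {"main st": 1, "oak ave": 2}  # built once, during training

# at training time: persist the vocabulary next to the model
with open("word_index.json", "w") as f:
    json.dump(word_index, f)

# at prediction time: load the saved vocabulary instead of rebuilding it,
# so the embedding matrix keeps its training-time shape
with open("word_index.json") as f:
    word_index = json.load(f)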

I was able to solve my rather stupid mistake:

I was using the feature_column library to preprocess my data. Unfortunately, I specified a fixed size instead of the actual size of the vocabulary list in the num_buckets parameter of the function categorical_column_with_identity. Wrong version:

street_voc = tf.feature_column.categorical_column_with_identity(
        key='STREETW', num_buckets=100)

Correct version:

street_voc = tf.feature_column.categorical_column_with_identity(
        key='STREETW', num_buckets=__getNumberOfWords(data, 'STREETPRO') + 1)

The function __getNumberOfWords(data, 'STREETPRO') returns the number of distinct words in the 'STREETPRO' column of the pandas DataFrame.
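The implementation of that helper is not shown here; a plausible sketch using pandas, assuming each row of 'STREETPRO' holds exactly one word, could look like this:

import pandas as pd

def __getNumberOfWords(data: pd.DataFrame, column: str) -> int:
    # count the distinct values in the column; the caller adds 1 so that
    # num_buckets also covers the reserved id 0
    return data[column].nunique()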
