简体   繁体   中英

TensorFlow 2.x: Cannot save trained model in h5 format (OSError: Unable to create link (name already exists))

My model uses pre-processed data to predict if a customer is a private or non-private customer. The pre-processing-step is using steps like feature_column.bucketized_column(…), feature_column.embedding_column(…) and so on. After the training, I am trying to save the model but I get the following error:

File "h5py_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py\\h5o.pyx", line 202, in h5py.h5o.link
OSError: Unable to create link (name already exists)

I have tried the following to solve my problem:

Everything without success!

Here is the relevant code of the Model:

(feature_columns, train_ds, val_ds, test_ds) = preprocessing.getPreProcessedDatasets(args.data, args.zip, args.batchSize)

feature_layer = tf.keras.layers.DenseFeatures(feature_columns, trainable=False)

model = tf.keras.models.Sequential([
        feature_layer,
        tf.keras.layers.Dense(1, activation=tf.nn.sigmoid)
    ])

model.compile(optimizer='sgd',
        loss='binary_crossentropy',
        metrics=['accuracy'])

paramString = "Arg-e{}-b{}-z{}".format(args.epoch, args.batchSize, bucketSizeGEO)

...

model.fit(train_ds,
              validation_data=val_ds,
              epochs=args.epoch,
              callbacks=[tensorboard_callback])


model.summary()

loss, accuracy = model.evaluate(test_ds)
print("Accuracy", accuracy)

paramString = paramString + "-a{:.4f}".format(accuracy)

outputName = "logReg" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S") + paramStrin

if args.saveModel:
       filepath = "./saved_models/" + outputName + ".h5"
       model.save(filepath, save_format='h5')

Called function in preprocessing Modul:

def getPreProcessedDatasets(filepath, zippath, batch_size, bucketSizeGEO):
    print("start preprocessing...")

    path = filepath
    data = pd.read_csv(path, dtype={
    "NAME1": np.str_, 
    "NAME2": np.str_, 
    "EMAIL1": np.str_, 
    "ZIP": np.str_, 
    "STREET": np.str_, 
    "LONGITUDE":np.floating, 
    "LATITUDE": np.floating, 
    "RECEIVERTYPE": np.int64}) 

    feature_columns = []

    data = data.fillna("NaN")

    data = __preProcessName(data)
    data = __preProcessStreet(data)
    
    train, test = train_test_split(data, test_size=0.2, random_state=0)
    train, val = train_test_split(train, test_size=0.2, random_state=0)

    train_ds = __df_to_dataset(train, batch_size=batch_size)
    val_ds = __df_to_dataset(val, shuffle=False, batch_size=batch_size)
    test_ds = __df_to_dataset(test, shuffle=False, batch_size=batch_size)


    __buildFeatureColums(feature_columns, data, zippath, bucketSizeGEO, True)

    print("preprocessing completed")

    return (feature_columns, train_ds, val_ds, test_ds)

Calling the different preprocessing functions of the features:

def __buildFeatureColums(feature_columns, data, zippath, bucketSizeGEO, addCrossedFeatures):
    
    feature_columns.append(__getFutureColumnLon(bucketSizeGEO))
    feature_columns.append(__getFutureColumnLat(bucketSizeGEO))
    
    (namew1_one_hot, namew2_one_hot) = __getFutureColumnsName(__getNumberOfWords(data, 'NAME1PRO'))
    feature_columns.append(namew1_one_hot)
    feature_columns.append(namew2_one_hot)
    
    feature_columns.append(__getFutureColumnStreet(__getNumberOfWords(data, 'STREETPRO')))
    
    feature_columns.append(__getFutureColumnZIP(2223, zippath))
    
    if addCrossedFeatures:
        feature_columns.append(__getFutureColumnCrossedNames(100))
        feature_columns.append(__getFutureColumnCrossedZIPStreet(100, 2223, zippath))

Function reletated to embeddings:

def __getFutureColumnsName(name_num_words):
    vocabulary_list = np.arange(0, name_num_words + 1, 1).tolist()

    namew1_voc = tf.feature_column.categorical_column_with_vocabulary_list(
        key='NAME1W1', vocabulary_list=vocabulary_list, dtype=tf.dtypes.int64)
    namew2_voc = tf.feature_column.categorical_column_with_vocabulary_list(
        key='NAME1W2', vocabulary_list=vocabulary_list, dtype=tf.dtypes.int64)

    dim = __getNumberOfDimensions(name_num_words)

    namew1_embedding = feature_column.embedding_column(namew1_voc, dimension=dim)
    namew2_embedding = feature_column.embedding_column(namew2_voc, dimension=dim)

    return (namew1_embedding, namew2_embedding)
def __getFutureColumnStreet(street_num_words):
    vocabulary_list = np.arange(0, street_num_words + 1, 1).tolist()

    street_voc = tf.feature_column.categorical_column_with_vocabulary_list(
        key='STREETW', vocabulary_list=vocabulary_list, dtype=tf.dtypes.int64)

    dim = __getNumberOfDimensions(street_num_words)

    street_embedding = feature_column.embedding_column(street_voc, dimension=dim)

    return street_embedding
def __getFutureColumnZIP(zip_num_words, zippath):
    zip_voc = feature_column.categorical_column_with_vocabulary_file(
    key='ZIP', vocabulary_file=zippath, vocabulary_size=zip_num_words,
    default_value=0)

    dim = __getNumberOfDimensions(zip_num_words)

    zip_embedding = feature_column.embedding_column(zip_voc, dimension=dim)

    return zip_embedding

The error OSError: Unable to create link (name already exists) when saving model in h5 format is caused by some duplicate variable names. Checking by for i, w in enumerate(model.weights): print(i, w.name) showed that they are the embedding_weights names.

Normally, when building feature_column , the distinct key passed into each feature column will be used to build distinct variable name . This worked correctly in TF 2.1 but broke in TF 2.2 and 2.3, and supposedly fixed in TF 2.4 nigthly .

My workaround for TF 2.3 is based on @SajanGohil's comment, but my issue was with weight names (not layer names):

for i in range(len(model.weights)):
    model.weights[i]._handle_name = model.weights[i].name + "_" + str(i)

The same caveats apply: this approach manipulates TF internals and thus is not future-proof.

I found out that this situation also occurs when I load a model from a modelcheckpoint, model.compile it with same optimizer, metrics and loss function, and train it. But if I avoid compiling it again with same parameters, this error message would not show up again.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM