
Got NaN in Keras Tuner but it works when I train it

I trained my network several times and already got some results. Then I found out about the Keras Tuner and wanted to find the best hyperparameters with it, but the loss in the tuner always becomes NaN (it doesn't become NaN when I train the model regularly). I'm using MobileNetV3Small as the backbone and want to find the optimal number of layers and units. Here is my model-building function:

import keras
from keras import layers

def build_model(hp):
    model = keras.Sequential()
    # `base` is the MobileNetV3Small backbone defined earlier.
    model.add(base)
    # Optionally pool the backbone's feature map before flattening.
    if hp.Boolean('globalMax'):
        model.add(layers.GlobalMaxPool2D())
    model.add(layers.Flatten())
    # Tune the number of Dense layers.
    for i in range(hp.Int("num_layers", 1, 3)):
        model.add(
            layers.Dense(
                # Tune the number of units of each layer separately.
                units=hp.Int(f"units_{i}", min_value=3, max_value=12, step=1),
            )
        )
    if hp.Boolean("dropout"):
        model.add(layers.Dropout(rate=0.1))
    model.add(layers.Dense(3))
    model.compile(loss='mae', optimizer='sgd', metrics=['mae'])
    return model
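
For context, `base` is not shown in the snippet above. Here is a minimal sketch of how a MobileNetV3Small backbone is typically created; the input shape and the freezing of weights are assumptions, not from the original post:

from keras.applications import MobileNetV3Small

# Assumed setup for the `base` backbone; the original post does not show it.
base = MobileNetV3Small(
    input_shape=(224, 224, 3),  # assumed input size
    include_top=False,          # keep only the convolutional backbone
    weights='imagenet',
)
base.trainable = False  # freezing the backbone is a common choice here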

and I'm using:

import keras_tuner as kt

tuner = kt.RandomSearch(
    hypermodel=build_model,
    objective="val_loss",
    executions_per_trial=2,
    overwrite=True,
)
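
For completeness, a minimal sketch of how the search could then be launched; the dataset variables (`x_train`, `y_train`, `x_val`, `y_val`) are placeholders, not from the original post:

# tuner.search forwards its arguments to model.fit.
tuner.search(
    x_train, y_train,
    epochs=10,
    batch_size=32,
    validation_data=(x_val, y_val),
)
best_model = tuner.get_best_models(num_models=1)[0]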

and this is the output:

Best val_loss So Far: nan
Total elapsed time: 00h 02m 28s
INFO:tensorflow:Oracle triggered exit

What is the problem? I already tried every other optimizer (the model trains perfectly with .fit), and tried removing the dropout and even the normalization.

So I finally found the problem. It happened because keras_tuner runs its validation with a small batch, and in my situation the loss value was nearly infinite, so it came out as NaN. After trying a bigger batch size and changing the loss function, it stopped producing NaN all the time and found some results.
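
To make the fix concrete, here is a minimal sketch of the two changes described above; the specific batch size and the Huber loss are illustrative assumptions, since the answer does not say exactly which values were used:

# Inside build_model, switch to a bounded, more NaN-resistant loss
# (Huber is one option; the exact loss the author switched to is not stated).
model.compile(loss=keras.losses.Huber(), optimizer='sgd', metrics=['mae'])

# Run the search with a larger batch size (the value here is illustrative).
tuner.search(
    x_train, y_train,
    batch_size=256,
    epochs=10,
    validation_data=(x_val, y_val),
)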
