Tensorflow GPU: Resource exhausted error when calling model.evaluate(), although model.fit() works well

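I am running a MobileNet model on X-ray images with TensorFlow on a GPU. I can fit the model without any errors (using a batch size of 1). However, when I call model.evaluate(), it fails with a "Resource exhausted" error.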

Here is the model, which takes inputs of shape (224, 224, 3):

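import tensorflow as tf
from tensorflow.keras.applications.mobilenet import MobileNet
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint, ReduceLROnPlateau
from tensorflow.keras.layers import Concatenate, UpSampling2D, Conv2D, Reshape
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

# IMAGE_HEIGHT, IMAGE_WIDTH, ALPHA, EPOCHS, BATCH_SIZE, loss and dice_coefficient are defined elsewhere (not shown).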

def create_model(trainable=True):
    
    model = MobileNet(input_shape=(IMAGE_HEIGHT, IMAGE_WIDTH, 3), include_top=False, alpha=ALPHA, weights='imagenet')
    
    for layer in model.layers:
        layer.trainable = trainable
        
    block1 = model.get_layer("conv_pw_1_relu").output
    block2 = model.get_layer("conv_pw_3_relu").output
    block3 = model.get_layer("conv_pw_5_relu").output
    block4 = model.get_layer("conv_pw_11_relu").output
    block5 = model.get_layer("conv_pw_13_relu").output
    
    x = Concatenate()([UpSampling2D()(block5), block4])
    x = Concatenate()([UpSampling2D()(x), block3])
    x = Concatenate()([UpSampling2D()(x), block2])
    x = Concatenate()([UpSampling2D()(x), block1])
    x = UpSampling2D()(x)
    
    x = Conv2D(1, kernel_size=1, activation='sigmoid')(x)
    x = Reshape((IMAGE_HEIGHT, IMAGE_WIDTH))(x)
    
    return Model(inputs=model.input, outputs=x)

mirrored_strategy = tf.distribute.MirroredStrategy()
with mirrored_strategy.scope():
  model = create_model()
  model.summary()

  optimizer = Adam(learning_rate=0.001)
  model.compile(loss=loss, optimizer=optimizer, metrics=[dice_coefficient])

  checkpoint = ModelCheckpoint("model-{loss:.2f}.h5", monitor="loss", verbose=1, save_best_only=True,
                               save_weights_only=True, mode="min")
  stop = EarlyStopping(monitor="loss", patience=5, mode="min")
  reduce_lr = ReduceLROnPlateau(monitor="loss", factor=0.2, patience=5, min_lr=1e-6, verbose=1, mode="min")

history=model.fit(X_train, y_train, validation_data=(X_val, y_val),
          epochs=EPOCHS,
          batch_size = BATCH_SIZE,
          callbacks = [checkpoint, stop, reduce_lr],
          verbose=1)
model.evaluate(X_val, y_val, verbose=1)

This is the error I get when I run model.evaluate():

ResourceExhaustedError                    Traceback (most recent call last)
<ipython-input-26-3301985d3ba5> in <module>()
----> 1 model.evaluate(X_val, y_val, verbose=1)

8 frames
/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     58     ctx.ensure_initialized()
     59     tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
---> 60                                         inputs, attrs, num_outputs)
     61   except core._NotOkStatusException as e:
     62     if name is not None:

ResourceExhaustedError: 2 root error(s) found.
  (0) Resource exhausted:  OOM when allocating tensor with shape[32,224,224,1984] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[node model/up_sampling2d_4/resize/ResizeNearestNeighbor (defined at /lib/python3.6/threading.py:916) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

     [[div_no_nan/ReadVariableOp_1/_22]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

  (1) Resource exhausted:  OOM when allocating tensor with shape[32,224,224,1984] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[node model/up_sampling2d_4/resize/ResizeNearestNeighbor (defined at /lib/python3.6/threading.py:916) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

0 successful operations.
0 derived errors ignored. [Op:__inference_test_function_34461]

Function call stack:
test_function -> test_function

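model.evaluate() also takes batch_size as an argument, so you should pass it there as well:

model.evaluate(X_val, y_val, batch_size=BATCH_SIZE, verbose=1)

Otherwise, evaluate() falls back to its default batch size of 32, which is the leading dimension of the tensor shape [32, 224, 224, 1984] in the error message.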
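
As a rough check on why the batch size matters here, the sketch below computes the size of the one tensor named in the OOM message for a few batch sizes. The shape [32, 224, 224, 1984] and the dtype "float" (32-bit, 4 bytes per element) come from the error text; the helper name tensor_bytes and the batch sizes tried are only illustrative assumptions. It counts this single tensor only, so real GPU usage during evaluation will be higher.

```python
# Back-of-the-envelope size of the tensor that failed to allocate.
# Shape [batch, 224, 224, 1984] and 4-byte floats are taken from the
# OOM message; tensor_bytes is a hypothetical helper, not a TensorFlow API.

MIB = 1024 ** 2
GIB = 1024 ** 3


def tensor_bytes(batch, height=224, width=224, channels=1984, bytes_per_elem=4):
    """Return the size in bytes of a dense batch x height x width x channels tensor."""
    return batch * height * width * channels * bytes_per_elem


if __name__ == "__main__":
    # Compare the batch size used for fit() in the question (1)
    # with the batch size seen in the error message (32).
    for batch in (1, 8, 32):
        size = tensor_bytes(batch)
        print(f"batch_size={batch:>2}: {size / MIB:10.2f} MiB ({size / GIB:.2f} GiB)")
```

With one sample the tensor takes roughly 380 MiB, while with 32 samples it needs close to 12 GiB for this single allocation, which is consistent with fit() working at batch size 1 and evaluate() running out of memory.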

