
tf.keras model.predict results in memory leak

Working on Google Colab, using tf.keras and tensorflow version 2.3.0. I'm going crazy because I can't use the model I've trained to run predictions with model.predict: it runs out of CPU RAM. I've been able to reproduce the issue with a very minimal example.

import numpy as np
import tensorflow as tf
from tensorflow.keras import Model
from tensorflow.keras import backend as K
from tensorflow.keras.layers import Input, Conv2D, Activation

matrixSide = 512 #define a big enough matrix to give memory issues

inputL = Input([matrixSide,matrixSide,12]) #create a toy model
l1 = Conv2D(32,3,activation='relu',padding='same')(inputL) #120
l1 = Conv2D(64,1,activation='relu',padding='same')(l1)
l1 = Conv2D(64,3,activation='relu',padding='same')(l1)
l1 = Conv2D(1,1,padding='same')(l1)
l1 = Activation('linear')(l1)
model = Model(inputs=inputL, outputs=l1)


#run predictions
inImm = np.zeros((64,matrixSide,matrixSide,12))
for i in range(60):
  print(i)
  outImm = model.predict(inImm)
# K.clear_session() #somebody suggested it...

Basically, when working on GPU, it uses 3.0 GB of CPU RAM in the first 4 iterations, then it goes up to 7 GB, then to 10 GB, and then it crashes because it has exhausted all the available RAM! When running on CPU it lasts for more iterations; sometimes it even decreases the amount of RAM it's using from 9 GB back to 3 GB, but in the end it still crashes after 20 or so iterations.

This previous example ( Keras predict loop memory leak using tf.data.Dataset but not with a numpy array ) had similar issues when using tf.data but not with numpy. Somebody suggested in a GitHub issue for tensorflow 1.14 doing a K.clear_session() in each loop... but it doesn't help!

Any idea on how to fix this?

This is my understanding after posting this as a bug to Tensorflow.

Changing the code to:

in_imm = np.zeros((64,matrix_side,matrix_side,12))
for i in range(60):
  print(i)
  tensor = tf.convert_to_tensor(in_imm, dtype=tf.float32)
  out_imm = model.predict(tensor)

Using tf.keras.Model.predict in a for loop with a numpy input creates a new graph every iteration, because the numpy array is created with a different signature. Converting the numpy array to a tensor maintains the same signature and avoids creating new graphs.
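A small variation on the same idea is to do the conversion once, outside the loop, so every predict call sees an identical tensor (a minimal sketch, reusing the toy model from the question and assuming the whole input fits in memory as one tensor):

in_tensor = tf.convert_to_tensor(np.zeros((64,matrixSide,matrixSide,12)), dtype=tf.float32)
for i in range(60):
  print(i)
  out_imm = model.predict(in_tensor)  # identical signature on every iteration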

I've found a fix for the memory leak. While K.clear_session() doesn't do anything in my case, adding a garbage collection after each call with _ = gc.collect() actually does the trick! The memory used is now constant and I can run as many predictions as I want.
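As a minimal sketch of that fix (reusing the toy model and the inImm array from the question; the gc import is the only addition):

import gc

for i in range(60):
  print(i)
  outImm = model.predict(inImm)
  _ = gc.collect()  # force a garbage-collection pass after each predict call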

I solved the problem by using K.clear_session(). First of all, you need to define a session before you can clear it. The purpose of this is explained both here and here.

# note: ConfigProto/Session are TF1-style APIs (tf.compat.v1 equivalents under TF2)
config = tf.ConfigProto(log_device_placement=True)
config.gpu_options.allow_growth = True
session = tf.Session(config=config)
K.set_session(session)

At first, using K.clear_session() in the loop results in an error after the first prediction. In my opinion, tf loses the connection to the model. For this reason, I create a new model within every run of the loop. This negatively affects the code's speed for the first several runs, but an accumulation of RAM is prevented.

The following code contains the suggested improvements:

import numpy as np
import tensorflow as tf
from tensorflow.keras import Model
from tensorflow.keras import backend as K
from tensorflow.keras.layers import Input, Conv2D, Activation

matrixSide = 512 #define a big enough matrix to give memory issues

# TF1-style session config (under TF2, use the tf.compat.v1 equivalents)
config = tf.ConfigProto(log_device_placement=True)
config.gpu_options.allow_growth = True
session = tf.Session(config=config)
K.set_session(session)

def create_model(matrixSide_v):
    inputL = Input([matrixSide_v,matrixSide_v,12]) #create a toy model
    l1 = Conv2D(32,3,activation='relu',padding='same')(inputL) #120
    l1 = Conv2D(64,1,activation='relu',padding='same')(l1)
    l1 = Conv2D(64,3,activation='relu',padding='same')(l1)
    l1 = Conv2D(1,1,padding='same')(l1)
    l1 = Activation('linear')(l1)
    c_model = Model(inputs=inputL, outputs=l1)
    return c_model

#run predictions
inImm = np.zeros((64,matrixSide,matrixSide,12))
for i in range(64):
    print(i)
    model = create_model(matrixSide)
    outImm = model.predict(inImm)
    K.clear_session()

I tried many approaches found online, and none of them worked, but I finally solved the problem by using tensorflow 1.13 instead of 2.x. The old version really helped.

I'm using a simple solution based on the keras docs:

For small amount of inputs that fit in one batch, directly using __call__() is recommended for faster execution, e.g., model(x), or model(x, training=False)

for filename in image_filenames:
  # read the data (renamed from 'input' to avoid shadowing the builtin)
  image = load_image(filename)

  # prediction
  output = model(image) # executes __call__() or call()

Using __call__() (i.e. model(input)) avoids the memory leak inside the predict method, which creates a new data generator holding one data item on each execution and doesn't release the memory.
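For inputs larger than one batch (for example the question's inImm array), the same __call__ pattern can be applied batch by batch; this is a sketch, where batch_size and the .numpy() conversion are my additions rather than part of the keras docs:

batch_size = 8
outputs = []
for start in range(0, len(inImm), batch_size):
  chunk = tf.convert_to_tensor(inImm[start:start + batch_size], dtype=tf.float32)
  outputs.append(model(chunk, training=False).numpy())  # __call__ per batch, back to numpy
result = np.concatenate(outputs)  # shape (64, matrixSide, matrixSide, 1)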
