tf.keras model.predict causes a memory leak

Question

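I am working on Google Colab, using tf.keras with TensorFlow 2.3.0. I am going crazy because I cannot use the model I trained to run predictions with model.predict: it exhausts the CPU RAM. I have been able to reproduce the problem with a very simple example.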

import numpy as np
import tensorflow as tf
from tensorflow.keras import backend as K
from tensorflow.keras.layers import Input, Conv2D, Activation
from tensorflow.keras.models import Model  # needed for Model(...) below

matrixSide = 512 #define a big enough matrix to give memory issues

inputL = Input([matrixSide,matrixSide,12]) #create a toy model
l1 = Conv2D(32,3,activation='relu',padding='same') (inputL) #120
l1 = Conv2D(64,1,activation='relu',padding='same')(l1)
l1 = Conv2D(64,3,activation='relu',padding='same')(l1)
l1 = Conv2D(1,1,padding='same')(l1)
l1 = Activation('linear')(l1)
model = Model(inputs= inputL,outputs = l1)


#run predictions
inImm = np.zeros((64,matrixSide,matrixSide,12))
for i in range (60):
  print(i)
  outImm = model.predict(inImm)
# K.clear_session() #somebody suggested it...

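Basically, when working on a GPU, it uses 3.0 GB of CPU RAM for the first 4 iterations, then goes up to 7, then to 10, and then it crashes because it has used up all the available RAM! When running on the CPU it lasts for more iterations, and sometimes even reduces the RAM it uses from 9 GB back to 3 GB, but in the end it still crashes after about 20 iterations.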
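For scale, the sizes of the arrays in the toy example can be worked out by hand. The sketch below is plain arithmetic; `array_bytes` is a hypothetical helper, and it assumes the input keeps the float64 default of np.zeros and that the model output is float32 (the Keras default float type).

```python
def array_bytes(shape, bytes_per_element):
    """Number of bytes needed for a dense array of the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element


matrix_side = 512
batch = 64

# np.zeros defaults to float64 (8 bytes per element).
input_f64 = array_bytes((batch, matrix_side, matrix_side, 12), 8)
# The same input stored as float32 (4 bytes per element).
input_f32 = array_bytes((batch, matrix_side, matrix_side, 12), 4)
# Single-channel output, assumed to be float32.
output_f32 = array_bytes((batch, matrix_side, matrix_side, 1), 4)

GIB = 2 ** 30
print(input_f64 / GIB)   # 1.5
print(input_f32 / GIB)   # 0.75
print(output_f32 / GIB)  # 0.0625
```

So the input alone takes 1.5 GiB as float64, and any extra copy of it that stays alive between iterations adds a comparable amount.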

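An earlier question (Keras predict loop memory leak using tf.data.Dataset but not with a numpy array) describes a similar problem when using tf.data but not with numpy. Someone suggested in a GitHub issue for TensorFlow 1.14 to call K.clear_session in every loop iteration... but it does not help!

Any ideas on how to solve this?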

Answer 1

This is my understanding after posting this as a bug to TensorFlow.

Change the code to:

in_imm = np.zeros((64,matrix_side,matrix_side,12))
for i in range (60):
  print(i)
  tensor = tf.convert_to_tensor(in_imm, dtype=tf.float32)
  out_imm = model.predict(tensor)

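Using tf.keras.Model.predict in a for loop with a numpy input creates a new graph on every iteration, because the numpy array is created with a different signature each time. Converting the numpy array to a tensor keeps the signature the same and avoids creating new graphs.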
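The general mechanism this answer refers to can be seen with tf.function on its own. The sketch below does not confirm what model.predict does internally with numpy inputs; it only shows that a new graph is traced when the input signature changes and that one graph is reused when it stays the same. `double` and `trace_log` are hypothetical names used for illustration.

```python
import tensorflow as tf

trace_log = []  # grows only while tf.function is tracing a new graph


@tf.function
def double(x):
    trace_log.append(1)  # Python side effect: runs during tracing only
    return tf.multiply(x, 2)


# Python ints are part of the signature by value: each new value retraces.
for i in range(3):
    double(i)
python_traces = len(trace_log)

trace_log.clear()

# Tensors with the same dtype and shape share one signature: traced once.
for i in range(3):
    double(tf.constant(i, dtype=tf.int32))
tensor_traces = len(trace_log)

print(python_traces, tensor_traces)
```

Every extra trace builds another graph that is kept in memory, which is why a stable input signature matters in a long loop.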

Answer 2

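I found a fix for the memory leak. While K.clear_session() does nothing in my case, adding a garbage collection with _ = gc.collect() after each call actually does the trick! Now the memory actually used stays constant and I can run as many predictions as I want.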
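A minimal sketch of that loop, with the process memory printed after each call so the effect can be checked. `rss_mb` and `predict_with_gc` are hypothetical helpers, not part of any library; `rss_mb` reads /proc, so it only reports a value on Linux (which is what Colab runs).

```python
import gc
import os


def rss_mb():
    """Resident memory of this process in MiB, or None if /proc is unavailable."""
    try:
        with open("/proc/self/statm") as f:
            resident_pages = int(f.read().split()[1])  # second field: resident pages
    except OSError:
        return None
    return resident_pages * os.sysconf("SC_PAGE_SIZE") / 2 ** 20


def predict_with_gc(predict_fn, batch, iterations):
    """Call predict_fn(batch) repeatedly, collecting garbage after every call."""
    result = None
    for i in range(iterations):
        result = predict_fn(batch)
        _ = gc.collect()  # as in the answer: collect right after each prediction
        print(i, rss_mb())
    return result
```

With the model from the question this would be called as predict_with_gc(model.predict, inImm, 60).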

Answer 3

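I solved this problem by using K.clear_session(). First of all, you need to define a session before you can clear it. The purpose of doing so is explained here and here.

# TF 1.x API; under TF 2.x these are tf.compat.v1.ConfigProto and tf.compat.v1.Session
config = tf.ConfigProto(log_device_placement=True)
config.gpu_options.allow_growth = True
session = tf.Session(config=config)
K.set_session(session)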

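At first, using K.clear_session() in the loop caused an error after the first prediction. It seems to me that tf lost its connection to the model. For this reason, I create a new model in every run of the loop. This slows the code down over the first several runs, but it prevents RAM usage from accumulating.

The following code contains the suggested improvements: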

import numpy as np
import tensorflow as tf
from tensorflow.keras import backend as K
from tensorflow.keras.layers import Input, Conv2D, Activation
from tensorflow.keras.models import Model  # needed for Model(...) below

matrixSide = 512 #define a big enough matrix to give memory issues

# TF 1.x API; under TF 2.x these are tf.compat.v1.ConfigProto and tf.compat.v1.Session
config = tf.ConfigProto(log_device_placement=True)
config.gpu_options.allow_growth = True
session = tf.Session(config=config)
K.set_session(session)

def create_model(matrixSide_v):
    inputL = Input([matrixSide_v,matrixSide_v,12]) #create a toy model
    l1 = Conv2D(32,3,activation='relu',padding='same') (inputL) #120
    l1 = Conv2D(64,1,activation='relu',padding='same')(l1)
    l1 = Conv2D(64,3,activation='relu',padding='same')(l1)
    l1 = Conv2D(1,1,padding='same')(l1)
    l1 = Activation('linear')(l1)
    c_model = Model(inputs= inputL,outputs = l1)
    return c_model

#run predictions
inImm = np.zeros((64,matrixSide,matrixSide,12))
for i in range(64):
    print(i)
    model = create_model(matrixSide)
    outImm = model.predict(inImm)
    K.clear_session()

Answer 4

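I tried many approaches from the web and none of them worked, but I finally solved the problem by using TensorFlow 1.13 instead of 2.x; the older version really did help.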

Answer 5

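I am using a simple solution based on the Keras documentation:

For a small number of inputs that fit in one batch, directly using __call__() is recommended for faster execution, e.g. model(x) or model(x, training=False).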

for filename in image_filenames:
  # read of data
  input = load_image(filename)

  # prediction
  output = model(input) # executes __call__() or call()

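Using __call__() or model(input) avoids the memory leak inside the predict method, which on every execution creates a data generator with a single data item and does not release the memory.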
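When the data does not fit in a single call, the same idea can be applied to small slices. This is a minimal sketch, not taken from the Keras documentation: `predict_in_batches` is a hypothetical helper, and it only assumes that the model can be called as model(x, training=False), as a Keras model can, and that the inputs support len() and slicing, as a numpy array does.

```python
def predict_in_batches(model, inputs, batch_size=8):
    """Call model(batch, training=False) on consecutive slices of inputs.

    Returns a list with one output per slice; predict() is never used.
    """
    outputs = []
    for start in range(0, len(inputs), batch_size):
        batch = inputs[start:start + batch_size]
        outputs.append(model(batch, training=False))
    return outputs
```

With a Keras model each element of the returned list is a tensor, so the caller decides whether to convert it (for example with .numpy()) or concatenate the pieces.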

5 answers

Answer 1: 6, 2020-11-10 08:07:15
Answer 2: 4, 2020-10-05 14:01:42
Answer 3: 2, 2021-04-09 15:44:16
Answer 4: 1, 2022-07-06 14:32:19
Answer 5: 0, 2021-11-14 06:00:17
