Keras' clear_session() not working in Google Colab

Question

I run a Keras model several times in Google Colab. Because TensorFlow creates a new model on each run, memory is exhausted after a few runs. I read that Keras' clear_session() should help with this, but it doesn't seem to work. Below is a minimal working example (MWE) for Google Colab.

import numpy as np
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras import backend as K

X = np.zeros([10, 10000])
y = np.zeros([10, 10000])

########
m = Sequential([Dense(10000, input_shape=(10000,)), Dense(10000), Dense(10000), Dense(10000)])
m.compile(loss='mse')
m.summary()

m.fit(X,y)
K.clear_session()

After running the part below the ######## marker three times, I get the following error:

---------------------------------------------------------------------------

ResourceExhaustedError                    Traceback (most recent call last)

<ipython-input-3-7ae5ab890fc2> in <module>
      3 m.summary()
      4 
----> 5 m.fit(X,y)
      6 K.clear_session()

1 frames

/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     53     ctx.ensure_initialized()
     54     tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
---> 55                                         inputs, attrs, num_outputs)
     56   except core._NotOkStatusException as e:
     57     if name is not None:

ResourceExhaustedError: Graph execution error:

Detected at node 'RMSprop/RMSprop/update_2/mul_2' defined at (most recent call last):
    File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
      "__main__", mod_spec)
    File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
      exec(code, run_globals)
    File "/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py", line 16, in <module>
      app.launch_new_instance()
    File "/usr/local/lib/python3.7/dist-packages/traitlets/config/application.py", line 846, in launch_instance
      app.start()
    File "/usr/local/lib/python3.7/dist-packages/ipykernel/kernelapp.py", line 612, in start
      self.io_loop.start()
    File "/usr/local/lib/python3.7/dist-packages/tornado/platform/asyncio.py", line 132, in start
      self.asyncio_loop.run_forever()
    File "/usr/lib/python3.7/asyncio/base_events.py", line 541, in run_forever
      self._run_once()
    File "/usr/lib/python3.7/asyncio/base_events.py", line 1786, in _run_once
      handle._run()
    File "/usr/lib/python3.7/asyncio/events.py", line 88, in _run
      self._context.run(self._callback, *self._args)
    File "/usr/local/lib/python3.7/dist-packages/tornado/ioloop.py", line 758, in _run_callback
      ret = callback()
    File "/usr/local/lib/python3.7/dist-packages/tornado/stack_context.py", line 300, in null_wrapper
      return fn(*args, **kwargs)
    File "/usr/local/lib/python3.7/dist-packages/tornado/gen.py", line 1233, in inner
      self.run()
    File "/usr/local/lib/python3.7/dist-packages/tornado/gen.py", line 1147, in run
      yielded = self.gen.send(value)
    File "/usr/local/lib/python3.7/dist-packages/ipykernel/kernelbase.py", line 381, in dispatch_queue
      yield self.process_one()
    File "/usr/local/lib/python3.7/dist-packages/tornado/gen.py", line 346, in wrapper
      runner = Runner(result, future, yielded)
    File "/usr/local/lib/python3.7/dist-packages/tornado/gen.py", line 1080, in __init__
      self.run()
    File "/usr/local/lib/python3.7/dist-packages/tornado/gen.py", line 1147, in run
      yielded = self.gen.send(value)
    File "/usr/local/lib/python3.7/dist-packages/ipykernel/kernelbase.py", line 365, in process_one
      yield gen.maybe_future(dispatch(*args))
    File "/usr/local/lib/python3.7/dist-packages/tornado/gen.py", line 326, in wrapper
      yielded = next(result)
    File "/usr/local/lib/python3.7/dist-packages/ipykernel/kernelbase.py", line 268, in dispatch_shell
      yield gen.maybe_future(handler(stream, idents, msg))
    File "/usr/local/lib/python3.7/dist-packages/tornado/gen.py", line 326, in wrapper
      yielded = next(result)
    File "/usr/local/lib/python3.7/dist-packages/ipykernel/kernelbase.py", line 545, in execute_request
      user_expressions, allow_stdin,
    File "/usr/local/lib/python3.7/dist-packages/tornado/gen.py", line 326, in wrapper
      yielded = next(result)
    File "/usr/local/lib/python3.7/dist-packages/ipykernel/ipkernel.py", line 306, in do_execute
      res = shell.run_cell(code, store_history=store_history, silent=silent)
    File "/usr/local/lib/python3.7/dist-packages/ipykernel/zmqshell.py", line 536, in run_cell
      return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
    File "/usr/local/lib/python3.7/dist-packages/IPython/core/interactiveshell.py", line 2855, in run_cell
      raw_cell, store_history, silent, shell_futures)
    File "/usr/local/lib/python3.7/dist-packages/IPython/core/interactiveshell.py", line 2881, in _run_cell
      return runner(coro)
    File "/usr/local/lib/python3.7/dist-packages/IPython/core/async_helpers.py", line 68, in _pseudo_sync_runner
      coro.send(None)
    File "/usr/local/lib/python3.7/dist-packages/IPython/core/interactiveshell.py", line 3058, in run_cell_async
      interactivity=interactivity, compiler=compiler, result=result)
    File "/usr/local/lib/python3.7/dist-packages/IPython/core/interactiveshell.py", line 3249, in run_ast_nodes
      if (await self.run_code(code, result,  async_=asy)):
    File "/usr/local/lib/python3.7/dist-packages/IPython/core/interactiveshell.py", line 3326, in run_code
      exec(code_obj, self.user_global_ns, self.user_ns)
    File "<ipython-input-3-7ae5ab890fc2>", line 5, in <module>
      m.fit(X,y)
    File "/usr/local/lib/python3.7/dist-packages/keras/utils/traceback_utils.py", line 64, in error_handler
      return fn(*args, **kwargs)
    File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1409, in fit
      tmp_logs = self.train_function(iterator)
    File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1051, in train_function
      return step_function(self, iterator)
    File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1040, in step_function
      outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1030, in run_step
      outputs = model.train_step(data)
    File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 893, in train_step
      self.optimizer.minimize(loss, self.trainable_variables, tape=tape)
    File "/usr/local/lib/python3.7/dist-packages/keras/optimizers/optimizer_v2/optimizer_v2.py", line 539, in minimize
      return self.apply_gradients(grads_and_vars, name=name)
    File "/usr/local/lib/python3.7/dist-packages/keras/optimizers/optimizer_v2/optimizer_v2.py", line 682, in apply_gradients
      name=name)
    File "/usr/local/lib/python3.7/dist-packages/keras/optimizers/optimizer_v2/optimizer_v2.py", line 724, in _distributed_apply
      var, apply_grad_to_update_var, args=(grad,), group=False)
    File "/usr/local/lib/python3.7/dist-packages/keras/optimizers/optimizer_v2/optimizer_v2.py", line 706, in apply_grad_to_update_var
      update_op = self._resource_apply_dense(grad, var, **apply_kwargs)
    File "/usr/local/lib/python3.7/dist-packages/keras/optimizers/optimizer_v2/rmsprop.py", line 216, in _resource_apply_dense
      var_t = var - coefficients["lr_t"] * grad / (
Node: 'RMSprop/RMSprop/update_2/mul_2'
failed to allocate memory
     [[{{node RMSprop/RMSprop/update_2/mul_2}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
 [Op:__inference_train_function_2465]

I want to experiment with slightly different data on the same model, so I run a similar cell several times. I can restart the notebook after the error, but reloading the data takes a while. Is there a way to really clear the old model? Thanks for your help.
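
For scale, here is a rough estimate of how much memory one copy of this model needs. It is a back-of-the-envelope sketch, not a measurement: it assumes float32 weights (the Keras default), that the default RMSprop optimizer seen in the traceback keeps one accumulator per weight, and that gradients take a third copy during a training step. Activations are ignored because the batch has only 10 rows.

```python
# Rough float32 memory estimate for the model in the question.
BYTES_PER_FLOAT32 = 4


def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


# Input width followed by the four Dense(10000) layers.
layer_sizes = [10000, 10000, 10000, 10000, 10000]
n_params = sum(dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:]))

weights_gb = n_params * BYTES_PER_FLOAT32 / 1e9

# Assumption: weights + one RMSprop accumulator + gradients = 3 copies.
per_model_gb = 3 * weights_gb

print(n_params)                 # 400040000
print(round(weights_gb, 2))     # 1.6
print(round(per_model_gb, 2))   # 4.8
```

Under these assumptions a single model already needs several gigabytes, so even one leftover copy from an earlier run could plausibly be enough to exhaust the device.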

Answer 1

Please restart the runtime and try again. I tried to reproduce the issue with the code above, and it runs fine.

Here is the same code (with epochs=2 added to fit) and its output:

import numpy as np
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras import backend as K

X = np.zeros([10, 10000])
y = np.zeros([10, 10000])

########
m = Sequential([Dense(10000, input_shape=(10000,)), Dense(10000), Dense(10000), Dense(10000)])
m.compile(loss='mse')
m.summary()

m.fit(X,y, epochs=2)
K.clear_session()

Output:

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense (Dense)               (None, 10000)             100010000 
                                                                 
 dense_1 (Dense)             (None, 10000)             100010000 
                                                                 
 dense_2 (Dense)             (None, 10000)             100010000 
                                                                 
 dense_3 (Dense)             (None, 10000)             100010000 
                                                                 
=================================================================
Total params: 400,040,000
Trainable params: 400,040,000
Non-trainable params: 0
_________________________________________________________________
Epoch 1/2
1/1 [==============================] - 10s 10s/step - loss: 0.0000e+00
Epoch 2/2
1/1 [==============================] - 6s 6s/step - loss: 0.0000e+00
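
If restarting the runtime is too costly because the data must be reloaded, one thing to try is keeping the model out of the notebook's global namespace, so that nothing still refers to it when clear_session() runs. The following is a minimal sketch under the assumption that the leftover memory is held by Python references to the old model. The helper run_isolated and its parameter names are hypothetical, not part of Keras, and whether this avoids the ResourceExhaustedError in Colab is not confirmed in this thread.

```python
import gc


def run_isolated(build_model, train, clear_session):
    """Build and train a model inside a function scope, then clean up.

    All three arguments are caller-supplied callables (hypothetical names):
    build_model() returns a model, train(model) returns plain results,
    clear_session() resets framework state.
    """
    model = build_model()
    try:
        # Return plain data (e.g. a dict of losses), not an object that
        # keeps a reference to the model.
        return train(model)
    finally:
        del model          # drop the only local reference to the model
        clear_session()    # e.g. tensorflow.keras.backend.clear_session
        gc.collect()       # collect reference cycles that may still hold it


# Possible use in the notebook, with Sequential, Dense, K, X and y
# defined as in the question:
#
# def build():
#     m = Sequential([Dense(10000, input_shape=(10000,)),
#                     Dense(10000), Dense(10000), Dense(10000)])
#     m.compile(loss='mse')
#     return m
#
# losses = run_isolated(build, lambda m: m.fit(X, y).history, K.clear_session)
```

Returning history.history rather than the History object matters here, because the History object itself keeps a reference to the model.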
