I run a keras model for several times in Google colab. Due to the nature of tensorflow there is a new model created each time of the program run, which leads to exhausted memory after some runs. I found that clear_session()
of keras should help at the problem, but it doesn't seem to work. I created an MWE for Google colab below.
import numpy as np
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras import backend as K
X = np.zeros([10, 10000])
y = np.zeros([10, 10000])
########
m = Sequential([Dense(10000, input_shape=(10000,)), Dense(10000), Dense(10000), Dense(10000)])
m.compile(loss='mse')
m.summary()
m.fit(X,y)
K.clear_session()
After running the part below ########
for three times, I get the following error:
---------------------------------------------------------------------------
ResourceExhaustedError Traceback (most recent call last)
<ipython-input-3-7ae5ab890fc2> in <module>
3 m.summary()
4
----> 5 m.fit(X,y)
6 K.clear_session()
1 frames
/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
53 ctx.ensure_initialized()
54 tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
---> 55 inputs, attrs, num_outputs)
56 except core._NotOkStatusException as e:
57 if name is not None:
ResourceExhaustedError: Graph execution error:
Detected at node 'RMSprop/RMSprop/update_2/mul_2' defined at (most recent call last):
File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py", line 16, in <module>
app.launch_new_instance()
File "/usr/local/lib/python3.7/dist-packages/traitlets/config/application.py", line 846, in launch_instance
app.start()
File "/usr/local/lib/python3.7/dist-packages/ipykernel/kernelapp.py", line 612, in start
self.io_loop.start()
File "/usr/local/lib/python3.7/dist-packages/tornado/platform/asyncio.py", line 132, in start
self.asyncio_loop.run_forever()
File "/usr/lib/python3.7/asyncio/base_events.py", line 541, in run_forever
self._run_once()
File "/usr/lib/python3.7/asyncio/base_events.py", line 1786, in _run_once
handle._run()
File "/usr/lib/python3.7/asyncio/events.py", line 88, in _run
self._context.run(self._callback, *self._args)
File "/usr/local/lib/python3.7/dist-packages/tornado/ioloop.py", line 758, in _run_callback
ret = callback()
File "/usr/local/lib/python3.7/dist-packages/tornado/stack_context.py", line 300, in null_wrapper
return fn(*args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/tornado/gen.py", line 1233, in inner
self.run()
File "/usr/local/lib/python3.7/dist-packages/tornado/gen.py", line 1147, in run
yielded = self.gen.send(value)
File "/usr/local/lib/python3.7/dist-packages/ipykernel/kernelbase.py", line 381, in dispatch_queue
yield self.process_one()
File "/usr/local/lib/python3.7/dist-packages/tornado/gen.py", line 346, in wrapper
runner = Runner(result, future, yielded)
File "/usr/local/lib/python3.7/dist-packages/tornado/gen.py", line 1080, in __init__
self.run()
File "/usr/local/lib/python3.7/dist-packages/tornado/gen.py", line 1147, in run
yielded = self.gen.send(value)
File "/usr/local/lib/python3.7/dist-packages/ipykernel/kernelbase.py", line 365, in process_one
yield gen.maybe_future(dispatch(*args))
File "/usr/local/lib/python3.7/dist-packages/tornado/gen.py", line 326, in wrapper
yielded = next(result)
File "/usr/local/lib/python3.7/dist-packages/ipykernel/kernelbase.py", line 268, in dispatch_shell
yield gen.maybe_future(handler(stream, idents, msg))
File "/usr/local/lib/python3.7/dist-packages/tornado/gen.py", line 326, in wrapper
yielded = next(result)
File "/usr/local/lib/python3.7/dist-packages/ipykernel/kernelbase.py", line 545, in execute_request
user_expressions, allow_stdin,
File "/usr/local/lib/python3.7/dist-packages/tornado/gen.py", line 326, in wrapper
yielded = next(result)
File "/usr/local/lib/python3.7/dist-packages/ipykernel/ipkernel.py", line 306, in do_execute
res = shell.run_cell(code, store_history=store_history, silent=silent)
File "/usr/local/lib/python3.7/dist-packages/ipykernel/zmqshell.py", line 536, in run_cell
return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/IPython/core/interactiveshell.py", line 2855, in run_cell
raw_cell, store_history, silent, shell_futures)
File "/usr/local/lib/python3.7/dist-packages/IPython/core/interactiveshell.py", line 2881, in _run_cell
return runner(coro)
File "/usr/local/lib/python3.7/dist-packages/IPython/core/async_helpers.py", line 68, in _pseudo_sync_runner
coro.send(None)
File "/usr/local/lib/python3.7/dist-packages/IPython/core/interactiveshell.py", line 3058, in run_cell_async
interactivity=interactivity, compiler=compiler, result=result)
File "/usr/local/lib/python3.7/dist-packages/IPython/core/interactiveshell.py", line 3249, in run_ast_nodes
if (await self.run_code(code, result, async_=asy)):
File "/usr/local/lib/python3.7/dist-packages/IPython/core/interactiveshell.py", line 3326, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-3-7ae5ab890fc2>", line 5, in <module>
m.fit(X,y)
File "/usr/local/lib/python3.7/dist-packages/keras/utils/traceback_utils.py", line 64, in error_handler
return fn(*args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1409, in fit
tmp_logs = self.train_function(iterator)
File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1051, in train_function
return step_function(self, iterator)
File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1040, in step_function
outputs = model.distribute_strategy.run(run_step, args=(data,))
File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1030, in run_step
outputs = model.train_step(data)
File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 893, in train_step
self.optimizer.minimize(loss, self.trainable_variables, tape=tape)
File "/usr/local/lib/python3.7/dist-packages/keras/optimizers/optimizer_v2/optimizer_v2.py", line 539, in minimize
return self.apply_gradients(grads_and_vars, name=name)
File "/usr/local/lib/python3.7/dist-packages/keras/optimizers/optimizer_v2/optimizer_v2.py", line 682, in apply_gradients
name=name)
File "/usr/local/lib/python3.7/dist-packages/keras/optimizers/optimizer_v2/optimizer_v2.py", line 724, in _distributed_apply
var, apply_grad_to_update_var, args=(grad,), group=False)
File "/usr/local/lib/python3.7/dist-packages/keras/optimizers/optimizer_v2/optimizer_v2.py", line 706, in apply_grad_to_update_var
update_op = self._resource_apply_dense(grad, var, **apply_kwargs)
File "/usr/local/lib/python3.7/dist-packages/keras/optimizers/optimizer_v2/rmsprop.py", line 216, in _resource_apply_dense
var_t = var - coefficients["lr_t"] * grad / (
Node: 'RMSprop/RMSprop/update_2/mul_2'
failed to allocate memory
[[{{node RMSprop/RMSprop/update_2/mul_2}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
[Op:__inference_train_function_2465]
I want to play around with slightly different data on the same model, so I run a similar part several times. I can simply restart the notebook after the error, but it takes some time to load the data, so is there an option how I can really clear an old model? Thanks for help.
Please restart the runtime and try again as I tried replicating the above code and it's working fine.
You can check the output mentioned below for the same code:
import numpy as np
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras import backend as K
X = np.zeros([10, 10000])
y = np.zeros([10, 10000])
########
m = Sequential([Dense(10000, input_shape=(10000,)), Dense(10000), Dense(10000), Dense(10000)])
m.compile(loss='mse')
m.summary()
m.fit(X,y, epochs=2)
K.clear_session()
Output:
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense (Dense) (None, 10000) 100010000
dense_1 (Dense) (None, 10000) 100010000
dense_2 (Dense) (None, 10000) 100010000
dense_3 (Dense) (None, 10000) 100010000
=================================================================
Total params: 400,040,000
Trainable params: 400,040,000
Non-trainable params: 0
_________________________________________________________________
Epoch 1/2
1/1 [==============================] - 10s 10s/step - loss: 0.0000e+00
Epoch 2/2
1/1 [==============================] - 6s 6s/step - loss: 0.0000e+00
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.