I am trying to use the TensorFlow CLI debugger in order to identify the operation which is causing a NaN during training of a network, but when I try to run the code I get an error:
_curses.error: cbreak() returned ERR
I'm running the code on an Ubuntu server, which I'm connecting to via SSH, and have tried to follow this tutorial .
I have tried using tf.add_check_numerics_ops()
, but the layers in the network include while loops so are not compatible. This is the section of code where the error is being raised:
import tensorflow as tf
from tensorflow.python import debug as tf_debug
...
#Prepare data
train_data, val_data, test_data = dataset.prepare_datasets(model_config)
sess = tf.Session()
sess = tf_debug.LocalCLIDebugWrapperSession(sess)
# Create iterators
handle = tf.placeholder(tf.string, shape=[])
iterator = tf.data.Iterator.from_string_handle(handle, train_data.output_types, train_data.output_shapes)
mixed_spec, voice_spec, mixed_audio, voice_audio = iterator.get_next()
training_iterator = train_data.make_initializable_iterator()
validation_iterator = val_data.make_initializable_iterator()
testing_iterator = test_data.make_initializable_iterator()
training_handle = sess.run(training_iterator.string_handle())
...
and the full error is:
Traceback (most recent call last):
File "main.py", line 64, in <module>
@ex.automain
File "/home/enterprise.internal.city.ac.uk/acvn728/.local/lib/python3.5/site-packages/sacred/experiment.py", line 137, in automain
self.run_commandline()
File "/home/enterprise.internal.city.ac.uk/acvn728/.local/lib/python3.5/site-packages/sacred/experiment.py", line 260, in run_commandline
return self.run(cmd_name, config_updates, named_configs, {}, args)
File "/home/enterprise.internal.city.ac.uk/acvn728/.local/lib/python3.5/site-packages/sacred/experiment.py", line 209, in run
run()
File "/home/enterprise.internal.city.ac.uk/acvn728/.local/lib/python3.5/site-packages/sacred/run.py", line 221, in __call__
self.result = self.main_function(*args)
File "/home/enterprise.internal.city.ac.uk/acvn728/.local/lib/python3.5/site-packages/sacred/config/captured_function.py", line 46, in captured_function
result = wrapped(*args, **kwargs)
File "main.py", line 95, in do_experiment
training_handle = sess.run(training_iterator.string_handle())
File "/home/enterprise.internal.city.ac.uk/acvn728/.local/lib/python3.5/site-packages/tensorflow/python/debug/wrappers/framework.py", line 455, in run
is_callable_runner=bool(callable_runner)))
File "/home/enterprise.internal.city.ac.uk/acvn728/.local/lib/python3.5/site-packages/tensorflow/python/debug/wrappers/local_cli_wrapper.py", line 255, in on_run_start
self._run_start_response = self._launch_cli()
File "/home/enterprise.internal.city.ac.uk/acvn728/.local/lib/python3.5/site-packages/tensorflow/python/debug/wrappers/local_cli_wrapper.py", line 431, in _launch_cli
title_color=self._title_color)
File "/home/enterprise.internal.city.ac.uk/acvn728/.local/lib/python3.5/site-packages/tensorflow/python/debug/cli/curses_ui.py", line 492, in run_ui
self._screen_launch(enable_mouse_on_start=enable_mouse_on_start)
File "/home/enterprise.internal.city.ac.uk/acvn728/.local/lib/python3.5/site-packages/tensorflow/python/debug/cli/curses_ui.py", line 445, in _screen_launch
curses.cbreak()
_curses.error: cbreak() returned ERR
I'm pretty new to using Ubuntu (and TensorFlow), but as far as I can tell the server does have ncurses installed, which should allow the required curses based interface:
acvn728@america:~/MScFinalProject$ dpkg -l '*ncurses*' | grep '^ii'
ii libncurses5:amd64 6.0+20160213-1ubuntu1 amd64 shared libraries for terminal handling
ii libncursesw5:amd64 6.0+20160213-1ubuntu1 amd64 shared libraries for terminal handling (wide character support)
ii ncurses-base 6.0+20160213-1ubuntu1 all basic terminal type definitions
ii ncurses-bin 6.0+20160213-1ubuntu1 amd64 terminal-related programs and man pages
ii ncurses-term 6.0+20160213-1ubuntu1 all additional terminal type definitions
Problem solved! The solution was to change
sess = tf_debug.LocalCLIDebugWrapperSession(sess)
to
sess = tf_debug.LocalCLIDebugWrapperSession(sess, ui_type="readline")
This is similar to the solution to this question , but II think it is important to note that they are different because a) it refers to a different function and a different API and b) I wasn't trying to run from an IDE, as mentioned in that solution.
cbreak
would return ERR
if you run a curses application that is not on a real terminal (ie, something that works with POSIX termios calls ).
From the description,
but the layers in the network include while loops so are not compatible
it does not seem you are running in a terminal.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.