How to fix "Tensor had NaN values" while training in Tensorflow

Question

While training an object detector on a Cloud TPU I get the following error:

Error recorded from training_loop: Gradient for FeatureExtractor/MobilenetV1/Conv2d_13_pointwise_1_Conv2d_5_1x1_48/weights:0 is NaN : Tensor had NaN values

This always happens at the same step in my training. I'm quite unsure what I might be doing wrong to cause this.

Any advice would be great! I'll try to respond as quickly as possible.

I've followed this guide to train the object detector on Google's TPU systems.

Here's the full error: Error recorded from training_loop: Gradient for FeatureExtractor/MobilenetV1/Conv2d_13_pointwise_1_Conv2d_5_1x1_48/weights:0 is NaN : Tensor had NaN values [[node CheckNumerics_99 (defined at /usr/local/lib/python2.7/dist-packages/tensorflow_estimator/python/estimator/estimator.py:1112) ]] Caused by op u'CheckNumerics_99', defined at: File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main "__main__", fname, loader, pkg_name) File "/usr/lib/python2.7/runpy.py", line 72, in _run_code exec code in run_globals File "/root/.local/lib/python2.7/site-packages/object_detection/model_tpu_main.py", line 142, in <module> tf.app.run() File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 125, in run _sys.exit(main(argv)) File "/root/.local/lib/python2.7/site-packages/object_detection/model_tpu_main.py", line 126, in main estimator.train(input_fn=train_input_fn, max_steps=train_steps) File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2452, in train saving_listeners=saving_listeners) File "/usr/local/lib/python2.7/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 358, in train loss = self._train_model(input_fn, hooks, saving_listeners) File "/usr/local/lib/python2.7/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1124, in _train_model return self._train_model_default(input_fn, hooks, saving_listeners) File "/usr/local/lib/python2.7/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1154, in _train_model_default features, labels, model_fn_lib.ModeKeys.TRAIN, self.config) File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2251, in _call_model_fn config) File "/usr/local/lib/python2.7/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1112, in _call_model_fn model_fn_results = self._model_fn(features=features, **kwargs) File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2633, in _model_fn update_ops = _sync_variables_ops(ctx) File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 207, in _sync_variables_ops for v in variables.trainable_variables() File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_array_ops.py", line 919, in check_numerics "CheckNumerics", tensor=tensor, message=message, name=name) File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper op_def=op_def) File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/deprecation.py", line 507, in new_func return func(*args, **kwargs) File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 3300, in create_op op_def=op_def) File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1801, in __init__ self._traceback = tf_stack.extract_stack() InvalidArgumentError (see above for traceback): Gradient for FeatureExtractor/MobilenetV1/Conv2d_13_pointwise_1_Conv2d_5_1x1_48/weights:0 is NaN : Tensor had NaN values [[node CheckNumerics_99 (defined at /usr/local/lib/python2.7/dist-packages/tensorflow_estimator/python/estimator/estimator.py:1112) ]] Expand all | Collapse all { Error recorded from training_loop: Gradient for FeatureExtractor/MobilenetV1/Conv2d_13_pointwise_1_Conv2d_5_1x1_48/weights:0 is NaN : Tensor had NaN values [[node CheckNumerics_99 (defined at /usr/local/lib/python2.7/dist-packages/tensorflow_estimator/python/estimator/estimator.py:1112) ]] Caused by op u'CheckNumerics_99', defined at: File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main "__main__", fname, loader, pkg_name) File "/usr/lib/python2.7/runpy.py", line 72, in _run_code exec code in run_globals File "/root/.local/lib/python2.7/site-packages/object_detection/model_tpu_main.py", line 142, in <module> tf.app.run() File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 125, in run _sys.exit(main(argv)) File "/root/.local/lib/python2.7/site-packages/object_detection/model_tpu_main.py", line 126, in main estimator.train(input_fn=train_input_fn, max_steps=train_steps) File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2452, in train saving_listeners=saving_listeners) File "/usr/local/lib/python2.7/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 358, in train loss = self._train_model(input_fn, hooks, saving_listeners) File "/usr/local/lib/python2.7/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1124, in _train_model return self._train_model_default(input_fn, hooks, saving_listeners) File "/usr/local/lib/python2.7/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1154, in _train_model_default features, labels, model_fn_lib.ModeKeys.TRAIN, self.config) File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2251, in _call_model_fn config) File "/usr/local/lib/python2.7/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1112, in _call_model_fn model_fn_results = self._model_fn(features=features, **kwargs) File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2633, in _model_fn update_ops = _sync_variables_ops(ctx) File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 207, in _sync_variables_ops for v in variables.trainable_variables() File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_array_ops.py", line 919, in check_numerics "CheckNumerics", tensor=tensor, message=message, name=name) File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper op_def=op_def) File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/deprecation.py", line 507, in new_func return func(*args, **kwargs) File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 3300, in create_op op_def=op_def) File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1801, in __init__ self._traceback = tf_stack.extract_stack() InvalidArgumentError (see above for traceback): Gradient for FeatureExtractor/MobilenetV1/Conv2d_13_pointwise_1_Conv2d_5_1x1_48/weights:0 is NaN : Tensor had NaN values [[node CheckNumerics_99 (defined at /usr/local/lib/python2.7/dist-packages/tensorflow_estimator/python/estimator/estimator.py:1112) ]] Expand all | Collapse all {

Answer 1

Tensorflow doesn't play nice with NaN values. I would recommend using SciKit Learns built in Imputer function to "Fill in the missing data"

# Import Important Modules
import numpy as np # Numpy for Linear Algebra
import pandas as pd # Pandas for data storage and representation
import matplotlib.pyplot as plt # Matplotlib for glorious charts

from sklearn.impute import SimpleImputer

THIS IS WHAT YOU NEED TO IMPORT DUDE!

dataset = dataset 
# This is a pretend variable. I don't know what the specifics of your 
# dataset are

# Imputer has 2 "Strategies" you might find useful
# Mean (replacing the missing values in your dataset 
# with the mean of all values in that column)

# Most Frequent (Replacing the missing values in your 
# dataset with the most frequent value "I.E MODE" )

mean_imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
# missing values parameter is what SimpleImputer is looking to 
# "Fill" in the dataset
# strategy is the type of imputing SimpleImputer is going to perform. 
# In this case, replace missing values with the mean of that column

most_frequent_imputer = SimpleImputer(missing_values=np.nan, strategy='most_frequent')

### FIT THE IMPUTER TO YOUR DATASET:
# Note: One thing I didn't know for the longest time is that the 
# methods I thought separate

# mean_imputer.fit() & mean_imputer.transform() can actually be 
# performed in one action
# mean_imputer.fit_transform() :-)!

dataset = mean_imputer.fit_transform(dataset) 
# Apply the mean imputer to the dataset

dataset = most_frequent_imputer.fit_transform(dataset) 
# Apply the most frequent imputer to the dataset

### I hope this helps. 
### If you have any questions don't hesitate to ask.

How to fix "Tensor had NaN values" while training in Tensorflow

Question

1 answers

solution1
0 2019-04-13 03:14:21

THIS IS WHAT YOU NEED TO IMPORT DUDE!

How to fix "Tensor had NaN values" while training in Tensorflow

Question

1 answers

solution1 0 2019-04-13 03:14:21

THIS IS WHAT YOU NEED TO IMPORT DUDE!

solution1
0 2019-04-13 03:14:21