
TF.Keras SparseCategoricalCrossentropy returns nan on GPU

Tried to train a UNet on GPU to create a binary classified image. Got a nan loss on every epoch. Testing the loss function in isolation always returns nan.

Test case:

import tensorflow as tf
import tensorflow.keras.losses as ls

true = [0.0, 1.0]
pred = [[0.1,0.9],[0.0,1.0]]

tt = tf.convert_to_tensor(true)
tp = tf.convert_to_tensor(pred)

l = ls.SparseCategoricalCrossentropy(from_logits = True)
ret = l(tt,tp)

print(ret) #tf.Tensor(nan, shape=(), dtype=float32)

If I force my tf to work on the CPU (see: Can Keras with Tensorflow backend be forced to use CPU or GPU at will? ), everything works fine. And yes, my UNet fits and predicts correctly on CPU.
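For reference, pinning the computation to the CPU can be done with the standard `tf.device` context manager. A minimal sketch of that workaround, reusing the same test case:

```python
import tensorflow as tf
import tensorflow.keras.losses as ls

true = [0.0, 1.0]
pred = [[0.1, 0.9], [0.0, 1.0]]

# Pin every op in this block to the CPU; the loss then evaluates to a real number.
with tf.device('/CPU:0'):
    tt = tf.convert_to_tensor(true)
    tp = tf.convert_to_tensor(pred)
    loss = ls.SparseCategoricalCrossentropy(from_logits=True)(tt, tp)

print(float(loss))
```

This only sidesteps the GPU kernel rather than fixing it, but it confirms the loss itself is well-defined for these inputs.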

I checked several posts on the Keras GitHub, but they all point to problems with the compiled ANN, such as using inappropriate optimizers with categorical crossentropy.

Any workaround? Am I missing something?

The test code you have provided works fine on Google Colab.

tf.__version__

2.3

tf.config.list_physical_devices('GPU')  

Output:

[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]  

Your code:

import tensorflow as tf
import tensorflow.keras.losses as ls

true = [0.0, 1.0]
pred = [[0.1,0.9],[0.0,1.0]]

tt = tf.convert_to_tensor(true)
tp = tf.convert_to_tensor(pred)

l = ls.SparseCategoricalCrossentropy(from_logits = True)
ret = l(tt,tp)

print(ret)  

Result:

tf.Tensor(0.8132616, shape=(), dtype=float32)

I had the same issue: my loss was a real number when I trained on CPU. I tried upgrading the TF version, but that wasn't the fix. I finally fixed it by reducing the y dimension. My model output was a 2D array; when I reduced it to 1D, I got a real loss on GPU.
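A minimal sketch of that fix (the variable names are mine, and it assumes the labels carried a trailing singleton axis, shape `(batch, 1)`, which `tf.squeeze` drops to `(batch,)` before they reach the loss):

```python
import tensorflow as tf
import tensorflow.keras.losses as ls

# Labels with an extra trailing axis, shape (2, 1) -- as a 2D model output would produce.
y_2d = tf.constant([[0.0], [1.0]])
logits = tf.constant([[0.1, 0.9], [0.0, 1.0]])

# Drop the singleton axis so labels are rank-1, shape (2,).
y_1d = tf.squeeze(y_2d, axis=-1)

loss = ls.SparseCategoricalCrossentropy(from_logits=True)(y_1d, logits)
print(float(loss))
```

SparseCategoricalCrossentropy expects one integer class index per sample, so the labels should have exactly one dimension less than the logits.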
