How to do convolution in fp16 (Eigen::half) in TensorFlow

Question

How can I use TensorFlow to do convolution in fp16 on the GPU (through the Python API, with the computation done in __half or Eigen::half)?

I want to test a model with fp16 in TensorFlow, but I got stuck. In fact, fp16 convolution in TensorFlow seems to be an fp32 convolution whose result is cast to fp16, which is not what I need.

I gave tf.nn.conv2d an fp16 input in fp16 format. I also gave it the same fp16 input in fp32 format (tf.cast to fp32) and then used tf.cast to convert the result to fp16. Both gave exactly the same result. But as I understand it, doing the convolution in fp16 is different from doing it in fp32 and then casting the result to fp16. Am I wrong? Please help me, thanks.

environment:
ubuntu 16.04
tensorflow 1.9.0
cuda 9.0
Tesla V100

import tensorflow as tf
import numpy as np
import os

def conv16_32(input, kernel): # fake fp16 convolution
    input = tf.cast(input, tf.float16)
    kernel = tf.cast(kernel, tf.float16)
    input = tf.cast(input, tf.float32)
    kernel = tf.cast(kernel, tf.float32)
    out = tf.nn.conv2d(input, kernel, [1,1,1,1], padding='VALID')
    out = tf.cast(out, tf.float16)
    out = tf.cast(out, tf.float64)
    return out

def conv16(input, kernel): # real fp16 convolution
    input = tf.cast(input, tf.float16)
    kernel = tf.cast(kernel, tf.float16)
    out = tf.nn.conv2d(input, kernel, [1,1,1,1], padding='VALID')
    out = tf.cast(out, tf.float64)
    return out

x = np.random.rand(16, 32, 32, 16).astype('float64')
w = np.random.rand(3, 3, 16, 16).astype('float64')
x = tf.get_variable('input', dtype=tf.float64, initializer=x)
w = tf.get_variable('weight', dtype=tf.float64, initializer=w)

out_16 = conv16(x, w)
out_16_32 = conv16_32(x, w)

os.environ['CUDA_VISIBLE_DEVICES'] = '1'
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config = config)
sess.run(tf.global_variables_initializer())
sess.run(tf.local_variables_initializer())
print(sess.run(tf.reduce_max(tf.abs(out_16_32 - out_16))))  # max absolute difference

The two functions above give the same result: the final 'print' outputs zero.

In my view, the results of an fp16 convolution and an fp32 convolution should not be the same. How can I use TensorFlow to do convolution in real fp16 on the GPU (through the Python API, using __half or Eigen::half)?
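Why the two should differ can be illustrated without TensorFlow. Each convolution output is a dot product, and the sketch below (a pure-Python model, not TensorFlow's or cuDNN's actual kernel) compares rounding the running sum to fp16 after every step with accumulating in a wider type and rounding once at the end. The helper names are hypothetical; to_fp16 relies on the standard struct 'e' (IEEE half-precision) format, which rounds to nearest-even, and maps overflow to infinity because struct raises OverflowError instead.

```python
import math
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses finite values beyond the fp16 range; IEEE rounding
        # would produce a signed infinity instead.
        return math.copysign(math.inf, x)


def dot_fp16_accumulate(xs, ws):
    """Dot product whose running sum is rounded to fp16 after every step.
    Product and sum are formed in float64 first, then rounded once per step,
    roughly like a fused multiply-add."""
    acc = 0.0
    for x, w in zip(xs, ws):
        acc = to_fp16(to_fp16(x) * to_fp16(w) + acc)
    return acc


def dot_wide_accumulate(xs, ws):
    """Dot product of fp16 inputs accumulated in a wider type (Python's
    float64, standing in for fp32) and rounded to fp16 only once."""
    acc = 0.0
    for x, w in zip(xs, ws):
        acc += to_fp16(x) * to_fp16(w)
    return to_fp16(acc)


xs = [2048.0, 1.0, 1.0]
ws = [1.0, 1.0, 1.0]
print(dot_fp16_accumulate(xs, ws))  # 2048.0: each +1 is lost (fp16 spacing at 2048 is 2)
print(dot_wide_accumulate(xs, ws))  # 2050.0
```

With a wide accumulator, a convolution of fp16 inputs whose result is rounded to fp16 can match an fp32 convolution cast to fp16 exactly, which would be consistent with the zero difference reported above.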

Answer 1

I think you are using the operations correctly. In your example, you can check that the convolution operations do indeed have the right type:

conv2d_op_16 = out_16.op.inputs[0].op
print(conv2d_op_16.name, conv2d_op_16.type, conv2d_op_16.get_attr('T'))
# Conv2D Conv2D <dtype: 'float16'>
conv2d_op_16_32 = out_16_32.op.inputs[0].op.inputs[0].op
print(conv2d_op_16_32.name, conv2d_op_16_32.type, conv2d_op_16_32.get_attr('T'))
# Conv2D_1 Conv2D <dtype: 'float32'>

TensorFlow does register fp16 kernels for both CPU and GPU, so there is no reason to think it is doing anything else. I don't have a lot of experience with fp16, so I'm not sure whether the zero difference is "normal", but there does not seem to be any way in which conv16 uses anything other than an fp16 convolution.

Answer 2

I'm trying to figure out the same thing. Here is some simple code you can use to test convolutions:

import tensorflow as tf
tf.enable_eager_execution()
input = tf.cast([[[[65519], [65519], [65519], [65519]]]], tf.float16) #BHWC
filter = tf.cast([[[[65519]], [[-65519]]]], tf.float16) #HWIO
tf.print(tf.nn.conv2d(input, filter, [1,1,1,1], "VALID"))

This should overflow if the convolution is computed in fp16, but in TensorFlow it doesn't. The result I get is [[[[0][0][0]]]], which suggests that the convolution is performed in fp32.
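The arithmetic of this test can be reproduced in pure Python. In fp16, 65519 rounds to 65504 (the largest finite half value), and 65504 × 65504 ≈ 4.3e9 is far outside the fp16 range, so an fp16 accumulator overflows, while a wider accumulator lets the two products cancel to exactly 0. The sketch below is a model, not TensorFlow's kernel: conv1d_valid is a hypothetical naive 1-D version of the answer's 1×4 input and 1×2 filter. Its fp16 path forms each product-plus-sum in float64 before rounding (roughly like a fused multiply-add), which gives inf rather than NaN; whether cuDNN accumulates in exactly this way is an assumption.

```python
import math
import struct


def to_fp16(x: float) -> float:
    """Round to the nearest IEEE half; overflow becomes a signed infinity."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)


def conv1d_valid(signal, kernel, fp16_accumulate):
    """Naive 1-D 'VALID' cross-correlation (what tf.nn.conv2d computes along
    one axis), with inputs rounded to fp16 first."""
    signal = [to_fp16(v) for v in signal]
    kernel = [to_fp16(v) for v in kernel]
    out = []
    for i in range(len(signal) - len(kernel) + 1):
        acc = 0.0
        for x, w in zip(signal[i:i + len(kernel)], kernel):
            if fp16_accumulate:
                acc = to_fp16(x * w + acc)  # round to fp16 after every step
            else:
                acc += x * w  # wide (float64) accumulator, no rounding yet
        out.append(to_fp16(acc))  # final result is stored as fp16
    return out


signal = [65519.0] * 4        # the 1x4 input row from the answer
kernel = [65519.0, -65519.0]  # the 1x2 filter from the answer
print(conv1d_valid(signal, kernel, fp16_accumulate=True))   # [inf, inf, inf]
print(conv1d_valid(signal, kernel, fp16_accumulate=False))  # [0.0, 0.0, 0.0]
```

The two outputs line up with the [[[[0][0][0]]]] and [[[[inf][inf][inf]]]] results reported in this answer.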

Edit: The solution is to set the environment variable:

TF_FP16_CONV_USE_FP32_COMPUTE=0

This gives the result [[[[inf][inf][inf]]]], suggesting that this time the convolution is performed in fp16. It seems you need at least a GTX 10x0-series GPU for this.
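Because the flag is read from the process environment (and, as an assumption, possibly only once per process), a convenient way to compare both modes is to launch the test in a fresh interpreter. The sketch below does that without touching the current process; with_true_fp16_conv, run_script and the script name are hypothetical.

```python
import os
import subprocess
import sys

FLAG = "TF_FP16_CONV_USE_FP32_COMPUTE"


def with_true_fp16_conv(env=None):
    """Return a copy of env (default: os.environ) with FLAG set to "0",
    asking TensorFlow not to use fp32 compute for fp16 convolutions."""
    new_env = dict(os.environ if env is None else env)
    new_env[FLAG] = "0"
    return new_env


def run_script(path):
    """Run a Python script in a fresh interpreter with the flag applied;
    the current process's environment is left unchanged."""
    return subprocess.run([sys.executable, path],
                          env=with_true_fp16_conv(), check=True)


# Hypothetical script containing the overflow test from this answer:
# run_script("conv_fp16_test.py")
```

Exporting the variable in the shell before starting Python has the same effect.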

2 answers

Answer 1: score 0, 2019-08-22 10:14:12
Answer 2: score 0, 2019-10-29 14:41:03
