TensorFlow vs TensorFlow.js: different results for floating-point arithmetic computations

Question

I have converted a TensorFlow model to TensorFlow.js and tried using it in the browser. There are some preprocessing steps that have to be executed on the input before feeding it to the model for inference. I have implemented these steps the same way as in TensorFlow. The problem is that the inference results in TF.js are not the same as in TensorFlow. So I started debugging the code and found that the results of the floating-point arithmetic operations in the preprocessing in TF.js differ from those in TensorFlow, which is running in a Docker container with a GPU. The code used in TF.js is below.

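    // Build a [height, width, 1] float32 tensor from the raw pixel data.
    var tensor3d = tf.tensor3d(image, [height, width, 1], 'float32');

    // Normalize with (x - mean) / std; bs and PI presumably hold
    // BitsStored and PhotometricInterpretation, as in the Python code.
    var pi = PI.toString();
    if (bs == 14 && pi.indexOf('1') != -1) {
        tensor3d = tensor3d.sub(-9798.6993999999995).div(7104.607118190255);
    } else if (bs == 12 && pi.indexOf('1') != -1) {
        tensor3d = tensor3d.sub(-3384.9893000000002).div(1190.0708513300835);
    } else if (bs == 12 && pi.indexOf('2') != -1) {
        tensor3d = tensor3d.sub(978.31200000000001).div(1092.2426342420442);
    }

    // Resize to 224x224, copy the single channel three times, add a batch dimension.
    var resizedTensor = tensor3d.resizeNearestNeighbor([224, 224]).toFloat();
    var copiedTens = tf.tile(resizedTensor, [1, 1, 3]);
    return copiedTens.expandDims();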

Python code blocks used:

ds = pydicom.dcmread(input_filename, stop_before_pixels=True)
if (ds.BitsStored == 12) and '1' in ds.PhotometricInterpretation:
    normalize_mean = -3384.9893000000002
    normalize_std = 1190.0708513300835
elif (ds.BitsStored == 12) and '2' in ds.PhotometricInterpretation:
    normalize_mean = 978.31200000000001
    normalize_std = 1092.2426342420442
elif (ds.BitsStored == 14) and '1' in ds.PhotometricInterpretation:
    normalize_mean = -9798.6993999999995
    normalize_std = 7104.607118190255
else:
    error_response = ("Unable to read required metadata, or metadata invalid. "
                      "BitsStored: {}. PhotometricInterpretation: {}").format(
                          ds.BitsStored, ds.PhotometricInterpretation)
    error_json = {'code': 500, 'message': error_response}
    self._set_headers(500)
    self.wfile.write(json.dumps(error_json).encode())
    return

normalization = Normalization(mean=normalize_mean, std=normalize_std)
resize = ResizeImage()
copy_channels = CopyChannels()
inference_data_collection.append_preprocessor([normalization, resize,
                                               copy_channels])

Normalization code

    def normalize(self, normalize_numpy, mask_numpy=None):

        normalize_numpy = normalize_numpy.astype(float)

        if mask_numpy is not None:
            mask = mask_numpy > 0
        elif self.mask_zeros:
            mask = np.nonzero(normalize_numpy)
        else:
            mask = None

        if mask is None:
            normalize_numpy = (normalize_numpy - self.mean) / self.std
        else:
            raise NotImplementedError

        return normalize_numpy

ResizeImage code

    from skimage.transform import resize

    def Resize(self, data_group):

        input_data = data_group.preprocessed_case

        output_data = resize(input_data, self.output_dim)

        data_group.preprocessed_case = output_data
        self.output_data = output_data

CopyChannels code

    def CopyChannels(self, data_group):

        input_data = data_group.preprocessed_case

        if self.new_channel_dim:
            output_data = np.stack([input_data] * self.channel_multiplier, -1)
        else:
            output_data = np.tile(input_data, (1, 1, self.channel_multiplier))

        data_group.preprocessed_case = output_data
        self.output_data = output_data

Sample outputs (left: TensorFlow on Docker with GPU; right: TF.js):

The results are actually different after every step.

Answer 1

There are a number of possibilities that could lead to this issue.

1- The ops are not used in the same manner in JS and Python. If that is the case, using exactly the same ops will get rid of the issue.
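In the code from the question, the browser side calls `resizeNearestNeighbor` while the Python side calls `skimage.transform.resize`, which interpolates by default, so these two steps are not the same op. A minimal, library-free sketch of why that matters is below. The helpers `resize_nearest` and `resize_linear` are hypothetical, work on a single row, and use a simplified coordinate mapping that is not meant to match either library exactly.

```python
def resize_nearest(values, new_len):
    """Downsample a 1-D list by picking one source sample per output sample."""
    scale = len(values) / new_len
    return [values[min(int(i * scale), len(values) - 1)] for i in range(new_len)]


def resize_linear(values, new_len):
    """Downsample a 1-D list by linear interpolation between neighbours."""
    scale = len(values) / new_len
    out = []
    for i in range(new_len):
        # Map the centre of output sample i back into source coordinates.
        pos = min(max((i + 0.5) * scale - 0.5, 0.0), len(values) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(values) - 1)
        frac = pos - lo
        out.append(values[lo] * (1 - frac) + values[hi] * frac)
    return out


row = [0.0, 10.0, 20.0, 30.0, 40.0, 50.0]
nearest = resize_nearest(row, 3)
linear = resize_linear(row, 3)
print(nearest)  # [0.0, 20.0, 40.0]
print(linear)   # [5.0, 25.0, 45.0]
```

The two strategies already disagree on every sample of this tiny row, which is a much larger effect than any floating-point rounding.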

2- The image tensor might be read differently by the Python library and the browser canvas. In fact, canvas pixels do not always have the same values across browsers, because of operations such as anti-aliasing, as explained in this answer. So there might be slight differences in the results of the operations. To check whether this is the root cause, first print the Python array and the JS array image and see whether they are alike. It is likely that the 3D tensor is different in JS and Python.

tensor3d = tf.tensor3d(image,[height,width,1],'float32')

In this case, instead of reading the image directly in the browser, one can use the Python library to convert the image to an array, and have tfjs read this array directly instead of the image. That way, the input tensors will be the same in both JS and Python.
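One way to hand the same array to both sides is to serialise it from Python as a flat list plus its shape. The sketch below assumes the pixels are available as a 2-D list; the function name `export_for_tfjs` and the JSON layout are illustrative choices, not part of any library.

```python
import json


def export_for_tfjs(pixels):
    """Serialise a 2-D list of pixel values as a flat array plus its shape."""
    height = len(pixels)
    width = len(pixels[0])
    # Row-major flattening, so the values line up with a [height, width, 1] shape.
    flat = [float(v) for row in pixels for v in row]
    return json.dumps({"shape": [height, width, 1], "data": flat})


# Hypothetical 2x3 image; in practice this would be the decoded pixel array.
payload = export_for_tfjs([[0, 1, 2], [3, 4, 5]])
decoded = json.loads(payload)
print(decoded["shape"], decoded["data"])
```

In the browser, the parsed `data` and `shape` can then be passed to `tf.tensor3d(data, shape, 'float32')`, as in the line from the question, so both pipelines start from identical values.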

3- It is a float32 precision issue. tensor3d is created with the dtype float32 and, depending on the operations used, there might be a precision issue. Consider this operation:

tf.scalar(12045, 'int32').mul(tf.scalar(12045, 'int32')).print(); // 145082032 instead of 145082025

The same precision issue is encountered in Python with the following:

a = tf.constant([12045], dtype='float32') * tf.constant([12045], dtype='float32')
tf.print(a)  # 145082032

In Python this can be solved by using the int32 dtype. However, because of the WebGL float32 limitation, the same thing cannot be done with the WebGL backend in tfjs. In neural networks, this precision issue is usually not a big deal. To get rid of it, one can change the backend, for instance with setBackend('cpu'), which is much slower.
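The float32 effect can be reproduced without TensorFlow. Near 1.45e8, adjacent float32 values are 16 apart, so 145082025 cannot be represented and rounds to 145082032. The sketch below uses the standard-library `array` module to round to float32; the helper `to_float32` and the pixel value 1000.0 are assumptions made for illustration, while the mean and std are taken from the question. Note also that the Python preprocessing casts with `astype(float)`, which is float64, whereas the tfjs tensor is float32.

```python
from array import array


def to_float32(x):
    """Round a Python float (float64) to the nearest float32 value."""
    return array("f", [x])[0]


exact = 12045 * 12045
as_f32 = to_float32(float(exact))
print(exact, int(as_f32))  # 145082025 145082032

# Normalisation constants from the question (BitsStored == 12, '1' branch).
mean, std = -3384.9893000000002, 1190.0708513300835
pixel = 1000.0  # hypothetical pixel value

f64_result = (pixel - mean) / std
# Emulate float32 arithmetic by rounding the inputs and every intermediate.
f32_result = to_float32(
    to_float32(pixel - to_float32(mean)) / to_float32(std)
)
print(f64_result, f32_result, abs(f64_result - f32_result))
```

For the normalisation step the gap between the two results is tiny, on the order of the seventh significant digit, which supports the point that float32 rounding alone is unlikely to change a network's predictions noticeably.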

Solution 1
1 Accepted 2019-06-19 14:15:33
