
How to replicate PyTorch normalization in OpenCV or NumPy?

I need to replicate PyTorch image normalization in OpenCV or NumPy.

Quick backstory: I'm doing a project where I train in PyTorch but have to run inference in OpenCV, because I'm deploying to an embedded device that doesn't have the storage space for PyTorch. After training in PyTorch and saving the model I convert it to an ONNX graph. For inference in OpenCV I open the image as an OpenCV image (i.e. a NumPy array), resize it, then successively call cv2.normalize , cv2.dnn.blobFromImage , net.setInput , and net.forward .
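For reference, here is a minimal sketch of that inference pipeline (the model file name is a placeholder):

import cv2

net = cv2.dnn.readNetFromONNX('model.onnx')  # placeholder file name
img = cv2.imread('image.jpg', cv2.IMREAD_GRAYSCALE)
img = cv2.resize(img, (224, 224)).astype('float32')
cv2.normalize(img, img, 1.0, -1.0, cv2.NORM_MINMAX)
blob = cv2.dnn.blobFromImage(img)  # shape (1, 1, 224, 224)
net.setInput(blob)
output = net.forward()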

I'm getting slightly different accuracy results when test inferencing in PyTorch vs inferencing in OpenCV, and I suspect the difference is due to the normalization process producing a slightly different result between the two.

Here is a quick script I put together to show the difference on a single image. Note that I'm using grayscale (single-channel) and I'm normalizing into the -1.0 to +1.0 range:

# scratchpad.py

import torch
import torchvision

import cv2
import numpy as np
import PIL
from PIL import Image

TRANSFORM = torchvision.transforms.Compose([
    torchvision.transforms.Resize((224, 224)),
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize([0.5], [0.5])
])

def main():
    # 1st show PyTorch normalization

    # open the image as an OpenCV image
    openCvImage = cv2.imread('image.jpg', cv2.IMREAD_GRAYSCALE)
    # convert OpenCV image to PIL image
    pilImage = PIL.Image.fromarray(openCvImage)
    # convert PIL image to a PyTorch tensor
    ptImage = TRANSFORM(pilImage).unsqueeze(0)
    # show the PyTorch tensor info
    print('\nptImage.shape = ' + str(ptImage.shape))
    print('ptImage max = ' + str(torch.max(ptImage)))
    print('ptImage min = ' + str(torch.min(ptImage)))
    print('ptImage avg = ' + str(torch.mean(ptImage)))
    print('ptImage: ')
    print(str(ptImage))

    # 2nd show OpenCV normalization

    # resize the image
    openCvImage = cv2.resize(openCvImage, (224, 224))
    # convert to float 32 (necessary for passing into cv2.dnn.blobFromImage, which is not shown here)
    openCvImage = openCvImage.astype('float32')
    # use OpenCV version of normalization, could also do this with numpy
    cv2.normalize(openCvImage, openCvImage, 1.0, -1.0, cv2.NORM_MINMAX)
    # show results
    print('\nopenCvImage.shape = ' + str(openCvImage.shape))
    print('openCvImage max = ' + str(np.max(openCvImage)))
    print('openCvImage min = ' + str(np.min(openCvImage)))
    print('openCvImage avg = ' + str(np.mean(openCvImage)))
    print('openCvImage: ')
    print(str(openCvImage))

    print('\ndone !!\n')
# end function

if __name__ == '__main__':
    main()

Here is the test image that I'm using:

[test image]

and here are the results I'm getting currently:

$ python3 scratchpad.py 

ptImage.shape = torch.Size([1, 1, 224, 224])
ptImage max = tensor(0.9608)
ptImage min = tensor(-0.9686)
ptImage avg = tensor(0.1096)
ptImage: 
tensor([[[[ 0.0431, -0.0431,  0.1294,  ...,  0.8510,  0.8588,  0.8588],
          [ 0.0510, -0.0510,  0.0980,  ...,  0.8353,  0.8510,  0.8431],
          [ 0.0588, -0.0431,  0.0745,  ...,  0.8510,  0.8588,  0.8588],
          ...,
          [ 0.6157,  0.6471,  0.5608,  ...,  0.6941,  0.6627,  0.6392],
          [ 0.4902,  0.3961,  0.3882,  ...,  0.6627,  0.6471,  0.6706],
          [ 0.3725,  0.4039,  0.5451,  ...,  0.6549,  0.6863,  0.6549]]]])

openCvImage.shape = (224, 224)
openCvImage max = 1.0000001
openCvImage min = -1.0
openCvImage avg = 0.108263366
openCvImage: 
[[ 0.13725497 -0.06666661  0.20000008 ...  0.8509805   0.8666668
   0.8509805 ]
 [ 0.15294124 -0.06666661  0.09019614 ...  0.8274511   0.8431374
   0.8274511 ]
 [ 0.12156869 -0.06666661  0.0196079  ...  0.8509805   0.85882366
   0.85882366]
 ...
 [ 0.5843138   0.74117655  0.5450981  ...  0.83529425  0.59215695
   0.5764707 ]
 [ 0.6862746   0.34117654  0.39607853 ...  0.67843145  0.6705883
   0.6470589 ]
 [ 0.34117654  0.4117648   0.5215687  ...  0.5607844   0.74117655
   0.59215695]]

done !!

As you can see the results are similar but definitely not exactly the same.

How can I do the normalization in OpenCV and have it come out exactly, or almost exactly, the same as the PyTorch normalization? I've tried various options in both OpenCV and NumPy but could not get any closer than the above results, which still differ noticeably.

-- Edit ---------------------------

In response to Ivan, I also tried this:

# resize the image
openCvImage = cv2.resize(openCvImage, (224, 224))
# convert to float 32 (necessary for passing into cv2.dnn.blobFromImage, which is not shown here)
openCvImage = openCvImage.astype('float32')
mean = np.mean(openCvImage)
stdDev = np.std(openCvImage)
openCvImage = (openCvImage - mean) / stdDev
# show results
print('\nopenCvImage.shape = ' + str(openCvImage.shape))
print('openCvImage max = ' + str(np.max(openCvImage)))
print('openCvImage min = ' + str(np.min(openCvImage)))
print('openCvImage avg = ' + str(np.mean(openCvImage)))
print('openCvImage: ')
print(str(openCvImage))

Which results in:

openCvImage.shape = (224, 224)
openCvImage max = 2.1724665
openCvImage min = -2.6999729
openCvImage avg = 7.298528e-09
openCvImage: 
[[ 0.07062991 -0.42616782  0.22349077 ...  1.809422    1.8476373
   1.809422  ]
 [ 0.10884511 -0.42616782 -0.04401573 ...  1.7520993   1.7903144
   1.7520993 ]
 [ 0.0324147  -0.42616782 -0.21598418 ...  1.809422    1.8285296
   1.8285296 ]
 ...
 [ 1.1597633   1.5419154   1.0642253  ...  1.7712069   1.178871
   1.1406558 ]
 [ 1.4081622   0.56742764  0.70118093 ...  1.3890547   1.3699471
   1.3126242 ]
 [ 0.56742764  0.7393961   1.0069026  ...  1.1024406   1.5419154
   1.178871  ]]

This is similar to the PyTorch normalization but clearly not the same.

I'm attempting to achieve normalization in OpenCV that produces the same result as the PyTorch normalization.

I realize that due to slight differences in the resizing operation (and possibly very small rounding differences) I'll probably never get exactly the same normalized result but I'd like to get as close as possible to the PyTorch result.

According to the docs, torchvision.transforms.Normalize() normalizes with a mean and std . That is:

output[channel] = (input[channel] - mean[channel]) / std[channel]

While in your code

cv2.normalize(openCvImage, openCvImage, 1.0, -1.0, cv2.NORM_MINMAX)

is min-max scaling. They are two different normalizations. Since ToTensor has already scaled the values to [0, 1], you can simply rebuild the PyTorch scaling with:

openCvImage = (openCvImage - 0.5) / 0.5
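
Combined with ToTensor's division by 255, a full OpenCV/NumPy equivalent of your transform pipeline would look something like this (a sketch; the resize interpolation may still differ slightly from PIL's):

import cv2
import numpy as np

openCvImage = cv2.imread('image.jpg', cv2.IMREAD_GRAYSCALE)
openCvImage = cv2.resize(openCvImage, (224, 224))
openCvImage = openCvImage.astype(np.float32) / 255.0  # ToTensor: [0, 255] -> [0.0, 1.0]
openCvImage = (openCvImage - 0.5) / 0.5               # Normalize([0.5], [0.5])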

@Quang Hoang already explained the differences; I would just like to add some details. The function cv2.normalize performs min-max scaling, which maps values from [min(data), max(data)] to the provided interval [a, b] , here [-1, 1] . Therefore, it's the same as computing data = (data - min(data)) / (max(data) - min(data)) * (b - a) + a .

Here is before and after calling cv2.normalize on openCvImage :

openCvImage-------------
shape = (224, 224)
min = 0.0
max = 255.0
avg = 141.2952

openCvImage-------------
shape = (224, 224)
min = -1.0
max = 1.0
avg = 0.10819771

So cv2.normalize(openCvImage, openCvImage, 1.0, -1.0, cv2.NORM_MINMAX) is the same as (openCvImage - openCvImage.min()) / (openCvImage.max() - openCvImage.min())*2 - 1
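
As a quick sanity check (a sketch reusing the resized float32 openCvImage from your script):

a, b = -1.0, 1.0
manual = (openCvImage - openCvImage.min()) / (openCvImage.max() - openCvImage.min()) * (b - a) + a
cvNormalized = cv2.normalize(openCvImage, None, 1.0, -1.0, cv2.NORM_MINMAX)
print(np.allclose(manual, cvNormalized))  # True, up to float rounding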


On the other hand, torchvision.transforms.Normalize performs a shift-scale transform: data = (data - mean)/std . Yet, this can be a little confusing, because mean doesn't necessarily have to be the average of the input data (same for the standard deviation). You may have noticed that the mean and std of your PyTorch tensor are not 0.5 and 0.5, respectively:

ptImage-------------
shape = torch.Size([224, 224])
avg = tensor(0.5548)
std = tensor(0.5548)

If you are looking to standardize your data, i.e. make mean=0 and std=1 , you can compute the z-score (with torchvision.transforms.Normalize ). But you can only do so by first measuring your data's mean and std.
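
For instance, a minimal sketch of that two-step approach, reusing pilImage from your script:

t = torchvision.transforms.ToTensor()(pilImage)          # values in [0.0, 1.0]
mean, std = t.mean().item(), t.std().item()              # measure the data first
z = torchvision.transforms.Normalize([mean], [std])(t)   # z.mean() ~ 0, z.std() ~ 1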


Also note that torchvision.transforms.ToTensor performs a fixed rescaling from [0, 255] to [0.0, 1.0] (a division by 255, not a data-dependent min-max):

Converts a PIL Image or numpy.ndarray (H x W x C) in the range [0, 255] to a torch.FloatTensor of shape (C x H x W) in the range [0.0, 1.0] if the PIL Image belongs to one of the modes (L, LA, P, I, F, RGB, YCbCr, RGBA, CMYK, 1) or if the numpy.ndarray has dtype = np.uint8
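
A quick way to verify that this is a fixed division by 255 rather than a data-dependent min-max (again reusing pilImage from your script):

t = torchvision.transforms.ToTensor()(pilImage)    # (1, H, W) float32 in [0.0, 1.0]
manual = np.asarray(pilImage, np.float32) / 255.0  # (H, W)
print(np.allclose(t.numpy().squeeze(), manual))    # True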

Building off of what @Quang Hoang and @Ivan mentioned above, I was running into a similar issue and had some success with a few modifications to your original code. Using a sample image I'm able to get a similar mean pixel intensity value across the PyTorch and OpenCV transformed images (within 3%). Further, the PyTorch and OpenCV images written out by the script give the same predictions and similar confidence when tested with a local ONNX model.

import torch
import torchvision

import cv2
import numpy as np
import PIL
from PIL import Image

TRANSFORM = torchvision.transforms.Compose([
    torchvision.transforms.Resize((224, 224)),
    torchvision.transforms.ToTensor(), # (H, W, C) [0, 255] -> (C, H, W) [0.0, 1.0]
    torchvision.transforms.Normalize([0.5], [0.5])
])

# 1st show PyTorch normalization
# open the image as an OpenCV image
openCvImage = cv2.imread('image.jpg')

# convert OpenCV image to PIL image (note: cv2.imread returns BGR channel order;
# both paths here use the same array, but apply cv2.COLOR_BGR2RGB first if your
# model was trained on RGB images)
pilImage = PIL.Image.fromarray(openCvImage)

# convert PIL image to a PyTorch tensor, swap axes to format for imwrite
ptImageResize = np.array(TRANSFORM(pilImage)).swapaxes(0,2).swapaxes(0,1)

cv2.imshow('pytorch-transforms', ptImageResize)
cv2.imwrite('image-pytorch-transforms.jpg', ptImageResize)

# show the PyTorch tensor info
print('\nptImageResize.shape = ' + str(ptImageResize.shape))
print('ptImageResize max = ' + str(np.max(ptImageResize)))
print('ptImageResize min = ' + str(np.min(ptImageResize)))
print('ptImageResize avg = ' + str(np.mean(ptImageResize)))
print('ptImageResize: ')
print(str(ptImageResize))

# 2nd show OpenCV normalization
# resize the image (note: torchvision's Resize defaults to bilinear interpolation,
# so interpolation=cv2.INTER_LINEAR may match the PyTorch path more closely)
openCvImageResize = cv2.resize(openCvImage, (224, 224), interpolation=cv2.INTER_NEAREST)

# Rescale image from [0, 255] to [0.0, 1.0] as in the PyTorch ToTensor() method
openCvImageResize = openCvImageResize / 255

# Normalize the image: img = (img - mean) / std
mean = [0.5]
std = [0.5]
openCvImageResize = (openCvImageResize - mean) / std

# note: these arrays are floats in [-1.0, 1.0]; cv2.imshow maps floats from
# [0.0, 1.0] and cv2.imwrite expects [0, 255], so rescale first for faithful output
cv2.imshow('opencv-transforms', openCvImageResize)
cv2.imwrite('image-opencv-transforms.jpg', openCvImageResize)

# show results
print('\nopenCvImageResize.shape = ' + str(openCvImageResize.shape))
print('openCvImageResize max = ' + str(np.max(openCvImageResize)))
print('openCvImageResize min = ' + str(np.min(openCvImageResize)))
print('openCvImageResize avg = ' + str(np.mean(openCvImageResize)))
print('openCvImageResize: ')
print(str(openCvImageResize))
    
cv2.waitKey(0)
cv2.destroyAllWindows()

This will probably be helpful. If you look at the actual implementation of

torchvision.transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225],
    )

The block below is what it actually does:

import numpy as np
from PIL import Image
MEAN = 255 * np.array([0.485, 0.456, 0.406])
STD = 255 * np.array([0.229, 0.224, 0.225])
img_pil = Image.open("ty.jpg")
x = np.array(img_pil)
x = x.transpose(-1, 0, 1)
x = (x - MEAN[:, None, None]) / STD[:, None, None]
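
As a sanity check (hypothetical, assuming torchvision is installed), this matches torchvision's output on the same image, since (x - 255*mean) / (255*std) == (x/255 - mean) / std :

import torchvision.transforms as T

ref = T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])(T.ToTensor()(img_pil))
print(np.allclose(x, ref.numpy(), atol=1e-5))  # True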

Here I have done the same thing on an image, the OpenCV way:

import cv2

img = cv2.imread("ty.jpg")  # load the same image
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
img = img/255.0
mean=[0.485, 0.456, 0.406]
std=[0.229, 0.224, 0.225]

img[..., 0] -= mean[0]
img[..., 1] -= mean[1]
img[..., 2] -= mean[2]

img[..., 0] /= std[0]
img[..., 1] /= std[1]
img[..., 2] /= std[2]
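
Equivalently, the per-channel subtraction and division above can be collapsed into a single vectorized broadcast:

import numpy as np

img = (img - np.array(mean)) / np.array(std)  # broadcasts over the last (channel) axis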
