
Measuring similarity between two RGB images in Python

I have two RGB images of the same size, and I would like to compute a similarity metric. I thought of starting out with Euclidean distance:

import scipy.spatial.distance as dist
import cv2

im1 = cv2.imread("im1.jpg")
im2 = cv2.imread("im2.jpg")

>> im1.shape
(820, 740, 3)

>> dist.euclidean(im1,im2)

ValueError: Input vector should be 1-D.

I know that dist.euclidean expects a 1-D array and im1 and im2 are 3-D, but is there a function that will work with 3-D arrays, or is it possible to transform im1 and im2 into a 1-D array that preserves the information in the images?

Grayscale Solution (?)

(There is a discussion below as to your comment about a function "that preserves the information in the images")

It seems possible to me that you might be able to solve the problem using a grayscale image rather than an RGB image. I know I'm making assumptions here, but it's a thought.

I'm going to try a simple example relating to your code, then give an example of an image similarity measure using 2-D Discrete Fourier Transforms that uses a conversion to grayscale. That DFT analysis will have its own section.

(My apologies if you see this while in progress. I'm just trying to make sure my work is saved.)

Because of my assumption, I'm going to try your method with some RGB images, then see if the problem will be solved by converting to grayscale. If the problem is solved with grayscale, we can do an analysis of the amount of information loss brought on by the grayscale solution by finding the image similarity using a combination of all three channels, each compared separately.

Method

Making sure I have all the libraries/packages/whatever you want to call them.

> python -m pip install opencv-python
> python -m pip install scipy
> python -m pip install numpy

Note that, in this trial, I'm using some PNG images that were created in the attempt (described below) to use a 2D DFT.

Making sure I get the same problem

>>> import scipy.spatial.distance as dist
>>> import cv2
>>>
>>> im1 = cv2.imread("rhino1_clean.png")
>>> im2 = cv2.imread("rhino1_streak.png")
>>>
>>> im1.shape
(178, 284, 3)
>>>
>>> dist.euclidean(im1, im2)
## Some traceback stuff ##
ValueError: Input vector should be 1-D.

Now, let's try using grayscale. If this works, we can simply find the distance for each of the RGB Channels. I hope it works, because I want to do the information-loss analysis.

Let's convert to grayscale:
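The conversion code itself is missing from this page. A minimal sketch, assuming a luminance-weighted sum with the usual 0.299/0.587/0.114 weights (to my knowledge the same weighting that `cv2.cvtColor(im1, cv2.COLOR_BGR2GRAY)` applies); `bgr_to_gray` is a hypothetical helper, and synthetic arrays stand in for the two PNG files so the snippet runs on its own:

```python
import numpy as np

def bgr_to_gray(img_bgr):
    """Convert an (H, W, 3) BGR image to an (H, W) float grayscale array."""
    # Luma weights in B, G, R order, matching the channel order cv2.imread returns.
    weights = np.array([0.114, 0.587, 0.299])
    return img_bgr.astype(np.float64) @ weights

# Synthetic stand-ins for cv2.imread("rhino1_clean.png") and
# cv2.imread("rhino1_streak.png") (assumption: same shape as in the post).
rng = np.random.default_rng(0)
im1 = rng.integers(0, 256, size=(178, 284, 3), dtype=np.uint8)
im2 = im1.copy()
im2[50:60, :, :] = 0  # a crude "black streak"

im1_gray = bgr_to_gray(im1)
im2_gray = bgr_to_gray(im2)
print(im1_gray.shape)  # (178, 284)
```

Working in float rather than `uint8` also rules out unsigned wraparound when two arrays are later subtracted.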

A simple dist.euclidean(im1_gray, im2_gray) leads to the same ValueError: Input vector should be 1-D. exception, but a grayscale image array is just an array of pixel rows, so I compute the distance row by row and average the results.

>>> dists = []
>>> for i in range(0, len(im1_gray)):
...   dists.append(dist.euclidean(im1_gray[i], im2_gray[i]))
...
>>> sum_dists = sum(dists)
>>> ave_dist = sum_dists/len(dists)
>>> ave_dist
2185.9891304058297

By the way, here are the two original images:

rhino1_clean.jpg

rhino1_streak.jpg

Grayscale worked (with massaging), let's try color

Following some procedure from this SO answer, let's do the following.
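The code from the linked answer did not survive on this page. As a hedged sketch of one way to handle color (not necessarily what that answer does), the Euclidean distance can be computed separately for each channel and once over all channels together. `channel_distances` is a hypothetical helper, and the tiny arrays below are made up for illustration:

```python
import numpy as np

def channel_distances(a, b):
    """Return (per-channel Euclidean distances, overall Euclidean distance)."""
    # Cast to float first so that subtracting uint8 arrays cannot wrap around.
    diff = a.astype(np.float64) - b.astype(np.float64)
    per_channel = np.sqrt((diff ** 2).sum(axis=(0, 1)))
    overall = np.sqrt((diff ** 2).sum())
    return per_channel, overall

a = np.zeros((2, 2, 3), dtype=np.uint8)
b = np.zeros((2, 2, 3), dtype=np.uint8)
b[..., 0] = 3  # every pixel differs by 3 in channel 0
b[..., 1] = 4  # and by 4 in channel 1

per_channel, overall = channel_distances(a, b)
print(per_channel)  # [6. 8. 0.]
print(overall)      # 10.0
```

The overall distance is the square root of the sum of the squared per-channel distances, so nothing is lost by computing the channels separately and combining them afterwards.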


Preservation of Information

Following the analysis here (archived), let's look at our information loss. (Note that this will be a very naïve analysis, but I want to take a crack at it.)

Grayscale vs. Color Information

Let's just look at the color vs. the grayscale. Later, we can look at whether we preserve the information about the distances.

Comparisons of different distance measures using grayscale vs. all three channels - comparison using ratio of distance sums for a set of images.

I don't know how to do entropy measurements for the distances, but my intuition tells me that, if I calculate distances using grayscale and using color channels, I should come up with similar ratios of distances IF I haven't lost any information.
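A rough sketch of that ratio check, under stated assumptions: the helpers `euclid`, `to_gray`, `distance_ratio` and `add_noise` are all hypothetical, grayscale is taken as a plain channel average, and noisy copies of a random image stand in for real photographs. If the color ratio and the grayscale ratio come out close, the grayscale conversion has kept the relative distances for that set of images.

```python
import numpy as np

def euclid(a, b):
    """Euclidean distance between two equally shaped arrays (cast to float first)."""
    return float(np.linalg.norm(a.astype(np.float64).ravel() - b.astype(np.float64).ravel()))

def to_gray(img):
    """Unweighted channel average; any grayscale conversion could be swapped in."""
    return img.astype(np.float64).mean(axis=2)

def distance_ratio(ref, img_a, img_b, transform=lambda x: x):
    """Ratio d(ref, img_a) / d(ref, img_b) after applying `transform` to each image."""
    return euclid(transform(ref), transform(img_a)) / euclid(transform(ref), transform(img_b))

def add_noise(img, sigma, rng):
    """Return a copy of `img` with Gaussian noise of standard deviation `sigma`."""
    noisy = img.astype(np.float64) + rng.normal(0.0, sigma, img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

rng = np.random.default_rng(0)
ref = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
img_a = add_noise(ref, 5.0, rng)    # close to the reference
img_b = add_noise(ref, 20.0, rng)   # further from the reference

color_ratio = distance_ratio(ref, img_a, img_b)
gray_ratio = distance_ratio(ref, img_a, img_b, transform=to_gray)
print(color_ratio, gray_ratio)
```

This only compares relative distances; it is not an entropy measurement.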


My first thought when seeing this question was to use a 2-D Discrete Fourier Transform, which I'm sure is available in Python or NumPy or OpenCV. Basically, the first (low-frequency) components of the DFT relate to the large shapes in your image. (Here is where I'll put in a relevant research paper: link. I didn't look too closely - anyone is welcome to suggest another.)

So, let me look up a 2-D DFT easily available from Python, and I'll get back to putting up some working code.


First, you'll need to make sure you have Pillow (the maintained successor to PIL) and NumPy. It seems you already have NumPy, but here are the install commands anyway. (Note that I'm on Windows at the moment.)

> python -m pip install opencv-python
> python -m pip install numpy
> python -m pip install pillow

Now, here are 5 images:

  1. a rhino image, rhino1_clean.jpg (source);

     rhino1_clean.jpg

  2. the same image with some black streaks drawn on by me in MS Paint, rhino1_streak.jpg;

     rhino1_streak.jpg

  3. another rhino image, rhino2_clean.jpg (source);

     rhino2_clean.jpg

  4. a first hippo image, hippo1_clean.jpg (source);

     hippo1_clean.jpg

  5. a second hippo image, hippo2_clean.jpg (source).

     hippo2_clean.jpg

All images used with fair use.

Okay, now, to illustrate further, let's go to the Python interactive terminal.

>python

First of all, life will be easier if we use grayscale PNG images: PNG because it is lossless (saving to it adds no further compression artifacts, unlike re-saving a JPEG), grayscale because I don't have to show all the details with the channels.

>>> import PIL.Image
>>> rh_img_1_cln = PIL.Image.open("rhino1_clean.jpg")
>>> rh_img_1_cln.save("rhino1_clean.png")
>>> rh_img_1_cln_gs = PIL.Image.open("rhino1_clean.png").convert('LA')
>>> rh_img_1_cln_gs.save("rhino1_clean_gs.png")

Follow similar steps for the other four images. I used the PIL variable names rh_img_1_stk, rh_img_2_cln, hp_img_1_cln, hp_img_2_cln. I ended up with the following image filenames for the grayscale images, which I'll use further: rhino1_streak_gs.png, rhino2_clean_gs.png, hippo1_clean_gs.png, hippo2_clean_gs.png.

Now, let's get the coefficients for the DFTs. The following code (ref. this SO answer) would be used for the first, clean rhino image.

Let's "look" at the image array first. This will show us a grid version of the top-left corner, with higher values being more white and lower values being more black.

Note that, before I begin outputting this array, I set things to the numpy default, cf. https://docs.scipy.org/doc/numpy/reference/generated/numpy.set_printoptions.html

>>> np.set_printoptions(edgeitems=3, infstr='inf',
...     linewidth=75, nanstr='nan', precision=8,
...     suppress=False, threshold=1000, formatter=None)

Now, let's run the DFT and look at the results. I change my NumPy print options to make things nicer before I start the actual transform.

>>> np.set_printoptions(formatter={'all':lambda x: '{0:.2f}'.format(x)})
>>>
>>> rh1_cln_gs_fft = np.fft.fft2(rh_img_1_cln_gs)
>>> rh1_cln_gs_scaled_fft = 255.0 * rh1_cln_gs_fft / rh1_cln_gs_fft.max()
>>> rh1_cln_gs_real_fft = np.absolute(rh1_cln_gs_scaled_fft)
>>> for i in range(5):
...   print(rh1_cln_gs_real_fft[i][:13])
...
[255.00 1.46 7.55 4.23 4.53 0.67 2.14 2.30 1.68 0.77 1.14 0.28 0.19]
[38.85 5.33 3.07 1.20 0.71 5.85 2.44 3.04 1.18 1.68 1.69 0.88 1.30]
[29.63 3.95 1.89 1.41 3.65 2.97 1.46 2.92 1.91 3.03 0.88 0.23 0.86]
[21.28 2.17 2.27 3.43 2.49 2.21 1.90 2.33 0.65 2.15 0.72 0.62 1.13]
[18.36 2.91 1.98 1.19 1.20 0.54 0.68 0.71 1.25 1.48 1.04 1.58 1.01]

Now, the result of following the same procedure with rhino1_streak.jpg:

[255.00 3.14 7.69 4.72 4.34 0.68 2.22 2.24 1.84 0.88 1.14 0.55 0.25]
[40.39 4.69 3.17 1.52 0.77 6.15 2.83 3.00 1.40 1.57 1.80 0.99 1.26]
[30.15 3.91 1.75 0.91 3.90 2.99 1.39 2.63 1.80 3.14 0.77 0.33 0.78]
[21.61 2.33 2.64 2.86 2.64 2.34 2.25 1.87 0.91 2.21 0.59 0.75 1.17]
[18.65 3.34 1.72 1.76 1.44 0.91 1.00 0.56 1.52 1.60 1.05 1.74 0.66]

I'll print the differences (Δ) between the two sets of coefficients instead of computing a more comprehensive distance. You could sum the squares of the values shown here if you want a distance.

>>> for i in range(5):
...   print(rh1_cln_gs_real_fft[i][:13] - rh1_stk_gs_real_fft[i][:13])
...
[0.00 -1.68 -0.15 -0.49 0.19 -0.01 -0.08 0.06 -0.16 -0.11 -0.01 -0.27
 -0.06]
[-1.54 0.64 -0.11 -0.32 -0.06 -0.30 -0.39 0.05 -0.22 0.11 -0.11 -0.11 0.04]
[-0.53 0.04 0.14 0.50 -0.24 -0.02 0.07 0.30 0.12 -0.11 0.11 -0.10 0.08]
[-0.33 -0.16 -0.37 0.57 -0.15 -0.14 -0.36 0.46 -0.26 -0.07 0.13 -0.14
 -0.04]
[-0.29 -0.43 0.26 -0.58 -0.24 -0.37 -0.32 0.15 -0.27 -0.12 -0.01 -0.17
 0.35]

I'll be putting just three coefficient arrays truncated to a length of five to show how this works for showing image similarity. Honestly, this is an experiment for me, so we'll see how it goes.

You can work on comparing those coefficients with distances or other metrics.
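One hedged way to turn the coefficients into a single number is the root of the summed squared differences over the low-frequency block. In the sketch below, `dft_magnitudes` and `dft_distance` are hypothetical helpers, the 13-coefficient cutoff simply mirrors the truncation used in the printouts, and random arrays stand in for the grayscale rhino images. It assumes a 2-D, non-negative, non-empty image, so that the DC term is the largest magnitude and can be used for scaling.

```python
import numpy as np

def dft_magnitudes(gray, keep=13):
    """Top-left keep x keep block of 2-D DFT magnitudes, scaled so the DC term is 255."""
    spectrum = np.abs(np.fft.fft2(np.asarray(gray, dtype=np.float64)))
    return 255.0 * spectrum[:keep, :keep] / spectrum[0, 0]

def dft_distance(gray_a, gray_b, keep=13):
    """Euclidean distance between the scaled low-frequency magnitude blocks."""
    delta = dft_magnitudes(gray_a, keep) - dft_magnitudes(gray_b, keep)
    return float(np.sqrt((delta ** 2).sum()))

# Synthetic stand-ins for the clean and streaked grayscale images.
rng = np.random.default_rng(0)
clean = rng.integers(1, 256, size=(178, 284)).astype(np.float64)
streak = clean.copy()
streak[50:60, :] = 0.0  # a crude "black streak"

print(dft_distance(clean, clean))   # 0.0
print(dft_distance(clean, streak))
```

Because only magnitudes are compared, the measure ignores phase, so it is insensitive to where in the image a shape sits; whether that is desirable depends on the application.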


More About Preservation of Information

Let's do an information-theoretical analysis of information loss with the methods proposed above. Following the analysis here (archived), let's look at our information loss.


Good luck!

You can try

import scipy.spatial.distance as dist
import cv2
import numpy as np

im1 = cv2.imread("im1.jpg")
im2 = cv2.imread("im2.jpg")

dist.euclidean(im1.flatten(), im2.flatten())

You can use the reshape function for both images to convert them from 3D to 1D.

import scipy.spatial.distance as dist
import cv2

im1 = cv2.imread("im1.jpg")
im2 = cv2.imread("im2.jpg")

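# reshape returns a new array, so assign the result back.
# 1820400 = 820 * 740 * 3; reshape(-1) would infer the length instead.
im1 = im1.reshape(1820400)
im2 = im2.reshape(1820400)

dist.euclidean(im1, im2)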


 