I'm writing a tool in C# to find duplicate images. Currently I create an MD5 checksum of the files and compare those.
Unfortunately, the images can be:
What would be the best approach to solve this problem?
Here is a simple approach with a 256 bit image-hash (MD5 has 128 bit)
List<bool>
- this is the hashCode :
public static List<bool> GetHash(Bitmap bmpSource)
{
List<bool> lResult = new List<bool>();
//create new image with 16x16 pixel
Bitmap bmpMin = new Bitmap(bmpSource, new Size(16, 16));
for (int j = 0; j < bmpMin.Height; j++)
{
for (int i = 0; i < bmpMin.Width; i++)
{
//reduce colors to true / false
lResult.Add(bmpMin.GetPixel(i, j).GetBrightness() < 0.5f);
}
}
return lResult;
}
I know, GetPixel
is not that fast but on a 16x16 pixel image it should not be the bottleneck.
Code:
List<bool> iHash1 = GetHash(new Bitmap(@"C:\mykoala1.jpg"));
List<bool> iHash2 = GetHash(new Bitmap(@"C:\mykoala2.jpg"));
//determine the number of equal pixel (x of 256)
int equalElements = iHash1.Zip(iHash2, (i, j) => i == j).Count(eq => eq);
So this code is able to find equal images with:
i
and j
Update / Improvements:
after using this method for a while I noticed a few improvements that can be done
GetPixel
for more performance0.5f
to differ between light and dark - use the distinct median brightness of all 256 pixels. Otherwise dark/light images are assumed to be the same and it enables to detect images which have a changed brightness.bool[]
or List<bool>
if you need to store a lot hashes with the need to save memory, use a Bitarray
because a Boolean isn't stored in a bit, it takes a byte !You could check Algorithm to compare two images in order to see the available methods for image comparison.
Unless you want to recreate the full algorithms on your own, you should try to use already existing libraries or a least part of their code (as long as their license is ok for you).
For an open source C# implementation of Edge detection and related Computer vision algorithms, you can try EmguCV which is a wrapper of OpenCV.
After resampling the images to some common resolution, you could use a Wavelet Decomposition and compare the coefficients of this decomposition instead of the images themselves. Comparing only the first N coefficients will make this method more robust to noise and other artifacts.
There are several C# implementations for wavelets available. One example is https://waveletstudio.codeplex.com/
Interesting question, the comparision of images is not that hard given that,
One way of doing comparison would be to,
Now if the value comes within a reasonable value say 90% (probably have to determine by doing few experiments), then you could safely assume both are the same, but this is not going to work if,
Nice advanced stuff here, but lots of images will really be the same file. One measure I would consider is to do the dumb part first.. create a directory list and sort your image files in order of size. Find all pairs with same size files and check each pair for an exact match. For each match, you can remove the double.
Now for the interesting part..
One thing that has not come up in above solutions is to make use of color. You could use a color histogram comparison. For red, green and blue channels, count each color occurrence and put the counts in tree integer[255] arrays. Then normalize each RGB array to floating point [0.0 .. 1.0] values and compare as Vector3(RGB) distance. This approach can find rotated and resized versions of an image. A match in the color domain does not give a guarantee.. but you can use it to (again) group files for further refined processing and speed things up.
private List<byte> colorList = new List<byte>();
private string hash;
private string GetImageHash(Bitmap bmpSource)
{
colorList.Clear();
int i,j;
Bitmap bmpMin = new Bitmap(bmpSource, new Size(16, 16)); //create new image with 16x16 pixel
for ( j = 0 ; j < bmpMin.Height; j++)
{
for ( i = 0; i < bmpMin.Width; i++)
{
colorList.Add(bmpMin.GetPixel(i, j).R);
}
}
SHA1Managed sha = new SHA1Managed();
byte[] checksum = sha.ComputeHash(colorList.ToArray());
hash = BitConverter.ToString(checksum).Replace("-", String.Empty);
sha.Dispose();
bmpMin.Dispose();
return hash;
}
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.