What is the best way to compare images in a bitmap list

I'm working on an application that loads multiple pictures into a list and compares every picture in the list with every other one, so that I can find duplicates.

First, I load the pictures into an IList<Bitmap>:

public IList<Bitmap> getPictures()
{
    IList<Bitmap> pictures = new List<Bitmap>();

    string filepath = Environment.GetFolderPath(Environment.SpecialFolder.MyDocuments);
    DirectoryInfo d = new DirectoryInfo(filepath + "\\Phone Pictures\\");
    foreach (var picture in d.GetFiles("*.png"))
    {
        pictures.Add(ConvertToBitmap(picture.FullName));
    }
    return pictures;
}

Then I used a ready-made image comparison algorithm:

public static CompareResult Compare(Bitmap bmp1, Bitmap bmp2)
{
    CompareResult cr = CompareResult.ciCompareOk;

    // Test to see if we have the same size of image
    if (bmp1.Size != bmp2.Size)
    {
        cr = CompareResult.ciSizeMismatch;
    }
    else
    {
        // Convert each image to a byte array
        System.Drawing.ImageConverter ic = new System.Drawing.ImageConverter();
        byte[] btImage1 = (byte[])ic.ConvertTo(bmp1, typeof(byte[]));
        byte[] btImage2 = (byte[])ic.ConvertTo(bmp2, typeof(byte[]));

        // Compute a hash for each image
        SHA256Managed shaM = new SHA256Managed();
        byte[] hash1 = shaM.ComputeHash(btImage1);
        byte[] hash2 = shaM.ComputeHash(btImage2);

        // Compare the hash values
        for (int i = 0; i < hash1.Length && i < hash2.Length
                          && cr == CompareResult.ciCompareOk; i++)
        {
            if (hash1[i] != hash2[i])
                cr = CompareResult.ciPixelMismatch;
        }
    }
    return cr;
}

This is how I currently call the algorithm on the loaded list:

public void ComparePictureList()
{
    IList<Bitmap> picturesList = getPictures();

    foreach (var picture1 in picturesList)
    {
        foreach (var picture2 in picturesList)
        {
            Compare(picture1, picture2);
        }
    }
}

Is there a better way to apply the algorithm to my list? Instead of writing two nested loops over picture1 and picture2, is there any functionality in the .NET Framework that would do this better?

PS: for anyone wondering what ConvertToBitmap is, here it is:

public Bitmap ConvertToBitmap(string fileName)
{
    Bitmap bitmap;
    using (Stream bmpStream = System.IO.File.Open(fileName, System.IO.FileMode.Open))
    {
        Image image = Image.FromStream(bmpStream);

        bitmap = new Bitmap(image);
    }
    return bitmap;
}

I would avoid calculating the hash multiple times for the same image, and I would loop through the images only once:

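using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

public static class Program
{
    public static void Main(string[] args)
    {
        // Hash every file exactly once: path -> hex-encoded SHA-256.
        var files = new Dictionary<string, string>();
        foreach (var file in Directory.GetFiles("c:\\", "*.png"))
        {
            files.Add(file, CalculateHash(file));
        }

        // Each group holds the files that share one hash value.
        var duplicates = files.GroupBy(item => item.Value).Where(group => group.Count() > 1);
    }

    private static string CalculateHash(string file)
    {
        using (var stream = File.OpenRead(file))
        using (var sha = new SHA256Managed())
        {
            var checksum = sha.ComputeHash(stream);
            return BitConverter.ToString(checksum).Replace("-", String.Empty);
        }
    }
}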
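
To make the result of the grouping concrete, here is a minimal sketch of how the duplicate groups could be listed. DuplicateReport and Describe are hypothetical names introduced only for illustration; the input is assumed to be the same path-to-hash dictionary built in Main above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DuplicateReport
{
    // Hypothetical helper: turns a path -> hash dictionary into one text line
    // per group of files that share the same hash.
    public static List<string> Describe(IDictionary<string, string> hashesByPath)
    {
        return hashesByPath
            .GroupBy(item => item.Value)        // group the entries by hash
            .Where(group => group.Count() > 1)  // keep hashes that occur more than once
            .Select(group => string.Join(", ",
                group.Select(item => item.Key)
                     .OrderBy(path => path, StringComparer.Ordinal)))
            .ToList();
    }
}
```

Each returned line names the files that are candidates for being duplicates of one another; files with a unique hash do not appear at all.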
  1. You cannot be sure that two images with the same hash are equal.

  2. If your hash function uses all bytes of the image, it can be faster to compare the bytes of the images directly than to calculate a hash for each and compare the hashes.

  3. You calculate the hash of each image multiple times. You do not need to do this.

I recommend the following:

  1. Calculate a hash for each image and store it.

  2. Use either a map or two loops to find hash collisions.

  3. Compare images with the same hash byte by byte to be sure that they are equal.
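
The three recommended steps could be sketched roughly as follows. This is only an illustration under some assumptions: it works on the raw file bytes rather than on decoded Bitmap pixels, and DuplicateFinder, HashFile and FindDuplicates are hypothetical names that do not appear in the question.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

public static class DuplicateFinder
{
    // Step 1: hash a file once (SHA-256 over the raw file bytes).
    private static string HashFile(string path)
    {
        using (var sha = SHA256.Create())
        using (var stream = File.OpenRead(path))
        {
            return BitConverter.ToString(sha.ComputeHash(stream));
        }
    }

    // Returns pairs of paths whose contents are byte-for-byte identical.
    public static List<Tuple<string, string>> FindDuplicates(IEnumerable<string> paths)
    {
        var result = new List<Tuple<string, string>>();

        // Step 2: a map from hash to files; only groups with a collision matter.
        var collisions = paths.GroupBy(path => HashFile(path))
                              .Where(group => group.Count() > 1);

        foreach (var group in collisions)
        {
            var candidates = group.ToList();
            for (int i = 0; i < candidates.Count; i++)
            {
                for (int j = i + 1; j < candidates.Count; j++)
                {
                    // Step 3: confirm the collision byte by byte.
                    byte[] first = File.ReadAllBytes(candidates[i]);
                    byte[] second = File.ReadAllBytes(candidates[j]);
                    if (first.SequenceEqual(second))
                    {
                        result.Add(Tuple.Create(candidates[i], candidates[j]));
                    }
                }
            }
        }
        return result;
    }
}
```

The nested loops only run inside a group of colliding hashes, so for a folder with few duplicates the expensive byte-by-byte comparison is rarely reached.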

You are already calculating the hash of each image, so you can convert it to, e.g., a String and then use a Dictionary<String, Bitmap> where the key is the hash. You can use ContainsKey to quickly determine whether the image hash is already in the dictionary.

Since you are opening the image files via a stream, there is a simpler way to calculate the hash directly from the Stream, as described in "Calculate the Hash of the Contents of a File in C#?". You will probably have to rewind the stream afterwards in order to read the image from it.
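
A minimal sketch of the ContainsKey idea is shown below. To stay independent of System.Drawing, it assumes each image is given as a name plus its bytes and stores the name of the first image seen for each hash; with a Dictionary<String, Bitmap> the stored value would be the Bitmap instead. HashIndex, HashBytes and FindRepeats are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;

public static class HashIndex
{
    // Converts the SHA-256 hash of the data into a string usable as a dictionary key.
    public static string HashBytes(byte[] data)
    {
        using (var sha = SHA256.Create())
        {
            return BitConverter.ToString(sha.ComputeHash(data));
        }
    }

    // Returns the names of items whose hash was already in the dictionary
    // at the time they were processed.
    public static List<string> FindRepeats(IEnumerable<KeyValuePair<string, byte[]>> items)
    {
        var firstSeen = new Dictionary<string, string>(); // hash -> name of first item
        var repeats = new List<string>();

        foreach (var item in items)
        {
            string hash = HashBytes(item.Value);
            if (firstSeen.ContainsKey(hash))
            {
                repeats.Add(item.Key);          // hash seen before: likely duplicate
            }
            else
            {
                firstSeen.Add(hash, item.Key);  // first occurrence of this hash
            }
        }
        return repeats;
    }
}
```

This makes a single pass over the images, and each lookup is a dictionary access rather than a comparison against every other image.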
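
The rewinding step could look roughly like this. It is a sketch that assumes a seekable stream (for example a FileStream or MemoryStream); StreamHashing and HashThenRead are hypothetical names, and the read callback stands in for whatever loads the image.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class StreamHashing
{
    // Hashes the stream from its current position to the end, then rewinds it
    // so that the caller-supplied reader starts again at the first byte.
    public static TResult HashThenRead<TResult>(
        Stream stream, Func<Stream, TResult> read, out string hash)
    {
        using (var sha = SHA256.Create())
        {
            hash = BitConverter.ToString(sha.ComputeHash(stream))
                               .Replace("-", string.Empty);
        }

        // ComputeHash leaves the position at the end of the stream;
        // setting Position requires stream.CanSeek to be true.
        stream.Position = 0;
        return read(stream);
    }
}
```

In the question's ConvertToBitmap, the callback could plausibly be something like s => new Bitmap(Image.FromStream(s)), so that the file is opened once and used for both the hash and the image.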
