Parallel.ForEach Error when using WebClient

First, my disclaimer: I'm a parallel noob. I thought this would be an easy "embarrassingly parallel" problem to tackle, but it's thrown me for a loop.

I'm trying to download some photos in parallel from the web. The original photos are Hi-Res and take up quite a bit of space, so I'm going to compact them once they're downloaded.

Here's the code:

    private static void DownloadPhotos(ISet<MyPhoto> photos)
    {
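        // List<T>.Add is not thread-safe, so collect failures in a ConcurrentBag<T>
        // (requires: using System.Collections.Concurrent;).
        ConcurrentBag<MyPhoto> failed = new ConcurrentBag<MyPhoto>();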

        DateTime now = DateTime.Now;
        string folderDayOfYear = now.DayOfYear.ToString();
        string folderYear = now.Year.ToString();
        string imagesFolder = string.Format("{0}{1}\\{2}\\", ImagePath, folderYear, folderDayOfYear);

        if (!Directory.Exists(imagesFolder))
        {
            Directory.CreateDirectory(imagesFolder);
        }

        Parallel.ForEach(photos, photo =>
        {
            if (!SavePhotoFile(photo.Url, photo.Duid + ".jpg", imagesFolder))
            {
                failed.Add(photo);
                Console.WriteLine("adding to failed photos: {0} ", photo.Duid.ToString());
            }
        });

        Console.WriteLine();
        Console.WriteLine("failed photos count: {0}", failed.Count);

        RemoveHiResPhotos(string.Format(@"{0}\{1}\{2}", ImagePath, folderYear, folderDayOfYear));
    }


    private static bool SavePhotoFile(string url, string fileName, string imagesFolder)
    {
        string fullFileName = imagesFolder + fileName;
        string originalFileName = fileName.Replace(".jpg", "-original.jpg");
        string fullOriginalFileName = imagesFolder + originalFileName;

        if (!File.Exists(fullFileName))
        {
            using (WebClient webClient = new WebClient())
            {
                try
                {
                    webClient.DownloadFile(url, fullOriginalFileName);
                }
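                catch (Exception ex)
                {
                    // WebClient wraps the underlying I/O error, so report the inner message when there is one.
                    Console.WriteLine();
                    Console.WriteLine("failed to download photo: {0} ({1})", fileName,
                        ex.InnerException != null ? ex.InnerException.Message : ex.Message);
                    return false;
                }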
            }
            CreateStandardResImage(fullOriginalFileName, fullOriginalFileName.Replace("-original.jpg", ".jpg"));
        }
        return true;
    }

    private static void CreateStandardResImage(string hiResFileName, string stdResFileName)
    {
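        // Image.FromFile keeps the file locked until the Image is disposed,
        // so dispose both images as soon as the resized copy has been saved.
        // Resize and SaveAs are not part of System.Drawing; they are presumably
        // extension methods defined elsewhere in the project.
        using (Image image = Image.FromFile(hiResFileName))
        using (Image newImage = image.Resize(1024, 640))
        {
            newImage.SaveAs(hiResFileName, stdResFileName, 70, ImageFormat.Jpeg);
        }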
    }

So here's where things confuse me: each of the photos hits the catch block of the SavePhotoFile() method at the webClient.DownloadFile line. The error message is "An exception occurred during a WebClient request" and the inner exception says "The process cannot access the file . . . -original.jpg because it is being used by another process."

If I wasn't confused enough by this error, I'm confused even more by what happens next. It turns out that if I just ignore the message and wait, the image will eventually download and be processed.

What's going on?

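OK, so it appears that in my focus on parallelism I made a simple error: I assumed something about my data that wasn't true. Brianestey figured out the problem: Duid isn't unique. It's supposed to be unique, but some code was missing from the process that creates the list. Because the file name is built from Duid, two photos with the same Duid were presumably being written to the same -original.jpg file at the same time, so one of the downloads failed with "being used by another process" while the other went through.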
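One way to confirm a problem like this is to look for repeated Duid values before starting any downloads. Below is a minimal sketch, assuming Duid is a string (the question doesn't show its type). The MyPhoto class here is a hypothetical stand-in with only the two properties used in the question, and FindDuplicateDuids is a name made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the real MyPhoto class; Duid is assumed to be a string.
public class MyPhoto
{
    public string Duid { get; set; }
    public string Url { get; set; }
}

public static class DuplicateCheck
{
    // Returns every Duid that occurs more than once in the input sequence.
    public static List<string> FindDuplicateDuids(IEnumerable<MyPhoto> photos)
    {
        return photos
            .GroupBy(p => p.Duid)          // bucket photos by their Duid
            .Where(g => g.Count() > 1)     // keep only buckets with collisions
            .Select(g => g.Key)
            .ToList();
    }
}
```

If the returned list is non-empty, at least two photos would map to the same file name, and running them through Parallel.ForEach can make them compete for the same file.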

The fix was to add this to the MyPhoto class, so that two photos with the same Duid compare as equal and the set keeps only one of them:

    public override bool Equals(object obj)
    {
        // Two photos are equal when they share the same Duid.
        var objPhoto = obj as MyPhoto;
        return objPhoto != null && objPhoto.Duid == this.Duid;
    }

    public override int GetHashCode()
    {
        return this.Duid.GetHashCode();
    }
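To illustrate why these overrides matter, here is a small sketch of how a HashSet<T> behaves once they are in place. It assumes Duid is a string and uses a trimmed-down, hypothetical MyPhoto; the DedupDemo class and its CountUnique method are invented for the example and are not part of the original code.

```csharp
using System;
using System.Collections.Generic;

// Trimmed-down, hypothetical MyPhoto; Duid is assumed to be a string.
public class MyPhoto
{
    public string Duid { get; set; }
    public string Url { get; set; }

    public override bool Equals(object obj)
    {
        var other = obj as MyPhoto;
        return other != null && other.Duid == Duid;
    }

    public override int GetHashCode()
    {
        return Duid.GetHashCode();
    }
}

public static class DedupDemo
{
    // HashSet<T> relies on GetHashCode and Equals to decide whether an item
    // is already present, so photos sharing a Duid collapse into one entry.
    public static int CountUnique(IEnumerable<MyPhoto> photos)
    {
        var set = new HashSet<MyPhoto>(photos);
        return set.Count;
    }
}
```

With the overrides in place, two MyPhoto objects that share a Duid count as one element. Without them, HashSet<T> falls back to reference equality and keeps both, which is how duplicate file names could reach the parallel loop.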

