
How can I refactor this ForEach(..) code to use Parallel.ForEach(..)?

I've got a list of objects that I want to copy from one source to another. It was suggested that I could speed things up by using Parallel.ForEach.

How can I refactor the following pseudocode to leverage Parallel.ForEach(..)?

var foos = GetFoos().ToList();
foreach(var foo in foos)
{
    CopyObjectFromOldBucketToNewBucket(foo, oldBucket, newBucket, 
        accessKeyId, secretAccessKey);
}

CopyObjectFromOldBucketToNewBucket uses the Amazon REST APIs to move items from one bucket to another.

Cheers :)

Since your code doesn't have any dependencies other than on foos, you can simply do:

Parallel.ForEach(foos, foo =>
{
    CopyObjectFromOldBucketToNewBucket(foo, oldBucket, newBucket,
                                       accessKeyId, secretAccessKey);
});

Keep in mind, though, that I/O can only be parallelized to a certain degree; beyond that point, performance might actually degrade.
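One way to act on that caveat is to cap how many loop bodies run at once with ParallelOptions.MaxDegreeOfParallelism. The following is only a minimal sketch: BoundedCopy, CopyOne, and the Copied bag are hypothetical stand-ins for CopyObjectFromOldBucketToNewBucket so that it runs without AWS, and the right cap is something to find by measurement.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class BoundedCopy
{
    // Hypothetical stand-in for the real copy call: it only records the key,
    // so the sketch can run without any AWS credentials or network access.
    public static readonly ConcurrentBag<string> Copied = new ConcurrentBag<string>();

    public static void CopyOne(string key)
    {
        Copied.Add(key);
    }

    public static void CopyAll(IEnumerable<string> keys, int maxParallel)
    {
        // At most maxParallel loop bodies run concurrently.
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };
        Parallel.ForEach(keys, options, key => CopyOne(key));
    }
}
```

For example, BoundedCopy.CopyAll(keys, 4) would process the keys with no more than four copies in flight at a time.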

Parallel is actually not the best option here. Parallel will run your code in parallel, but it will still tie up a thread pool thread for each request to AWS. It would be a far better use of resources to use the BeginCopyObject method instead. This does not tie up a thread pool thread while waiting on a response; a thread is only used once the response has been received and needs to be processed.

Here's a simplified example of how to use the Begin/End methods. They are not specific to AWS; it is a pattern found throughout the .NET BCL.

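public static void CopyFoos()
{
    var client = new AmazonS3Client(...);
    var foos = GetFoos().ToList();
    var asyncs = new List<IAsyncResult>();
    foreach (var foo in foos)
    {
        var request = new CopyObjectRequest { ... };

        // Start the copy without blocking; EndCopy runs when it completes.
        // The client is passed as the state object so EndCopy can reach it.
        asyncs.Add(client.BeginCopyObject(request, EndCopy, client));
    }

    // Block until every outstanding request has finished.
    foreach (IAsyncResult ar in asyncs)
    {
        if (!ar.IsCompleted)
        {
            ar.AsyncWaitHandle.WaitOne();
        }
    }
}

private static void EndCopy(IAsyncResult ar)
{
    // Every BeginCopyObject must be paired with an EndCopyObject.
    ((AmazonS3Client)ar.AsyncState).EndCopyObject(ar);
}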

For production code you may want to keep track of how many requests you've dispatched and only send out a limited number at any one time. Testing or AWS docs may tell you how many concurrent requests are optimal.
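A rough sketch of that kind of throttling, using a SemaphoreSlim as a gate, might look like the following. It is written against a generic Func<string, Task> rather than the AWS client; ThrottledDispatch, RunAsync, and copyAsync are hypothetical names for illustration, and the limit of in-flight requests is an assumption you would tune by testing.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledDispatch
{
    // Runs copyAsync for every key, but never lets more than maxInFlight
    // operations be outstanding at the same time.
    public static async Task RunAsync(
        IEnumerable<string> keys, int maxInFlight, Func<string, Task> copyAsync)
    {
        using (var gate = new SemaphoreSlim(maxInFlight))
        {
            var pending = new List<Task>();
            foreach (var key in keys)
            {
                // Waits (without blocking a thread) until a slot is free.
                await gate.WaitAsync();
                pending.Add(CopyThenRelease(key, gate, copyAsync));
            }

            // All operations have been dispatched; wait for the stragglers.
            await Task.WhenAll(pending);
        }
    }

    private static async Task CopyThenRelease(
        string key, SemaphoreSlim gate, Func<string, Task> copyAsync)
    {
        try
        {
            await copyAsync(key);
        }
        finally
        {
            // Free the slot even if the copy failed.
            gate.Release();
        }
    }
}
```

The semaphore is released in a finally block so that a failed copy does not permanently shrink the number of available slots.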

In this case we don't actually need to do anything when the requests complete, so you may be tempted to skip the EndCopy calls, but that would cause a resource leak. Whenever you call BeginXxx, you must call the corresponding EndXxx method.
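One way to make the Begin/End pairing hard to forget is to let Task.Factory.FromAsync do it: it invokes the Begin method immediately and calls the End method itself when the operation completes. The sketch below uses Stream.BeginRead/EndRead from the BCL so that it runs without AWS; BeginEndPairing and ReadAsync are hypothetical names, and whether the same wrapping fits BeginCopyObject/EndCopyObject depends on the SDK version, which is not verified here.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public static class BeginEndPairing
{
    // Wraps a Begin/End pair in a Task. FromAsync calls BeginRead right away
    // and calls EndRead when the read completes, so the pairing rule is
    // satisfied without a hand-written callback.
    public static Task<int> ReadAsync(Stream stream, byte[] buffer)
    {
        return Task<int>.Factory.FromAsync(
            stream.BeginRead, stream.EndRead, buffer, 0, buffer.Length, null);
    }
}
```

The returned task completes with the number of bytes read, and any exception thrown by the End method surfaces through the task.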
