
Running parallel async tasks and returning results in .NET Core Web API

Hi, I was recently working on a .NET Core Web API project that downloads files from an external API. I found an issue when the number of files is large, say more than 100: the API downloads at most 50 files and skips the rest. The Web API is deployed on AWS Lambda and the timeout is 15 minutes.

The operation is timing out because the download process takes too long.

public async Task<bool> DownloadAttachmentsAsync(List<DownloadAttachment> downloadAttachments)
{
    try
    {
        foreach (DownloadAttachment downloadAttachment in downloadAttachments)
        {
            bool downloadFlag = await DownloadAttachment(downloadAttachment.id);

            // update the download status in the database
            if (downloadFlag)
            {
                bool updateFlag = await _DocumentService.UpdateAttachmentDownloadStatus(downloadAttachment.id);

                if (updateFlag)
                {
                    await DeleteAttachment(downloadAttachment.id);
                }
            }
        }
        return true;
    }
    catch (Exception ex)
    {
        log.Error(ex, "Error in saving attachments");
        return false;
    }
}

Document service code

public async Task<bool> UpdateAttachmentDownloadStatus(string AttachmentID)
        {
            return await _documentRepository.UpdateAttachmentDownloadStatus(AttachmentID);
        }

And DB update code

public async Task<bool> UpdateAttachmentDownloadStatus(string AttachmentID)
{
    using (var db = new SqlConnection(_connectionString.Value))
    {
        var parameters = new DynamicParameters();
        parameters.Add("@pm_AttachmentID", AttachmentID);
        parameters.Add("@pm_Result", 0, System.Data.DbType.Int32, System.Data.ParameterDirection.Output);

        await db.ExecuteAsync("[Loan].[UpdateDownloadStatus]", parameters, commandType: CommandType.StoredProcedure);

        var result = parameters.Get<int>("@pm_Result");
        return result > 0;
    }
}

How can I run these download tasks in parallel and get their results? I tried the following code:

var task = Task.Run(() => DownloadAttachment( downloadAttachment.id));
bool result = task.Result; 

Is this approach fine? How can I improve the performance? How do I get the result from each parallel task, then update the DB and delete based on the success flag? Or is this error due to the AWS timeout?

Please help

If you extracted the code that handles individual files to a separate method:

private async Task DownloadSingleAttachment(DownloadAttachment attachment)
{
    try
    {
        var download = await DownloadAttachment(attachment.id);
        if (download)
        {
            var update = await _DocumentService.UpdateAttachmentDownloadStatus(attachment.id);
            if (update)
            {
                await DeleteAttachment(attachment.id);
            }
        }
    }
    catch (Exception ex)
    {
        log.Error(ex, "Error processing attachment {attachmentId}", attachment.id);
    }
}

public async Task<bool> DownloadAttachmentsAsync(List<DownloadAttachment> downloadAttachments)
{
    foreach (var attachment in downloadAttachments)
    {
        await DownloadSingleAttachment(attachment);
    }
    return true;
}

It would be easy to start all downloads at once, although not very efficient:

public async Task<bool> DownloadAttachmentsAsync(List<DownloadAttachment> downloadAttachments)
{
    //Start all of them at once
    var tasks = downloadAttachments.Select(att => DownloadSingleAttachment(att));
    await Task.WhenAll(tasks);
    return true;
}

This isn't very efficient, because external services dislike receiving a burst of concurrent calls from a single source and almost certainly impose throttling. The database doesn't like lots of concurrent calls either, because in every database product concurrency leads to blocking in one way or another. Even in databases that use multiversioning, this comes with an overhead.
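Before reaching for Dataflow, one rough middle ground is to start the downloads in fixed-size batches: at most N operations run at once, and each batch must finish before the next one starts. This is only an illustrative sketch (the `ProcessInBatchesAsync` helper is hypothetical, not from the answer), and its weakness is that every batch waits for its slowest item:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

static class BatchSketch
{
    // Run at most batchSize operations at a time: start a batch,
    // wait for all of it to finish, then start the next batch.
    public static async Task ProcessInBatchesAsync<T>(
        IReadOnlyList<T> items, Func<T, Task> operation, int batchSize = 10)
    {
        for (int i = 0; i < items.Count; i += batchSize)
        {
            var batch = items.Skip(i).Take(batchSize).Select(operation).ToList();
            await Task.WhenAll(batch);
        }
    }
}
```

The Dataflow approach below avoids the wait-for-the-slowest problem, because each block keeps pulling new work as soon as one of its tasks finishes.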

Using Dataflow classes - Single block

One easy way to fix this is to use .NET's TPL Dataflow classes (from the System.Threading.Tasks.Dataflow NuGet package) to break the operation into a pipeline of steps, and execute each step with a different number of concurrent tasks.

We could put the entire operation into a single block, but that could cause problems if the update and delete operations aren't thread-safe:

var dlOptions = new ExecutionDataflowBlockOptions
{
    MaxDegreeOfParallelism = 10,
};

var downloader = new ActionBlock<DownloadAttachment>(async att => {
    await DownloadSingleAttachment(att);
}, dlOptions);

foreach (var attachment in downloadAttachments)
{
    await downloader.SendAsync(attachment);
}

downloader.Complete();
await downloader.Completion;

Dataflow - Multiple steps

To avoid possible thread issues, the rest of the methods can go to their own blocks. They could both go into one ActionBlock that calls both Update and Delete , or they could go into separate blocks if the methods talk to different services with different concurrency requirements.

The downloader block will execute at most 10 concurrent downloads. By default, each block uses only a single task at a time.

The updater and deleter blocks keep their default DOP = 1, which means there's no risk of race conditions as long as they don't try to use, e.g., the same connection at the same time.

var downloader = new TransformBlock<string, (string id, bool download)>(
    async id => {
        var download = await DownloadAttachment(id);
        return (id, download);
    }, dlOptions);

var updater = new TransformBlock<(string id, bool download), (string id, bool update)>(
    async msg => {
        if (msg.download)
        {
            var update = await _DocumentService.UpdateAttachmentDownloadStatus(msg.id);
            return (msg.id, update);
        }
        return (msg.id, false);
    });

var deleter = new ActionBlock<(string id, bool update)>(
    async msg => {
        if (msg.update)
        {
            await DeleteAttachment(msg.id);
        }
    });

The blocks can be linked into a pipeline now and used. The setting PropagateCompletion = true means that as soon as a block is finished processing, it will tell all its connected blocks to finish as well:

var linkOptions=new DataflowLinkOptions { PropagateCompletion = true};
downloader.LinkTo(updater, linkOptions);
updater.LinkTo(deleter,linkOptions);

We can pump data into the head block as long as we need. When we're done, we call the head block's Complete() method. As each block finishes processing its data, it will propagate its completion to the next block in the pipeline. We need to await the last (tail) block's completion to ensure all the attachments have been processed:

foreach (var attachment in downloadAttachments)
{
    await downloader.SendAsync(attachment.id);
}

downloader.Complete();
await deleter.Completion;

Each block has an input and (when necessary) an output buffer, which means the "producer" and "consumers" of the messages don't have to be in sync, or even know of each other. All the "producer" needs to know is where to find the head block in a pipeline.
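Putting the pieces above together, here is a minimal self-contained sketch of the whole pipeline. The service calls are stubbed out as hypothetical async methods (the real `DownloadAttachment`, `UpdateDownloadStatus`, and `DeleteAttachment` belong to the question's services), and it needs the System.Threading.Tasks.Dataflow NuGet package:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow; // NuGet: System.Threading.Tasks.Dataflow

static class PipelineSketch
{
    // Stubs standing in for the real service calls; the bodies are hypothetical.
    static Task<bool> DownloadAttachment(string id) => Task.FromResult(true);
    static Task<bool> UpdateDownloadStatus(string id) => Task.FromResult(true);
    static Task DeleteAttachment(string id)
    {
        Interlocked.Increment(ref Deleted); // counter only to make the demo observable
        return Task.CompletedTask;
    }

    public static int Deleted;

    public static async Task ProcessAsync(IEnumerable<string> ids)
    {
        var dlOptions = new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 10 };

        var downloader = new TransformBlock<string, (string id, bool ok)>(
            async id => (id, await DownloadAttachment(id)), dlOptions);

        var updater = new TransformBlock<(string id, bool ok), (string id, bool ok)>(
            async msg => msg.ok ? (msg.id, await UpdateDownloadStatus(msg.id)) : (msg.id, false));

        var deleter = new ActionBlock<(string id, bool ok)>(
            async msg => { if (msg.ok) await DeleteAttachment(msg.id); });

        // Link the blocks into a pipeline; completion flows from head to tail.
        var linkOptions = new DataflowLinkOptions { PropagateCompletion = true };
        downloader.LinkTo(updater, linkOptions);
        updater.LinkTo(deleter, linkOptions);

        foreach (var id in ids)
            await downloader.SendAsync(id);

        downloader.Complete();
        await deleter.Completion; // wait for the tail block to drain
    }
}
```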

Throttling and backpressure

One way to throttle is to use a fixed number of tasks through MaxDegreeOfParallelism .

It's also possible to put a limit on the input buffer, blocking previous steps or producers if a block can't process messages fast enough. This can be done simply by setting the BoundedCapacity option on a block:

var dlOptions= new ExecutionDataflowBlockOptions
{
    MaxDegreeOfParallelism = 10,
    BoundedCapacity=20,
};

var updaterOptions= new ExecutionDataflowBlockOptions
{
    BoundedCapacity=20,
};

...

var downloader=new TransformBlock<...>(...,dlOptions);

var updater=new TransformBlock<...>(...,updaterOptions);

No other changes are necessary; when a block's input buffer is full, SendAsync simply waits until there's room before posting the next message.

To run multiple asynchronous operations you could do something like this:

    public async Task RunMultipleAsync<T>(IEnumerable<T> myList)
    {
        const int myNumberOfConcurrentOperations = 10;
        var mySemaphore = new SemaphoreSlim(myNumberOfConcurrentOperations);
        var tasks = new List<Task>();
        foreach(var myItem in myList)
        {
            await mySemaphore.WaitAsync();
            var task = RunOperation(myItem);
            tasks.Add(task);
            task.ContinueWith(t => mySemaphore.Release());           
        }

        await Task.WhenAll(tasks);
    }

    private async Task RunOperation<T>(T myItem)
    {
        // Do stuff
    }

Put your code from DownloadAttachmentsAsync where the 'Do stuff' comment is.

This uses a semaphore to limit the number of concurrent operations, since running too many concurrent operations is often a bad idea due to contention. You would need to experiment to find the optimal number of concurrent operations for your use case. Also note that error handling has been omitted to keep the example short.
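One refinement worth considering, sketched here rather than prescribed: releasing the semaphore in a finally block inside the worker guarantees the slot is freed even when the operation throws, and removes the need for ContinueWith. The `ThrottledRunner` name and the generic `operation` parameter are illustrative, not from the answer above:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

static class ThrottledRunner
{
    public static async Task RunMultipleAsync<T>(
        IEnumerable<T> items, Func<T, Task> operation, int maxConcurrency = 10)
    {
        using (var semaphore = new SemaphoreSlim(maxConcurrency))
        {
            var tasks = new List<Task>();

            foreach (var item in items)
            {
                // Wait for a free slot before starting the next operation.
                await semaphore.WaitAsync();
                tasks.Add(RunOne(item));
            }

            await Task.WhenAll(tasks);

            async Task RunOne(T item)
            {
                try { await operation(item); }
                finally { semaphore.Release(); } // slot is freed even if operation throws
            }
        }
    }
}
```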
