
Running parallel async tasks and return result in .NET Core Web API

Hi, recently I was working on a .NET Core Web API project which downloads files from an external API. I found some issues when the number of files is large, say more than 100: the API downloads at most 50 files and skips the others. The Web API is deployed on AWS Lambda and the timeout is 15 minutes.

Actually the operation is timing out due to the long download process.

public async Task<bool> DownloadAttachmentsAsync(List<DownloadAttachment> downloadAttachments)
{
    string currentId = null;
    try
    {
        foreach (DownloadAttachment attachment in downloadAttachments)
        {
            currentId = attachment.id;
            bool downloadFlag = await DownloadAttachment(attachment.id);

            // update the download status in the database
            if (downloadFlag)
            {
                bool updateFlag = await _DocumentService.UpdateDownloadStatus(attachment.id);

                if (updateFlag)
                {
                    await DeleteAttachment(attachment.id);
                }
            }
        }
        return true;
    }
    catch (Exception ext)
    {
        log.Error(ext, "Error in saving attachment {attachmentId}", currentId);
        return false;
    }
}

Document service code

public async Task<bool> UpdateAttachmentDownloadStatus(string AttachmentID)
        {
            return await _documentRepository.UpdateAttachmentDownloadStatus(AttachmentID);
        }

And DB update code

public async Task<bool> UpdateAttachmentDownloadStatus(string AttachmentID)
{
    using (var db = new SqlConnection(_connectionString.Value))
    {
        var parameters = new DynamicParameters();
        parameters.Add("@pm_AttachmentID", AttachmentID);
        parameters.Add("@pm_Result", 0, System.Data.DbType.Int32, System.Data.ParameterDirection.Output);
        await db.ExecuteAsync("[Loan].[UpdateDownloadStatus]", parameters, commandType: CommandType.StoredProcedure);
        var result = parameters.Get<int>("@pm_Result");
        return result > 0;
    }
}

How can I move this async task to run in parallel, and get the result? I tried the following code:

var task = Task.Run(() => DownloadAttachment( downloadAttachment.id));
bool result = task.Result; 

Is this approach fine? How can I improve the performance? How do I get the result from each parallel task, update the DB, and delete based on the success flag? Or is this error due to the AWS timeout?

Please help.

If you extracted the code that handles individual files to a separate method:

private async Task DownloadSingleAttachment(DownloadAttachment attachment)
{
    try
    {
        var download = await DownloadAttachment(attachment.id);
        if (download)
        {
            var update = await _DocumentService.UpdateDownloadStatus(attachment.id);
            if (update)
            {
                await DeleteAttachment(attachment.id);
            }
        }
    }
    catch (Exception exc)
    {
        // log and swallow, so one failed attachment doesn't stop the rest
        log.Error(exc, "Error processing attachment {attachmentId}", attachment.id);
    }
}

public async Task<bool> DownloadAttachmentsAsync(List<DownloadAttachment> downloadAttachments)
{
    try
    {
        foreach (var attachment in downloadAttachments)
        {
            await DownloadSingleAttachment(attachment);
        }
        return true;
    }
    catch (Exception exc)
    {
        log.Error(exc, "Error downloading attachments");
        return false;
    }
}

It would be easy to start all downloads at once, although not very efficient:

public async Task<bool> DownloadAttachmentsAsync(List<DownloadAttachment> downloadAttachments)
{
    try
    {
        // Start all of them
        var tasks = downloadAttachments.Select(att => DownloadSingleAttachment(att));
        await Task.WhenAll(tasks);
        return true;
    }
    catch (Exception exc)
    {
        log.Error(exc, "Error downloading attachments");
        return false;
    }
}

This isn't very efficient because external services dislike receiving lots of concurrent calls from a single source, and will almost certainly impose throttling. The database doesn't like lots of concurrent calls either, because in all database products concurrent calls lead to blocking one way or another. Even in databases that use multiversioning, this comes with an overhead.

Using Dataflow classes - Single block

One easy way to fix this is to use .NET's Dataflow classes to break the operation into a pipeline of steps, and execute each one with a different number of concurrent tasks.

We could put the entire operation into a single block, but that could cause problems if the update and delete operations aren't thread-safe:

var dlOptions = new ExecutionDataflowBlockOptions
{
    MaxDegreeOfParallelism = 10,
};

var downloader = new ActionBlock<DownloadAttachment>(async att => {
    await DownloadSingleAttachment(att);
}, dlOptions);

foreach (var attachment in downloadAttachments)
{
    await downloader.SendAsync(attachment);
}

downloader.Complete();
await downloader.Completion;

Dataflow - Multiple steps

To avoid possible threading issues, the rest of the methods can go into their own blocks. They could both go into one ActionBlock that calls both Update and Delete, or they could go into separate blocks if the methods talk to different services with different concurrency requirements.

The downloader block will execute at most 10 concurrent downloads. By default, each block uses only a single task at a time.

The updater and deleter blocks have their default DOP=1, which means there's no risk of race conditions as long as they don't try to use e.g. the same connection at the same time.

var downloader = new TransformBlock<string, (string id, bool download)>(
    async id => {
        var download = await DownloadAttachment(id);
        return (id, download);
    }, dlOptions);

var updater = new TransformBlock<(string id, bool download), (string id, bool update)>(
    async msg => {
        if (msg.download)
        {
            var update = await _DocumentService.UpdateDownloadStatus(msg.id);
            return (msg.id, update);
        }
        return (msg.id, false);
    });

var deleter = new ActionBlock<(string id, bool update)>(
    async msg => {
        if (msg.update)
        {
            await DeleteAttachment(msg.id);
        }
    });

The blocks can be linked into a pipeline now and used. The setting PropagateCompletion = true means that as soon as a block is finished processing, it will tell all its connected blocks to finish as well:

var linkOptions=new DataflowLinkOptions { PropagateCompletion = true};
downloader.LinkTo(updater, linkOptions);
updater.LinkTo(deleter,linkOptions);

We can pump data into the head block for as long as we need. When we're done, we call the head block's Complete() method. As each block finishes processing its data, it will propagate its completion to the next block in the pipeline. We need to await the last (tail) block's completion to ensure all the attachments have been processed:

foreach (var attachment in downloadAttachments)
{
    await downloader.SendAsync(attachment.id);
}

downloader.Complete();
await deleter.Completion;

Each block has an input and (when necessary) an output buffer, which means the "producer" and "consumers" of the messages don't have to be in sync, or even know of each other. All the "producer" needs to know is where to find the head block of the pipeline.
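This producer/consumer decoupling can be shown with a minimal, self-contained sketch (plain int messages instead of the attachment types above, and hypothetical names): the producer only talks to the head block, and backpressure comes from the head block's bounded buffer.

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

class BufferDemo
{
    static async Task Main()
    {
        // Head block: a bounded buffer. The producer knows only about this block.
        var buffer = new BufferBlock<int>(
            new DataflowBlockOptions { BoundedCapacity = 5 });

        // Consumer block, wired up separately via LinkTo.
        var printer = new ActionBlock<int>(n =>
            Console.WriteLine($"Consumed {n}"));

        buffer.LinkTo(printer, new DataflowLinkOptions { PropagateCompletion = true });

        // SendAsync awaits whenever the input buffer is full (backpressure).
        for (int i = 0; i < 10; i++)
            await buffer.SendAsync(i);

        buffer.Complete();
        await printer.Completion;
    }
}
```

The producer never references `printer`; swapping in a different consumer only changes the `LinkTo` call.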

Throttling and backpressure

One way to throttle is to use a fixed number of tasks through MaxDegreeOfParallelism.

It's also possible to put a limit on the input buffer, thus blocking previous steps or producers if a block can't process messages fast enough. This can be done simply by setting the BoundedCapacity option for a block:

var dlOptions= new ExecutionDataflowBlockOptions
{
    MaxDegreeOfParallelism = 10,
    BoundedCapacity=20,
};

var updaterOptions= new ExecutionDataflowBlockOptions
{
    BoundedCapacity=20,
};

...

var downloader=new TransformBlock<...>(...,dlOptions);

var updater=new TransformBlock<...>(...,updaterOptions);

No other changes are necessary.

To run multiple asynchronous operations you could do something like this:

    public async Task RunMultipleAsync<T>(IEnumerable<T> myList)
    {
        const int myNumberOfConcurrentOperations = 10;
        var mySemaphore = new SemaphoreSlim(myNumberOfConcurrentOperations);
        var tasks = new List<Task>();
        foreach(var myItem in myList)
        {
            await mySemaphore.WaitAsync();
            var task = RunOperation(myItem);
            tasks.Add(task);
            task.ContinueWith(t => mySemaphore.Release());           
        }

        await Task.WhenAll(tasks);
    }

    private async Task RunOperation<T>(T myItem)
    {
        // Do stuff
    }

Put your code from DownloadAttachmentsAsync at the 'Do stuff' comment.

This will use a semaphore to limit the number of concurrent operations, since running too many concurrent operations is often a bad idea due to contention. You would need to experiment to find the optimal number of concurrent operations for your use case. Also note that error handling has been omitted to keep the example short.
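As a self-contained illustration of the same semaphore pattern (all names hypothetical; `Task.Delay` stands in for the real download), the cap can be verified by tracking the peak number of in-flight operations:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class SemaphoreDemo
{
    static int concurrent = 0, peak = 0;
    static readonly object gate = new object();

    static async Task Main()
    {
        const int limit = 5;                      // at most 5 operations in flight
        var semaphore = new SemaphoreSlim(limit);
        var tasks = new List<Task>();

        foreach (var item in Enumerable.Range(1, 25))
        {
            await semaphore.WaitAsync();          // wait for a free slot
            var task = RunOperation(item);
            tasks.Add(task);
            _ = task.ContinueWith(t => semaphore.Release());
        }

        await Task.WhenAll(tasks);
        Console.WriteLine($"Peak concurrency: {peak}");  // never exceeds 5
    }

    static async Task RunOperation(int item)
    {
        lock (gate) { concurrent++; if (concurrent > peak) peak = concurrent; }
        await Task.Delay(50);                     // stand-in for a real download
        lock (gate) { concurrent--; }
    }
}
```

Because `WaitAsync` is awaited before each operation starts, the 6th item can't begin until one of the first 5 releases its slot.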
