Best way to pause a loop with an internal async call in C#

I am sending a list of about 100,000 JSON objects to an API that can only accept them one at a time, and I am sending them asynchronously. I know that internally the API puts each received object on a queue, which seems to be choking on all of these requests. As a result, I get a "Gateway Timeout" error after quite a few of them.

I tried breaking the list up into batches and putting the thread to sleep after each batch is sent, but it still fails with the same error after roughly one batch's worth of requests. I've tried batches of 3000, 2500 and 1000 with the same result, and the thread never seems to go to sleep.

Here's the code in question:

public async Task TransferData(IEnumerable<MyData> data)  
{  
     var pages = Math.Ceil(data.Count() / 3000m);  

     for (var page = 0; page < pages; page++)  
     {  
         await TransferPage(data.Skip(page * 3000).Take(3000));
         Thread.Sleep(10000);  
     }  
}

private async Task TransferPage(IEnumerable<MyData> data)  
{  
     await Task.WhenAll(data.Select(p => webConnection.PostDataAsync(JsonConvert.SerializeObject(p, Formatting.None))));  
}

Note: webConnection is just a class that has a HttpClient already instantiated and does a PostAsync for the data to the intended URL.

The call to TransferData is done in a Console Application like so:

try  
{  
   ...    
   dataManager.TransferData(data).Wait();
}
catch(AggregateException ex)
{
   ...
}
catch(Exception ex)
{
   ...
}

Thank you for any guidance.

UPDATE: To clarify some of the confusion that arose in the comments: the external API receives the objects one by one. If you look at the private method TransferPage, the Select inside the WhenAll calls the method that internally does the actual HttpClient PostAsync for each object. So the objects ARE grouped into batches, and within each batch they are sent one by one. I hope this makes it a little clearer.

What's likely happening is that one or more of the PostDataAsync tasks is throwing the timeout error, resulting in a failed task. Task.WhenAll only bundles these up into an AggregateException and throws it once all the tasks in the list are completed, which is why you only see an exception at the end of a batch.
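To see that behaviour in isolation, here is a minimal sketch. FakePostAsync is a hypothetical stand-in for PostDataAsync (it simply fails for every third item), and WhenAllDemo and CountFailuresAsync are names made up for the illustration. Nothing is thrown until every task has finished, and the full list of failures is only visible on the task returned by Task.WhenAll:

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class WhenAllDemo
{
    // Hypothetical stand-in for PostDataAsync: every third item "times out".
    private static async Task FakePostAsync(int id)
    {
        await Task.Delay(10 * id);
        if (id % 3 == 0)
            throw new TimeoutException($"Gateway Timeout for item {id}");
    }

    // Runs itemCount fake posts and returns how many of them failed.
    public static async Task<int> CountFailuresAsync(int itemCount)
    {
        Task all = Task.WhenAll(
            Enumerable.Range(1, itemCount).Select(id => FakePostAsync(id)));

        try
        {
            // Completes only after every task has finished; await then
            // rethrows just the first exception of the aggregate.
            await all;
        }
        catch (TimeoutException)
        {
            // Swallowed here so that the failures can be counted below.
        }

        // The AggregateException on the combined task holds every failure.
        return all.Exception?.InnerExceptions.Count ?? 0;
    }
}
```

In the question's code, the blocking .Wait() call surfaces the same failures as an AggregateException, but only after the whole batch of requests has already been issued.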

You are likely overwhelming the service, despite your attempt to throttle. You should probably do a couple of things:

  • Improve the exception handling and retry situation. You could do this inside PostDataAsync and/or outside of it. Even if you aren't overwhelming the service, you are going to need to handle transient exceptions anyway to deal with network hiccups and the like.
  • Replace your batching logic with a proper throttling implementation. The answers to the question that Serg linked in the comments are a good start - SemaphoreSlim or TPL Dataflow are common solutions.
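The two suggestions above can be combined in a rough sketch like the following. It is one possible shape using SemaphoreSlim, not a drop-in implementation: ThrottledSender, SendAllAsync and SendWithRetryAsync are hypothetical names, the default limits (10 concurrent calls, 3 attempts, 200 ms backoff steps) are assumptions to tune against the real API, and catching every Exception is a simplification that real code would narrow to transient errors.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledSender
{
    // Sends every item through sendAsync, never running more than
    // maxConcurrency calls at the same time. Default values are assumptions.
    public static async Task SendAllAsync<T>(
        IEnumerable<T> items,
        Func<T, Task> sendAsync,
        int maxConcurrency = 10,
        int maxAttempts = 3)
    {
        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            var tasks = items.Select(async item =>
            {
                await gate.WaitAsync();   // wait for a free slot
                try
                {
                    await SendWithRetryAsync(item, sendAsync, maxAttempts);
                }
                finally
                {
                    gate.Release();       // free the slot even on failure
                }
            }).ToList();

            // Still throws if an item fails on its final attempt.
            await Task.WhenAll(tasks);
        }
    }

    // Retries a single send, giving up after maxAttempts tries.
    private static async Task SendWithRetryAsync<T>(
        T item, Func<T, Task> sendAsync, int maxAttempts)
    {
        for (var attempt = 1; ; attempt++)
        {
            try
            {
                await sendAsync(item);
                return;
            }
            catch (Exception) when (attempt < maxAttempts)
            {
                // Simplification: retries on any exception. Back off
                // without blocking a thread: 200 ms, 400 ms, ...
                await Task.Delay(TimeSpan.FromMilliseconds(200 * attempt));
            }
        }
    }
}
```

With the question's code, the call might look like ThrottledSender.SendAllAsync(data, p => webConnection.PostDataAsync(JsonConvert.SerializeObject(p, Formatting.None))), replacing both TransferPage and the paging loop. Unlike fixed batches, the semaphore keeps a steady number of requests in flight, so the service never sees thousands of requests arrive at once.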
