Running same task multiple times in parallel with EF Core

I have a Task that generates a PDF file for an order (it takes about 10 seconds to create one PDF):

public async Task GeneratePDF(Guid Id) {
   var order = await 
      _context
      .Orders
      .Include(o => o.Customer)
      ... //a lot more Include and ThenInclude statements
      .FirstOrDefaultAsync(o => o.Id == Id);
   var document = ...  //PDF generated here, takes about 10 seconds
   order.PDF = document;
   await _context.SaveChangesAsync();
}

I tried the following:

public async Task GenerateAllPDFs() {
   var orderIds = await _context.Orders.Select(order=> order.Id).ToListAsync();
   foreach (var id in orderIds)
   {
      _ = GeneratePDF(id).ContinueWith(t => Console.WriteLine(t.Exception), TaskContinuationOptions.OnlyOnFaulted);
   }
}

This gives me the error:

System.ObjectDisposedException: Cannot access a disposed object. A common cause of this error is disposing a context that was resolved from dependency injection and then later trying to use the same context instance elsewhere in your application. This may occur if you are calling Dispose() on the context, or wrapping the context in a using statement. If you are using dependency injection, you should let the dependency injection container take care of disposing context instances.

If I change the task as follows...

public async Task GenerateAllPDFs() {
   var orderIds = await _context.Orders.Select(order=> order.Id).ToListAsync();
   foreach (var id in orderIds)
   {
      await GeneratePDF(id);
   }
}

...it runs the task for each order in series, taking ages to complete (I have a few thousand orders, at about 10 seconds per order).

How can I run this task in parallel for all orders in the context, so that the time it takes to complete is much less than sequential processing?

You can map your orders to tasks and await them all:

public void GeneratePDF(Order order) {
   var document = ...  //PDF generated here, takes about 10 seconds
   order.PDF = document;
}

public async Task GenerateAllPDFs() {
   var orders = await _context.Orders.ToListAsync();
   // Task.Run puts each CPU-bound generation on the thread pool so they run in parallel.
   var tasks = orders.Select(order => Task.Run(() => {
      try { GeneratePDF(order); }
      catch (Exception e) { Console.WriteLine(e); }   // log the failure and continue with the rest
   }));
   await Task.WhenAll(tasks);
   // SaveChangesAsync runs once, after all tasks have finished.
   await _context.SaveChangesAsync();
}
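
With a few thousand orders you may not want every generation queued at once. Below is a minimal sketch of capping the number of concurrent generations with SemaphoreSlim. The Order class, RenderPdf and the maxParallel parameter are hypothetical stand-ins for illustration, not part of the original code, and no DbContext is involved.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the entity; only the members used here.
public class Order
{
    public Guid Id { get; set; }
    public byte[] PDF { get; set; }
}

public static class ThrottledPdfBatch
{
    // Hypothetical placeholder for the real (roughly 10 second) PDF generation.
    public static byte[] RenderPdf(Order order) =>
        System.Text.Encoding.UTF8.GetBytes("PDF for " + order.Id);

    public static async Task GenerateAllAsync(IReadOnlyList<Order> orders, int maxParallel)
    {
        // At most maxParallel generations hold the semaphore at any time.
        using var gate = new SemaphoreSlim(maxParallel);

        var tasks = orders.Select(async order =>
        {
            await gate.WaitAsync();
            try
            {
                // Each task writes to its own Order instance only.
                order.PDF = await Task.Run(() => RenderPdf(order));
            }
            finally
            {
                gate.Release();
            }
        }).ToList();

        await Task.WhenAll(tasks);
    }
}
```

In the real method you would load the orders before calling this and call SaveChangesAsync once after it returns.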

Here is my suggestion from the comments, posted as an answer. I would split it into three parts:

1) get all orders,

2) use Parallel.ForEach to generate all documents in parallel and assign each document to its order, and finally

3) call _context.SaveChangesAsync() once, so that all updates are sent to the server in one go.

public async Task GenerateAllPDFs()
{
    var allOrders = await _context.Orders.ToListAsync();
    System.Threading.Tasks.Parallel.ForEach(allOrders, order => 
    {
        var document = ...  //PDF generated here, takes about 10 seconds
        order.PDF = document ;
    });
    await _context.SaveChangesAsync();
}
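
If you want to control how many documents are generated at the same time, Parallel.ForEach accepts a ParallelOptions argument. This is only a sketch: the Order class and RenderPdf are hypothetical stand-ins, and using Environment.ProcessorCount as the limit is an assumption you should tune for your machine.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-in for the entity; only the members used here.
public class Order
{
    public Guid Id { get; set; }
    public byte[] PDF { get; set; }
}

public static class ParallelPdfBatch
{
    // Hypothetical placeholder for the real PDF generation.
    public static byte[] RenderPdf(Order order) =>
        System.Text.Encoding.UTF8.GetBytes("PDF for " + order.Id);

    public static void GenerateAll(IEnumerable<Order> orders)
    {
        var options = new ParallelOptions
        {
            // Assumed limit: one generation per logical core.
            MaxDegreeOfParallelism = Environment.ProcessorCount
        };

        // Blocks the calling thread until every order has been processed.
        Parallel.ForEach(orders, options, order =>
        {
            order.PDF = RenderPdf(order);
        });
    }
}
```

Note that Parallel.ForEach is synchronous. If any iteration throws, the exceptions are collected and rethrown as an AggregateException once the loop ends.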

You need to implement parallel programming.

https://docs.microsoft.com/en-us/dotnet/standard/parallel-programming/task-based-asynchronous-programming

using System;
using System.Threading;
using System.Threading.Tasks;

// Data passed to each task as its state object.
class CustomData
{
   public long CreationTime;
   public int Name;
   public int ThreadNum;
}

public class Example
{
   public static void Main()
   {
      Task[] taskArray = new Task[10];
      for (int i = 0; i < taskArray.Length; i++) {
         // Each task receives its own CustomData instance as state.
         taskArray[i] = Task.Factory.StartNew((Object obj) => {
               CustomData data = obj as CustomData;
               if (data == null)
                  return;

               data.ThreadNum = Thread.CurrentThread.ManagedThreadId;
            },
            new CustomData() {Name = i, CreationTime = DateTime.Now.Ticks});
      }
      // Block until all ten tasks have completed.
      Task.WaitAll(taskArray);
      foreach (var task in taskArray) {
         var data = task.AsyncState as CustomData;
         if (data != null)
            Console.WriteLine("Task #{0} created at {1}, ran on thread #{2}.",
                              data.Name, data.CreationTime, data.ThreadNum);
      }
   }
}

I think I will have to "duplicate" the GeneratePDF method to facilitate the batch processing by implementing the other answers, since I need this method also in non-batch mode...
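
One way to avoid duplicating the method might be to pull the PDF generation into a step that does not touch the context, and call it from both the single and the batch path. A rough sketch, assuming hypothetical names (Order, RenderPdf, GenerateOne, GenerateMany) and leaving the loading and SaveChangesAsync calls to the callers:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-in for the entity; only the members used here.
public class Order
{
    public Guid Id { get; set; }
    public byte[] PDF { get; set; }
}

public static class PdfJobs
{
    // Shared step: no context access, so it can run on any thread.
    // Hypothetical placeholder for the real PDF generation.
    public static byte[] RenderPdf(Order order) =>
        System.Text.Encoding.UTF8.GetBytes("PDF for " + order.Id);

    // Non-batch mode: generate the document for one already loaded order.
    public static void GenerateOne(Order order)
    {
        order.PDF = RenderPdf(order);
    }

    // Batch mode: reuse the same step for every order, in parallel.
    public static void GenerateMany(IEnumerable<Order> orders)
    {
        Parallel.ForEach(orders, order => GenerateOne(order));
    }
}
```

The single-order method would then load the order, call GenerateOne and save. The batch method would load all orders, call GenerateMany and save once at the end.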

 