
Making an I/O-bound operation asynchronous

I am developing a web application that reads Excel files and performs some validation.

I use the ExcelDataReader library, which lacks async methods.

The Read() method takes care of reading the next row in a sheet, so it is mostly I/O bound (it just advances the stream to a certain position so that the row's data becomes available).

To increase throughput, I wrapped it in Task.Run, like this:

await Task.Run(() => excelDataReader.Read());

From benchmarks I can see that it executes faster than (or at least takes no longer than) the regular synchronous call excelDataReader.Read().

From the posts I have read, Task.Run is recommended only in UI scenarios, when we need to unblock the UI thread.

So the question is: does it make sense to wrap I/O-bound operations in Task.Run?

Or am I missing something?

EDIT:

Benchmark results:

[Image: benchmark results]

So the second question would be: if my approach is just wrong, why does it perform better?

ANOTHER EDIT:

Below is the benchmark code that showed the asynchronous version to be much faster:

using BenchmarkDotNet.Attributes;
using ExcelDataReader;
using System.Text;

namespace ExcelDataReaderTests;

public class ExcelDataReaderTester
{
    private const string DirectoryWithManyBigExcelFiles = @"Directory With Many Big Excel Files";

    [Benchmark]
    public async Task TestAsync()
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        var directory = Directory.GetFiles(DirectoryWithManyBigExcelFiles);

        var tasks = new List<Task>();
        foreach (var file in directory)
        {
            tasks.Add(UseExcelDataReadAsync(file));
        }
        await Task.WhenAll(tasks);
    }

    [Benchmark]
    public void Test()
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        var directory = Directory.GetFiles(DirectoryWithManyBigExcelFiles);

        foreach (var file in directory)
        {
            UseExcelDataRead(file);
        }
    }

    private static void UseExcelDataRead(string filePath)
    {
        var fileStream = File.OpenRead(filePath);
        var excelDataReader = ExcelReaderFactory.CreateReader(fileStream);

        while (excelDataReader.Read())
        {
            var x = excelDataReader.GetFieldType(0);
        }
    }

    private static async Task UseExcelDataReadAsync(string filePath)
    {
        var fileStream = File.OpenRead(filePath);
        var excelDataReader = ExcelReaderFactory.CreateReader(fileStream);

        while (await Task.Run(() => excelDataReader.Read()))
        {
            var x = excelDataReader.GetFieldType(0);
        }
    }
}

The benchmarks are run with:

BenchmarkRunner.Run(typeof(ExcelDataReaderTester).Assembly);

I am using BenchmarkDotNet for benchmarking.

ExcelDataReader loads all data into memory, so it makes no sense for it to provide asynchronous methods. The xlsx format is a ZIP package containing XML files, which can't be read line by line. There is no I/O involved in ExcelDataReader.Read.

The real I/O is performed by CreateReader. Unfortunately, there is no CreateReaderAsync and no plan to add one yet.

In general, pushing an I/O operation to another thread won't make it faster.

In a desktop application, if you want to process the contents of an Excel file without blocking the UI thread, put the entire processing code into a method and call it with Task.Run. For more complex processing you can use, e.g., DataFlow or Channels to execute multiple processing steps in the background.
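A minimal sketch of that advice, not taken from the original answer. It assumes a hypothetical synchronous method, CountRows, that stands in for the whole ExcelDataReader loop (create the reader, call Read() for every row, validate); here it only counts lines of text so that the example runs without the library. The point is a single Task.Run around the entire job rather than one per row:

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public static class WorkbookProcessor
{
    // Hypothetical stand-in for the synchronous processing code. In the real
    // application this would create the reader and loop over Read().
    public static int CountRows(TextReader reader)
    {
        var rows = 0;
        while (reader.ReadLine() != null)
        {
            rows++;
        }
        return rows;
    }

    // One Task.Run for the whole job: the caller (e.g. a UI thread) stays
    // free, and the hop to the thread pool is paid once, not once per row.
    public static Task<int> CountRowsInBackgroundAsync(string content)
    {
        return Task.Run(() =>
        {
            using var reader = new StringReader(content);
            return CountRows(reader);
        });
    }
}
```

From a UI event handler this would be awaited as `var rows = await WorkbookProcessor.CountRowsInBackgroundAsync(text);`. The work is still synchronous; it just runs somewhere other than the UI thread.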

In a web application, though, each request is served by a different ThreadPool thread. There is no UI thread to block. await Task.Run(...) simply moves execution from one ThreadPool thread to another.

Technically, you are doing multithreaded programming when you wrap synchronous code with Task.Run:

await Task.Run(() => excelDataReader.Read());

But this is not asynchronous programming. It may look like it, but it is not, because what is inside is not an I/O operation.

excelDataReader.Read() will simply be picked up by a thread pool thread, and that thread will run the action synchronously, which means it is blocked until the call returns. If you receive enough requests to saturate all your thread pool threads, you will suffer thread starvation, and your server will eventually stop responding.

I also want to add two comments on this statement: "From posts I read I see Task.Run is recommended only in UI scenarios, when we need to unblock the UI thread."

  1. The general purpose of Task.Run is not to unblock the UI thread. It is recommended for CPU-bound work or for something you can fire and forget.
  2. Where I/O work is involved, you can avoid blocking the UI thread by using async/await.
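To illustrate the second point, here is a small sketch of what genuinely asynchronous I/O looks like when the API supports it. It is not ExcelDataReader code; the class and method names are made up for the example, and it only counts the bytes in a file using the framework's own FileStream.ReadAsync:

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public static class AsyncFileReader
{
    // Reads the file through the framework's asynchronous API. While a read
    // is pending, no thread is parked waiting for it, unlike a synchronous
    // call wrapped in Task.Run.
    public static async Task<int> CountBytesAsync(string path)
    {
        await using var stream = new FileStream(
            path, FileMode.Open, FileAccess.Read, FileShare.Read,
            bufferSize: 4096, useAsync: true);

        var buffer = new byte[4096];
        var total = 0;
        int read;
        while ((read = await stream.ReadAsync(buffer.AsMemory(0, buffer.Length))) > 0)
        {
            total += read;
        }
        return total;
    }
}
```

The difference from `await Task.Run(() => stream.Read(...))` is that here the await is on the I/O itself, so the thread goes back to the pool while the operating system completes the read.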

You need to update the code that produces the benchmark result. Async alone doesn't increase performance; instead, it increases thread availability.
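One possible reading of the benchmark in the question, offered as an assumption rather than a measured result: TestAsync starts the read loop for every file and awaits them all together, so rows from different files are processed on several thread pool threads at once, while Test handles the files one after another. If that is what happens, the speed-up comes from parallelism, not from asynchrony. A like-for-like comparison would run the same synchronous work in parallel. The sketch below uses a hypothetical CPU-bound ProcessFile as a stand-in for parsing one file:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class FairComparison
{
    // Hypothetical stand-in for parsing one file: pure CPU work.
    public static long ProcessFile(int seed)
    {
        long sum = 0;
        for (var i = 0; i < 1_000_000; i++)
        {
            sum += (i ^ seed) % 7;
        }
        return sum;
    }

    // Baseline: one file after another on the calling thread.
    public static long[] RunSequential(IReadOnlyList<int> files)
        => files.Select(ProcessFile).ToArray();

    // The same synchronous work, spread across thread pool threads.
    public static long[] RunParallel(IReadOnlyList<int> files)
    {
        var results = new long[files.Count];
        Parallel.For(0, files.Count, i => results[i] = ProcessFile(files[i]));
        return results;
    }
}
```

Benchmarking a parallel synchronous version like RunParallel against the Task.Run version, instead of against the sequential loop, would show how much of the difference is due to parallelism alone.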
