
How to handle large numbers of concurrent disk write requests as efficiently as possible

Say the method below is being called several thousand times by different threads in a .NET 4 application. What's the best way to handle this situation? I understand that the disk is the bottleneck here, but I'd like the WriteFile() method to return quickly.

Data can be up to a few MB. Are we talking thread pool, TPL or the like?

public void WriteFile(string FileName, MemoryStream Data)
{
   try
   {
      using (FileStream DiskFile = File.OpenWrite(FileName))
      {
         Data.WriteTo(DiskFile);
         DiskFile.Flush();
         DiskFile.Close();
      }
   }
   catch (Exception e)
   {
      Console.WriteLine(e.Message);
   }
}

If you want to return quickly and don't need the write itself to be synchronous, you could create an in-memory queue and put the write requests into it. As long as the queue isn't full, WriteFile can return quickly. Another thread is responsible for draining the queue and writing the files. If WriteFile is called while the queue is full, you have to wait until there is room, and execution becomes synchronous again. But with a big enough buffer, if the write requests are not steady but spiky (bursts of WriteFile calls with pauses in between), this change will look like a performance improvement.

UPDATE: Made a little picture for you. Notice that the bottleneck always exists; all you can do is smooth out the requests with a queue. The queue has a limit, so when it fills up you can't instantly queue more files; you have to wait for free space in the buffer. But for the situation in the picture (3 bucket requests), you can obviously put the buckets into the queue quickly and return, while in the first case you have to do them one by one and block execution.

Notice that you never need to run many IO threads at once, since they all share the same bottleneck and you'll just waste memory if you parallelize this heavily. I believe 2 to 10 threads at most will easily use all the available IO bandwidth, and will limit the application's memory usage too.
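A minimal sketch of the bounded queue described above, assuming .NET 4's BlockingCollection. The QueuedFileWriter class, its capacity and writerCount parameters, and the choice to copy each MemoryStream into a byte array are assumptions made for illustration, not something the question's code contains.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

// Hypothetical helper: a bounded queue drained by a few long-running writer tasks.
public sealed class QueuedFileWriter : IDisposable
{
    private readonly BlockingCollection<KeyValuePair<string, byte[]>> _queue;
    private readonly Task[] _writers;

    public QueuedFileWriter(int capacity, int writerCount)
    {
        // The capacity bound is what makes Add block once the queue is full.
        _queue = new BlockingCollection<KeyValuePair<string, byte[]>>(capacity);
        _writers = new Task[writerCount];
        for (int i = 0; i < writerCount; i++)
        {
            _writers[i] = Task.Factory.StartNew(DrainQueue, TaskCreationOptions.LongRunning);
        }
    }

    // Returns quickly while the queue has room; blocks the caller when it is full.
    public void WriteFile(string fileName, MemoryStream data)
    {
        // Copy the bytes so the caller may reuse or dispose the stream afterwards.
        _queue.Add(new KeyValuePair<string, byte[]>(fileName, data.ToArray()));
    }

    private void DrainQueue()
    {
        // Blocks while the queue is empty; ends after CompleteAdding and a full drain.
        foreach (KeyValuePair<string, byte[]> item in _queue.GetConsumingEnumerable())
        {
            try
            {
                using (FileStream diskFile = File.OpenWrite(item.Key))
                {
                    diskFile.Write(item.Value, 0, item.Value.Length);
                }
            }
            catch (Exception e)
            {
                Console.WriteLine(e.Message);
            }
        }
    }

    // Stops accepting requests, then waits for the queued writes to finish.
    public void Dispose()
    {
        _queue.CompleteAdding();
        Task.WaitAll(_writers);
        _queue.Dispose();
    }
}
```

Something like new QueuedFileWriter(100, 2) would hold up to 100 pending files with 2 writers. Since each entry can be a few MB, the capacity directly caps how much memory the queue can pin.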


Since you say that the files don't need to be written in order or immediately, the simplest approach would be to use a Task:

// Queues the write on the thread pool and returns immediately.
private void WriteFileAsynchronous(string FileName, MemoryStream Data)
{
    Task.Factory.StartNew(() => WriteFileSynchronous(FileName, Data));
}

private void WriteFileSynchronous(string FileName, MemoryStream Data)
{
    try
    {
        using (FileStream DiskFile = File.OpenWrite(FileName))
        {
            Data.WriteTo(DiskFile);
            DiskFile.Flush();
            DiskFile.Close();
        }
    }

    catch (Exception e)
    {
        Console.WriteLine(e.Message);
    }
}

The TPL uses the thread pool internally, and should be fairly efficient even for large numbers of tasks.
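If the caller ever needs to know when the writes have finished, or whether any of them failed, one option is to hand the Task back instead of discarding it. This is only a sketch: BackgroundWriter and WriteFileAsTask are hypothetical names, and taking a byte array rather than the MemoryStream is an assumption made to avoid sharing the stream across threads.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

// Hypothetical variant of the Task approach that returns the Task to the caller.
public static class BackgroundWriter
{
    public static Task WriteFileAsTask(string fileName, byte[] data)
    {
        // No try/catch here: a failure is stored in the Task and surfaces on Wait.
        return Task.Factory.StartNew(() =>
        {
            using (FileStream diskFile = File.OpenWrite(fileName))
            {
                diskFile.Write(data, 0, data.Length);
            }
        });
    }
}
```

A caller could collect the returned tasks in an array and call Task.WaitAll on it before shutting down; WaitAll throws an AggregateException if any of the writes failed.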

If data is coming in faster than you can log it, you have a real problem. A producer/consumer design that has WriteFile just throwing stuff into a ConcurrentQueue or similar structure, with a separate thread servicing that queue, works great ... until the queue fills up. And if you're talking about opening 50,000 different files, things are going to back up quick. Not to mention that data of up to several megabytes per file is going to further limit the size of your queue.

I've had a similar problem that I solved by having the WriteFile method append to a single file. The records it wrote had a record number, file name, length, and then the data. As Hans pointed out in a comment to your original question, writing to a file is quick; opening a file is slow.

A second thread in my program starts reading that file that WriteFile is writing to. That thread reads each record header (number, filename, length), opens a new file, and then copies data from the log file to the final file.

This works better if the log file and the final file are on different disks, but it can still work well with a single spindle. It sure exercises your hard drive, though.

It has the drawback of requiring 2X the disk space, but with 2-terabyte drives under $150, I don't consider that much of a problem. It's also less efficient overall than directly writing the data (because you have to handle the data twice), but it has the benefit of not causing the main processing thread to stall.
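A rough sketch of that record layout, assuming BinaryWriter and BinaryReader for the framing. SpoolLog, Append and Unpack are hypothetical names, and this version unpacks a log that has already been written rather than tailing one that is still growing, which is a simplification of what I described.

```csharp
using System;
using System.IO;

// Hypothetical record layout: record number, file name, length, then the raw bytes.
public static class SpoolLog
{
    private static readonly object Gate = new object();
    private static int _nextRecord;

    // Producer side: append one record to the shared log stream.
    public static void Append(Stream log, string fileName, byte[] data)
    {
        lock (Gate) // keep records from different threads from interleaving
        {
            // The writer is deliberately not disposed, so the log stays open.
            BinaryWriter writer = new BinaryWriter(log);
            writer.Write(_nextRecord++);
            writer.Write(fileName);      // length-prefixed string
            writer.Write(data.Length);
            writer.Write(data);
            writer.Flush();
        }
    }

    // Consumer side: read each header, then copy the payload to its own file
    // under targetDir. Needs a seekable stream. Returns the number of records.
    public static int Unpack(Stream log, string targetDir)
    {
        BinaryReader reader = new BinaryReader(log);
        int count = 0;
        while (log.Position < log.Length)
        {
            reader.ReadInt32();                       // record number (unused here)
            string fileName = reader.ReadString();
            int length = reader.ReadInt32();
            byte[] data = reader.ReadBytes(length);
            File.WriteAllBytes(Path.Combine(targetDir, fileName), data);
            count++;
        }
        return count;
    }
}
```

The point of the layout is that Append only ever touches one already-open stream, so the cost of opening thousands of files moves to whichever thread runs Unpack.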

Encapsulate your complete method implementation in a new Thread(). Then you can "fire-and-forget" these threads and return to the main calling thread.

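    // Assumes each item in filesArray exposes FileName and Data (hypothetical members).
    foreach (var file in filesArray)
    {
        // Copy to a local: before C# 5 the foreach variable is shared by all closures.
        var current = file;
        try
        {
            System.Threading.Thread updateThread = new System.Threading.Thread(delegate()
                {
                    WriteFileSynchronous(current.FileName, current.Data);
                });
            updateThread.Start();
        }
        catch (Exception ex)
        {
            // Only catches failures to create or start the thread; errors during
            // the write are handled inside WriteFileSynchronous.
            string errMsg = ex.Message;
            Exception innerEx = ex.InnerException;
            while (innerEx != null)
            {
                errMsg += "\n" + innerEx.Message;
                innerEx = innerEx.InnerException;
            }
            errorMessages.Add(errMsg);
        }
    }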
