Reading very large files in parallel in C#

I have more than 20 files, each of which contains almost 1 million lines (5 gigabytes). I need to speed up the reading process, so I'm trying to read the files in parallel, but that takes longer than reading them sequentially. Is there any way to read very large files in parallel?

 Parallel.ForEach(sourceFilesList, filePath =>
 {
     if (!string.IsNullOrEmpty(filePath) && File.Exists(filePath))
     {
          StreamReader str = new StreamReader(filePath);
          while (!str.EndOfStream)
          {
              var temporaryObj = new object();
              string line = str.ReadLine();
              // process line here 
          }
     }
});

It's better to use a buffered reader for huge files. Something like this will help:

using (FileStream fs = File.Open(path, FileMode.Open, FileAccess.Read,
    FileShare.ReadWrite))
using (BufferedStream bs = new BufferedStream(fs))
using (StreamReader sr = new StreamReader(bs))
{
    string line;
    while ((line = sr.ReadLine()) != null)
    {
        // process line here
    }
}

Why BufferedStream is faster

A buffer is a block of bytes in memory used to cache data, thereby reducing the number of calls to the operating system. Buffers improve read and write performance. A buffer can be used for either reading or writing, but never both simultaneously. The Read and Write methods of BufferedStream automatically maintain the buffer.
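As a rough illustration of the buffering idea above, the buffer size can also be set explicitly on the FileStream and StreamReader themselves. This is only a sketch: the class name BufferedLineReader, the method CountLines, the 64 KB buffer size, and the UTF-8 encoding are assumptions made for the example, not part of the original answer, and the best buffer size depends on your disk and data.

```csharp
using System;
using System.IO;
using System.Text;

public static class BufferedLineReader
{
    // Counts the lines in a file, reading through an explicitly sized buffer.
    // The 64 KB default is an assumed value; measure on your own hardware.
    public static long CountLines(string path, int bufferSize = 1 << 16)
    {
        // FileOptions.SequentialScan hints to the OS that the file is read front to back.
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read,
                                       FileShare.Read, bufferSize, FileOptions.SequentialScan))
        using (var sr = new StreamReader(fs, Encoding.UTF8, true, bufferSize))
        {
            long count = 0;
            while (sr.ReadLine() != null)
            {
                count++; // process the line here
            }
            return count;
        }
    }
}
```

Counting lines stands in for the real per-line work; replace the loop body with your own processing.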

This is an I/O-bound operation, so my suggestion is to use async/await as shown below (mainly the ReadAsync method, which performs the read asynchronously). Async/await does not block a thread while waiting for the disk, so it makes more efficient use of your machine's physical cores.

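// Must be async and return Task so that await can be used inside it.
public async Task ReadFiles()
{
  List<string> paths = new List<string>(){"path1", "path2", "path3"};
  foreach(string path in paths)
  {
      await ProcessRead(path);
  }
}

// Returns Task (not void) so callers can await it and observe exceptions.
public async Task ProcessRead(string filePath)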
{
    if (File.Exists(filePath) == false)
    {
        Debug.WriteLine("file not found: " + filePath);
    }
    else
    {
        try
        {
            string text = await ReadTextAsync(filePath);
            Debug.WriteLine(text);
        }
        catch (Exception ex)
        {
            Debug.WriteLine(ex.Message);
        }
    }
}

private async Task<string> ReadTextAsync(string filePath)
{
    using (FileStream sourceStream = new FileStream(filePath,
        FileMode.Open, FileAccess.Read, FileShare.Read,
        bufferSize: 4096, useAsync: true))
    {
        StringBuilder sb = new StringBuilder();

        byte[] buffer = new byte[0x1000];
        int numRead;
        while ((numRead = await sourceStream.ReadAsync(buffer, 0, buffer.Length)) != 0)
        {
            string text = Encoding.Unicode.GetString(buffer, 0, numRead);
            sb.Append(text);
        }

        return sb.ToString();
    }
}

The code is taken from MSDN: Using Async for File Access (C# and Visual Basic).
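The answer above awaits each file one after another and loads each file entirely into a string, which may not be practical for 5 GB files. A possible variation, sketched under assumptions, reads each file line by line with ReadLineAsync and runs a limited number of files concurrently. The names ConcurrentFileReader, CountLinesAsync, and CountAllAsync, the default limit of 4 concurrent files, and the UTF-8 encoding are hypothetical choices for this example. Whether concurrency helps at all depends on the storage: on a single spinning disk, parallel reads can be slower than sequential ones, as the question observes.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

public static class ConcurrentFileReader
{
    // Reads one file line by line without loading it fully into memory.
    private static async Task<long> CountLinesAsync(string path)
    {
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read,
                                       bufferSize: 4096, useAsync: true))
        using (var sr = new StreamReader(fs, Encoding.UTF8))
        {
            long count = 0;
            while (await sr.ReadLineAsync() != null)
            {
                count++; // process the line here
            }
            return count;
        }
    }

    // Processes all files, with at most maxConcurrency files open at a time.
    // Results are returned in the same order as the input paths.
    public static async Task<long[]> CountAllAsync(IEnumerable<string> paths, int maxConcurrency = 4)
    {
        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            var tasks = paths.Select(async path =>
            {
                await gate.WaitAsync();
                try { return await CountLinesAsync(path); }
                finally { gate.Release(); }
            }).ToList();
            return await Task.WhenAll(tasks);
        }
    }
}
```

A caller would use it as `long[] counts = await ConcurrentFileReader.CountAllAsync(sourceFilesList);` and can tune maxConcurrency by measuring; setting it to 1 falls back to sequential reading.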
