How to write a 1 GB file efficiently in C#

I have a .txt file (containing more than a million rows) of around 1 GB, and a list of strings. I am trying to remove every row of the file that appears in the list and write the result to a new file, but it takes a very long time.

using (StreamReader reader = new StreamReader(_inputFileName))
{
    using (StreamWriter writer = new StreamWriter(_outputFileName))
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (!_lstLineToRemove.Contains(line))
                writer.WriteLine(line);
        }
    }
}

How can I enhance the performance of my code?

You may get some speedup by using PLINQ to do the work in parallel. Switching from a list to a HashSet will also greatly speed up the Contains check, because a list lookup scans every element while a hash set lookup takes constant time on average. HashSet is thread safe for read-only operations.

private HashSet<string> _hshLineToRemove;

void ProcessFiles()
{
    var inputLines = File.ReadLines(_inputFileName);
    var filteredInputLines = inputLines.AsParallel().AsOrdered().Where(line => !_hshLineToRemove.Contains(line));
    File.WriteAllLines(_outputFileName, filteredInputLines);
}

If it does not matter whether the output file is in the same order as the input file, you can remove the .AsOrdered() call and get some additional speed.
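
The snippet above does not show how _hshLineToRemove gets filled. Here is a minimal sketch of the whole approach, assuming the lines to remove start out as a list. The class name LineFilter, the method RemoveLines and the preserveOrder parameter are hypothetical names chosen for illustration, and the ordinal (exact, case-sensitive) comparison is an assumption about the matching you want.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class LineFilter
{
    // Hypothetical helper: copies every line of inputPath that is not in
    // linesToRemove to outputPath.
    public static void RemoveLines(string inputPath, string outputPath,
                                   IEnumerable<string> linesToRemove,
                                   bool preserveOrder = true)
    {
        // Build the set once. Contains is then O(1) on average instead of
        // the linear scan that List<string>.Contains performs per line.
        // Assumption: lines must match exactly (ordinal comparison).
        var removeSet = new HashSet<string>(linesToRemove, StringComparer.Ordinal);

        // File.ReadLines streams the file lazily instead of loading it all.
        ParallelQuery<string> query = File.ReadLines(inputPath).AsParallel();
        if (preserveOrder)
            query = query.AsOrdered();

        // The set is only read here, never modified, so concurrent reads are safe.
        File.WriteAllLines(outputPath, query.Where(line => !removeSet.Contains(line)));
    }
}
```

With the fields from the question, this would be called as LineFilter.RemoveLines(_inputFileName, _outputFileName, _lstLineToRemove). Passing preserveOrder: false corresponds to dropping .AsOrdered().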

Beyond this you are really just I/O bound; the only way to make it any faster is to run it on faster drives.

The code is particularly slow because the reader and writer never execute in parallel. Each has to wait for the other.

You can almost double the speed of file operations like this by having a reader thread and a writer thread. Put a BlockingCollection between them so you can communicate between the threads and limit how many rows you buffer in memory.

If the computation is really expensive (it isn't in your case), a third thread with another BlockingCollection doing the processing can help too.
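
The reader/writer arrangement described above could be sketched roughly as follows. This is one possible shape, not a tuned implementation: the class name PipelinedLineFilter, the method name and the default bound of 10000 buffered lines are all assumptions made for illustration. Filtering happens on the reader side because it is cheap here.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

public static class PipelinedLineFilter
{
    // Hypothetical helper: one task reads and filters, another writes.
    public static void RemoveLines(string inputPath, string outputPath,
                                   ISet<string> linesToRemove,
                                   int maxBufferedLines = 10000)
    {
        // Bounded queue: Add blocks once maxBufferedLines lines are waiting,
        // which caps memory use if the writer falls behind the reader.
        using (var queue = new BlockingCollection<string>(maxBufferedLines))
        {
            // Reader: filters lines and hands the survivors to the writer.
            Task reader = Task.Run(() =>
            {
                try
                {
                    foreach (string line in File.ReadLines(inputPath))
                    {
                        if (!linesToRemove.Contains(line))
                            queue.Add(line);
                    }
                }
                finally
                {
                    // Always signal completion so the writer loop can end,
                    // even if reading throws.
                    queue.CompleteAdding();
                }
            });

            // Writer: drains the queue until the reader marks it complete.
            // Limitation of this sketch: if writing throws while the queue
            // is full, the reader stays blocked in Add. Production code
            // would add cancellation for that case.
            Task writer = Task.Run(() =>
            {
                using (var output = new StreamWriter(outputPath))
                {
                    foreach (string line in queue.GetConsumingEnumerable())
                        output.WriteLine(line);
                }
            });

            Task.WaitAll(reader, writer);
        }
    }
}
```

With a single reader and a single writer the queue is first-in, first-out, so the output keeps the input order. Whether this approaches the claimed speedup depends on the drives involved, so it is worth measuring on your own hardware.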

Do not use buffered text routines. Use binary, unbuffered library routines and make your buffer size as big as possible. That's how to make it the fastest.
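
A fully binary version would have to find line boundaries in raw bytes itself, which is more than a short example can show. As a middle ground, the sketch below keeps the line-based loop from the question but takes the "big buffer" part of the advice by passing an explicit buffer size to the streams. The class name LargeBufferLineFilter, the 1 MB default and the UTF-8 encoding are assumptions, so adjust them to your files and measure.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class LargeBufferLineFilter
{
    // Hypothetical helper: same filtering loop as the question, but with
    // explicitly sized buffers instead of the small defaults.
    public static void RemoveLines(string inputPath, string outputPath,
                                   ISet<string> linesToRemove,
                                   int bufferSize = 1 << 20) // 1 MB, an assumed starting point
    {
        // Assumption: the file is UTF-8. No byte order mark is written.
        var utf8NoBom = new UTF8Encoding(false);

        // SequentialScan hints to the OS that the file is read front to back.
        using (var inStream = new FileStream(inputPath, FileMode.Open, FileAccess.Read,
                                             FileShare.Read, bufferSize, FileOptions.SequentialScan))
        using (var outStream = new FileStream(outputPath, FileMode.Create, FileAccess.Write,
                                              FileShare.None, bufferSize))
        using (var reader = new StreamReader(inStream, utf8NoBom, true, bufferSize))
        using (var writer = new StreamWriter(outStream, utf8NoBom, bufferSize))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                if (!linesToRemove.Contains(line))
                    writer.WriteLine(line);
            }
        }
    }
}
```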

Have you considered using AWK?

AWK is a very powerful tool for processing text files. You can find more information about how to filter lines that match certain criteria in "Filter text with AWK".
