
How can I read and parse a very large flat file using threads in C#?

I have to read a large text file and parse it line by line using C#. This is easy to do with StreamReader for small files, but it throws an OutOfMemoryException when working with a large file. How can I adapt it for large files?

The following code throws an OutOfMemoryException:

string line;
using (StreamReader reader = new StreamReader(FileNameWithPath))
{
    // ReadLine returns null at end of file
    while ((line = reader.ReadLine()) != null)
    {
        // Do something here...
    }
}

That is pretty much the standard code for a lazy line reader, and it shouldn't cause an OutOfMemoryException unless some single lines are really big. You could also try:

foreach(var line in File.ReadLines(FileNameWithPath)) {
    // Do something here...
}

which just makes it cleaner, but does the same thing. So there are two possibilities:

  1. one or more of the "lines" is simply huge
  2. something in "Do something here" is slowly (or quickly) eating your memory

I expect the latter is more likely.
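To illustrate the second possibility, here is a minimal sketch of a loop body that keeps memory flat: it holds only running totals and never stores the lines themselves. The class name `LineStats` and the method `Summarize` are hypothetical, chosen only for this example, and the code assumes C# 7 or later for the tuple return type.

```csharp
using System;
using System.IO;

public static class LineStats
{
    // Streams the file and keeps only two counters, so memory use does not
    // grow with the number of lines in the file.
    public static (long Lines, long Chars) Summarize(string path)
    {
        long lines = 0, chars = 0;
        foreach (string line in File.ReadLines(path))
        {
            lines++;
            chars += line.Length;
            // Anti-pattern to avoid here: adding every line (or every parsed
            // object) to a List, Dictionary, or StringBuilder, which makes
            // memory grow with the size of the file.
        }
        return (lines, chars);
    }
}
```

If the loop body in the real program appends to a collection on every iteration, that collection is the first place to look for the memory growth.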

I am not sure about this, but try this .NET Framework class:

MemoryMappedFile Class - A memory-mapped file maps the contents of a file to an application's logical address space. Memory-mapped files enable programmers to work with extremely large files because memory can be managed concurrently, and they allow complete, random access to a file without the need for seeking. Memory-mapped files can also be shared across multiple processes.
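As a rough sketch of how this class could be used for line-by-line reading, the code below maps the file read-only and wraps a view of it in a StreamReader. The names `MappedLineReader` and `ReadLines` are hypothetical, and the sketch assumes a UTF-8 file and a 64-bit process (a 32-bit process may not have enough address space to map a very large file in one view). For purely sequential reading this may not use less memory than a plain StreamReader.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.MemoryMappedFiles;
using System.Text;

public static class MappedLineReader
{
    // Lazily yields the lines of a file through a read-only memory-mapped view.
    public static IEnumerable<string> ReadLines(string path)
    {
        long length = new FileInfo(path).Length;
        if (length == 0)
        {
            yield break; // an empty file cannot be mapped with capacity 0
        }

        using (var mmf = MemoryMappedFile.CreateFromFile(
            path, FileMode.Open, null, 0, MemoryMappedFileAccess.Read))
        // Pass the exact file length: a default-sized view is rounded up to
        // whole pages and would expose trailing zero bytes.
        using (var view = mmf.CreateViewStream(0, length, MemoryMappedFileAccess.Read))
        using (var reader = new StreamReader(view, Encoding.UTF8))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                yield return line;
            }
        }
    }
}
```

A caller would iterate it like any other sequence, for example `foreach (var line in MappedLineReader.ReadLines(sourceFilePath)) { ... }`.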

using (var inputFile = new System.IO.StreamReader(sourceFilePath))
{
    while (inputFile.Peek() >= 0) {
        string lineData = inputFile.ReadLine();

        // Do something with lineData
    }
}

How about specifying the buffer size? Like this:

// StreamReader(path, encoding, detectEncodingFromByteOrderMarks, bufferSize)
using (var reader = new StreamReader(path, Encoding.UTF8, true, 1000))
{
    .....

}
