
How to obtain certain lines from a text file in C#?

I'm working in C# and I have a large text file (75MB). I want to save the lines that match a regular expression.

I tried reading the file with a StreamReader and ReadToEnd, but it takes 400MB of RAM, and when used again it throws an out-of-memory exception.

I then tried using File.ReadAllLines():

string[] lines = File.ReadAllLines("file");
StringBuilder specialLines = new StringBuilder();

foreach (string line in lines)
{
    // regex stands for the regular expression being matched (not shown here)
    if (regex.IsMatch(line))
        specialLines.Append(line);
}

This works, but when my function ends the memory it used is not released, and I'm left with 300MB of used memory. Only when I call the function again and execute the line string[] lines = File.ReadAllLines("file"); do I see the memory drop to roughly 50MB and then climb back to 200MB.

How can I clear this memory, or get the lines I need in a different way?

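        // The using blocks dispose the stream and reader even if an exception is thrown.
        using (var file = File.OpenRead("myfile.txt"))
        using (var reader = new StreamReader(file))
        {
            string line;
            // ReadLine returns null once the end of the file is reached.
            while ((line = reader.ReadLine()) != null)
            {
                // Evaluate the line here.
            }
        }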

You need to stream the text instead of loading the whole file into memory. Here's a way to do it, using an extension method and LINQ:

static class ExtensionMethods
{
    public static IEnumerable<string> EnumerateLines(this TextReader reader)
    {
        string line;
        while((line = reader.ReadLine()) != null)
        {
            yield return line;
        }
    }
}

...

var regex = new Regex(..., RegexOptions.Compiled);
using (var reader = new StreamReader(fileName))
{
    var specialLines =
        reader.EnumerateLines()
              .Where(line => regex.IsMatch(line))
              .Aggregate(new StringBuilder(),
                         (sb, line) => sb.AppendLine(line));
}

You can use StreamReader.ReadLine to read the file line by line and save only the lines you need.
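A minimal sketch of that line-by-line approach, assuming the matches themselves fit comfortably in memory. The names LineFilter and ReadMatchingLines are made up for illustration and do not come from the question:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical helper; the class and method names are illustrative only.
static class LineFilter
{
    // Reads the file one line at a time and keeps only the lines that match.
    // Memory use is bounded by the current line plus the matches collected so far,
    // not by the size of the whole file.
    public static List<string> ReadMatchingLines(string path, Regex regex)
    {
        var matches = new List<string>();
        using (var reader = new StreamReader(path))
        {
            string line;
            // ReadLine returns null at the end of the file.
            while ((line = reader.ReadLine()) != null)
            {
                if (regex.IsMatch(line))
                    matches.Add(line);
            }
        }
        return matches;
    }
}
```

It could be called as LineFilter.ReadMatchingLines("myfile.txt", new Regex("^ERROR")), where the pattern "^ERROR" is only a placeholder for the real expression.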

You should use the enumerator pattern to keep the memory footprint low in case your file is large.
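On .NET Framework 4.0 and later, File.ReadLines already exposes a file as a lazily enumerated sequence, so a hand-written enumerator may not be needed. The sketch below assumes that runtime is available; LazyLineFilter and MatchingLines are hypothetical names chosen for this example:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper; the class and method names are illustrative only.
static class LazyLineFilter
{
    // File.ReadLines returns a lazy IEnumerable<string>: lines are read from disk
    // as the sequence is enumerated, unlike File.ReadAllLines, which builds the
    // entire string[] before returning.
    public static IEnumerable<string> MatchingLines(string path, Regex regex)
    {
        return File.ReadLines(path).Where(line => regex.IsMatch(line));
    }
}
```

Because the result is itself lazy, it can be passed straight to File.WriteAllLines(outputPath, LazyLineFilter.MatchingLines(inputPath, regex)) to save the matches to another file without holding them all in memory; outputPath and inputPath are placeholders here.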
