String array throws OutOfMemoryException for large multi-line entries

In a Windows Forms C# app, I have a textbox where users paste log data, and the app filters it. I need to check each line individually, so I split the input on newlines, but if there are a lot of lines (more than 100,000 or so), it throws an OutOfMemoryException.

My code looks like this:

StringSplitOptions splitOptions = new StringSplitOptions();
if(removeEmptyLines_CB.Checked)
    splitOptions = StringSplitOptions.RemoveEmptyEntries;
else
    splitOptions = StringSplitOptions.None;

List<string> outputLines = new List<string>();

foreach(string line in input_TB.Text.Split(new string[] { "\r\n", "\n" }, splitOptions))
{
    if(line.Contains(inputCompare_TB.Text))
        outputLines.Add(line);
}
output_TB.Text = string.Join(Environment.NewLine, outputLines);

The problem occurs when I split the textbox text by line, here: input_TB.Text.Split(new string[] { "\r\n", "\n" }, splitOptions)

Is there a better way to do this? I've thought about taking the first X characters of text, truncating at a newline, and repeating until everything has been read, but this seems tedious. Or is there a way to allocate more memory for it?

Thanks, Garrett

Update

Thanks to Attila, I came up with this and it seems to work. Thanks.

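List<string> outputLines = new List<string>();
string compareTxt = inputCompare_TB.Text;

// Read one line at a time instead of splitting the whole text into an array.
StringReader reader = new StringReader(input_TB.Text);
string line;
while((line = reader.ReadLine()) != null)
{
    if(line.Contains(compareTxt))
        outputLines.Add(line);
}
output_TB.Text = string.Join(Environment.NewLine, outputLines);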

Split has to duplicate the memory taken by the original text, plus the overhead of a string object for each line. If this causes memory issues, a reliable way of processing the input is to parse one line at a time.

A better way to do this is to extract and process one line at a time, and use a StringBuilder to create the result:

StringBuilder outputTxt = new StringBuilder();
string txt = input_TB.Text;
string compareTxt = inputCompare_TB.Text;
int txtIndex = 0;
while (txtIndex < txt.Length) {
  int startLineIndex = txtIndex;
GetMore:
  // Advance to the next '\r' or '\n', or to the end of the text.
  while (txtIndex < txt.Length && txt[txtIndex] != '\r' && txt[txtIndex] != '\n') {
    txtIndex++;
  }
  // A lone '\r' (not followed by '\n') is not a line break here: keep scanning.
  if (txtIndex < txt.Length && txt[txtIndex] == '\r' && (txtIndex == txt.Length - 1 || txt[txtIndex + 1] != '\n')) {
    txtIndex++;
    goto GetMore;
  }
  string line = txt.Substring(startLineIndex, txtIndex - startLineIndex);
  if (line.Contains(compareTxt)) {
    if (outputTxt.Length > 0)
      outputTxt.Append(Environment.NewLine);
    outputTxt.Append(line);
  }
  // Skip the line break: two characters for "\r\n", one for "\n".
  if (txtIndex < txt.Length && txt[txtIndex] == '\r')
    txtIndex++;
  txtIndex++;
}
output_TB.Text = outputTxt.ToString();

Pre-emptive comment: someone will object to the goto, but it is what's needed here. The alternatives are either much more complex (a regular expression, for example) or fake the goto with another loop and continue or break.

Using a StringReader to split the lines is a much cleaner solution, but it does not split exactly like the original code: ReadLine treats a lone \r as a line break in addition to \r\n and \n:

StringReader reader = new StringReader(input_TB.Text); 
StringBuilder outputTxt = new StringBuilder();
string compareTxt = inputCompare_TB.Text;
string line; 
while((line = reader.ReadLine()) != null) { 
  if (line.Contains(compareTxt)) {
    if (outputTxt.Length > 0)
      outputTxt.Append(Environment.NewLine);
    outputTxt.Append(line); 
  }
} 
output_TB.Text = outputTxt.ToString(); 
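
To see how ReadLine treats the different line endings, here is a small self-contained sketch. The class and method names (ReadLineDemo, SplitWithReader) are made up for illustration and are not part of the code above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, for illustration only.
public static class ReadLineDemo
{
    // Splits text with StringReader.ReadLine and returns the lines it produced.
    public static List<string> SplitWithReader(string text)
    {
        var lines = new List<string>();
        using (var reader = new StringReader(text))
        {
            string line;
            // ReadLine returns null once the end of the text is reached.
            while ((line = reader.ReadLine()) != null)
                lines.Add(line);
        }
        return lines;
    }
}
```

Calling ReadLineDemo.SplitWithReader("a\r\nb\nc\rd") yields four lines (a, b, c, d), whereas Split with { "\r\n", "\n" } would keep "c\rd" together as one entry. If the log data never contains a lone \r, the two approaches agree on the line contents.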

I guess the only way to do this on large text files is to open the file manually and use a StreamReader. Here is an example of how to do this.
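
A minimal sketch of the StreamReader approach, assuming the log is available as a file on disk rather than pasted into the textbox. LogFilter, FilterFile, and the parameter names are hypothetical, chosen only for this example.

```csharp
using System.IO;

// Hypothetical helper, for illustration only.
public static class LogFilter
{
    // Copies every line of inputPath that contains 'needle' to outputPath
    // and returns the number of matching lines. Only one line is held in
    // memory at a time.
    public static int FilterFile(string inputPath, string outputPath, string needle)
    {
        int matches = 0;
        using (var reader = new StreamReader(inputPath))
        using (var writer = new StreamWriter(outputPath))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                if (line.Contains(needle))
                {
                    writer.WriteLine(line);
                    matches++;
                }
            }
        }
        return matches;
    }
}
```

Writing the matches straight to a second file avoids building the whole result in memory. Whether that fits depends on how the output is consumed; a very large result will still be expensive to display in a textbox.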

You can avoid creating the array and the strings for all lines at once by creating the string for each line one at a time:

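var input = input_TB.Text;
var eol = new[] { '\r', '\n' };

var pos = 0;
while (pos < input.Length)
{
    // Find the next line-break character at or after pos.
    var i = input.IndexOfAny(eol, pos);
    if (i < 0)
    {
        // No more line breaks: the last line runs to the end of the text.
        i = input.Length;
    }
    // Empty lines (including the gap between '\r' and '\n') are skipped.
    if (i != pos)
    {
        var line = input.Substring(pos, i - pos);

        // process line
    }
    pos = i + 1;
}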
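
The same loop can be wrapped in an iterator so that callers just use foreach. This is only a sketch: LineScanner and NonEmptyLines are hypothetical names, and like the loop above it skips empty lines and treats a lone \r as a line break.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, for illustration only.
public static class LineScanner
{
    private static readonly char[] Eol = { '\r', '\n' };

    // Lazily yields each non-empty line of input, one string at a time.
    public static IEnumerable<string> NonEmptyLines(string input)
    {
        int pos = 0;
        while (pos < input.Length)
        {
            int i = input.IndexOfAny(Eol, pos);
            if (i < 0)
                i = input.Length;   // last line has no trailing line break
            if (i != pos)
                yield return input.Substring(pos, i - pos);
            pos = i + 1;
        }
    }
}
```

With this in place, the filtering loop becomes foreach (var line in LineScanner.NonEmptyLines(input_TB.Text)) { ... }, and only the line currently being examined is allocated at any moment.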

On the other hand, this article argues that the real issue is that the Split method is implemented poorly. Read it and draw your own conclusions.

Like Attila said, you have to parse line by line.
