StreamReader.ReadLine() is behaving strangely

Question

I have a delimited file with a few thousand lines in it, and I wrote a method to automatically detect the delimiter.

The method looks like this:

private bool TryDetermineDelimiter(FileInfo target, out char delimiter)
        {
            char[] possibleDelimiters = new char[] { ',', ';', '-', ':' };

            using (StreamReader sr = new StreamReader(target.OpenRead()))
            {
                List<int> delimiterHits = new List<int>();

                foreach (char del in possibleDelimiters)
                {


                    while (!sr.EndOfStream)
                    {
                        var line = sr.ReadLine();
                        var matches = Regex.Matches(line, $"{del}(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");

                        if(matches.Count == 0)
                        {
                            sr.BaseStream.Seek(0, SeekOrigin.Begin);
                            break;
                        }

                        delimiterHits.Add(matches.Count);
                    }

                    if (delimiterHits.Any(d => d != delimiterHits[0]) || delimiterHits.Count == 0)
                    {
                        delimiterHits.Clear();
                        continue;
                    }

                    delimiter = del;
                    return true;
                }
            }

            delimiter = ',';
            return false;
        }

Something strange happens: at the 5th line, the call to sr.ReadLine() returns the 5th line with the 1st line concatenated onto it.

So for example:

Delimited file:

col1; col2; col3; col4
val1; val2; val3; val4
val5; val6; val7; val8
...

The first 4 calls to StreamReader.ReadLine() return the expected lines, but the 5th call returns: val13; val14; val15; val16; col1; col2; col3; col4;

Stepping through, I can confirm that the loop never enters the if(matches.Count == 0) block; the correct number of delimiters is found on each iteration.

Unfortunately I can't post the contents of the actual file because it may get me in trouble, but I have made sure there is no fishy business with the line endings or other characters. The file is as expected.

I should also mention that this bug does not occur with comma-separated values, only with semicolons.

Answer 1

Change your code to this:

if (matches.Count == 0)
{
    sr.BaseStream.Seek(0, SeekOrigin.Begin);
    sr.DiscardBufferedData();
    break;
}

Seek only moves the underlying stream; the StreamReader keeps serving lines from its own internal buffer, and once that buffer runs out it refills from the stream's new position, which is the start of the file. By instructing the StreamReader to discard its buffer, you're instructing it to synchronize with the actual base stream.

Also, the lines returned aren't really concatenated; the reader is looping back on itself, and the change shown above fixes that.
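To show the rewind in context, here is a minimal sketch of the detection method with the seek and the buffer discard done together at the start of each candidate delimiter. The `DelimiterSniffer` class, the `Rewind` helper, and the `Stream` parameter are assumptions made for illustration and are not from the question; the sketch also assumes the stream is seekable, and it disposes the stream it is given.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

public static class DelimiterSniffer
{
    // Hypothetical helper: move the base stream back to the start AND
    // drop the reader's buffered data so the two stay in sync.
    public static void Rewind(StreamReader reader)
    {
        reader.BaseStream.Seek(0, SeekOrigin.Begin);
        reader.DiscardBufferedData();
    }

    public static bool TryDetermineDelimiter(Stream source, out char delimiter)
    {
        char[] candidates = { ',', ';', '-', ':' };

        using (var reader = new StreamReader(source))
        {
            foreach (char candidate in candidates)
            {
                // Start every candidate from the first line of the file.
                Rewind(reader);

                // Same idea as the question's pattern: count delimiters
                // that are not inside double quotes.
                var pattern = new Regex(
                    Regex.Escape(candidate.ToString()) +
                    "(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");

                var hits = new List<int>();
                bool consistent = true;
                string line;

                while ((line = reader.ReadLine()) != null)
                {
                    int count = pattern.Matches(line).Count;

                    // Reject the candidate if a line has no hits or a
                    // different number of hits than the first line.
                    if (count == 0 || (hits.Count > 0 && count != hits[0]))
                    {
                        consistent = false;
                        break;
                    }

                    hits.Add(count);
                }

                if (consistent && hits.Count > 0)
                {
                    delimiter = candidate;
                    return true;
                }
            }
        }

        delimiter = ',';
        return false;
    }
}
```

Rewinding at the top of the loop, rather than only when a line has no matches, also covers the case where a candidate is rejected after the reader has already reached the end of the file.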
