How can StreamReader read all characters, including the 0x0D 0x0A characters?

Question


I have an old .txt file that I'm trying to convert. Many lines (but not all) end with "0x0D 0x0D 0x0A".

This code reads all the lines:

using (StreamReader srFile = new StreamReader(gstPathFileName)) {
    while (!srFile.EndOfStream) {
        string stFileContents = srFile.ReadLine();
        // ...
    }
}

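This produces an extra "" (empty string) between the lines of the .txt file. Because there are some blank lines between paragraphs, removing all the "" strings would also remove those blank lines.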

Is there a way to make StreamReader read all the characters, including the "0x0D 0x0D 0x0A"?

Edited two hours later... The file is large: 1.6 MB.
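For reference, here is why ReadLine produces those extra "" strings: StreamReader.ReadLine treats "\n", "\r" and "\r\n" each as a line terminator, so "\r\r\n" counts as two terminators (a lone "\r", then "\r\n"). A minimal sketch, assuming a made-up sample string in place of the real file and using local names (sample, ReadWithReadLine) that are purely illustrative:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical sample: a line ending in CR CR LF, a line ending in CR LF,
// a genuine blank line, then the last paragraph.
string sample = "line one\r\r\nline two\r\n\r\nnext paragraph";

// Same loop shape as the question, but reading from an in-memory stream.
List<string> ReadWithReadLine(string text)
{
    var result = new List<string>();
    using (var reader = new StreamReader(new MemoryStream(Encoding.UTF8.GetBytes(text))))
    {
        while (!reader.EndOfStream)
        {
            result.Add(reader.ReadLine());
        }
    }
    return result;
}

List<string> lines = ReadWithReadLine(sample);

// Prints: [line one] [] [line two] [] [next paragraph]
// The first "" comes from the extra CR; the second is the real blank line.
Console.WriteLine(string.Join(" ", lines.ConvertAll(l => "[" + l + "]")));
```

Both empty strings look identical once ReadLine has consumed the terminators, which is why filtering out "" cannot tell them apart.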

Answer 1

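A very simple reimplementation of ReadLine. I wrote a version that returns IEnumerable<string> because it's easier. I put it in an extension method, so it lives in a static class. The code is heavily commented, so it should be easy to read.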

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class StreamEx
{
    public static string[] ReadAllLines(this TextReader tr, string separator)
    {
        return tr.ReadLines(separator).ToArray();
    }

    // StreamReader is based on TextReader
    public static IEnumerable<string> ReadLines(this TextReader tr, string separator)
    {
        // Handling of empty file: old remains null
        string old = null;

        // Read buffer
        var buffer = new char[128];

        while (true)
        {
            // If we already read something
            if (old != null)
            {
                // Look for the separator (ordinal: a plain char-by-char match, not culture-sensitive)
                int ix = old.IndexOf(separator, StringComparison.Ordinal);

                // If found
                if (ix != -1)
                {
                    // Return the piece of line before the separator
                    yield return old.Remove(ix);

                    // Then remove the piece of line before the separator plus the separator
                    old = old.Substring(ix + separator.Length);

                    // And continue 
                    continue;
                }
            }

            // old doesn't contain any separator, let's read some more chars
            int read = tr.ReadBlock(buffer, 0, buffer.Length);

            // If there is no more chars to read, break the cycle
            if (read == 0)
            {
                break;
            }

            // Add the just read chars to the old chars
            // note that null + "somestring" == "somestring"
            old += new string(buffer, 0, read);

            // A new "round" of the while cycle will search for the separator
        }

        // Now we have to handle chars after the last separator

        // If we read something
        if (old != null)
        {
            // Return all the remaining characters
            yield return old;
        }
    }
}

Note that, as written, it doesn't directly solve your problem :-) but it lets you choose the separator. So you use "\r\n" as the separator and then trim the excess '\r'.

Use it like this:

using (var sr = new StreamReader("somefile"))
{
    // Little LINQ to strip excess \r and to make an array
    // (note that by making an array you'll put all the file
    // in memory)
    string[] lines = sr.ReadLines("\r\n").Select(x => x.TrimEnd('\r')).ToArray();
}

or

using (var sr = new StreamReader("somefile"))
{
    // Little LINQ to strip excess \r
    // (note that the file will be read line by line, so only
    // a line at a time is in memory (plus some remaining characters
    // of the next line in the old buffer)
    IEnumerable<string> lines = sr.ReadLines("\r\n").Select(x => x.TrimEnd('\r'));

    foreach (string line in lines)
    {
        // Do something
    }
}
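To see why splitting on "\r\n" and then trimming '\r' is enough, here is the same split-then-trim idea applied to an in-memory string with string.Split (not the streaming ReadLines extension above). The sample text is hypothetical:

```csharp
using System;
using System.Linq;

// Hypothetical sample: CR CR LF after the first line, CR LF elsewhere,
// and a genuine blank line between paragraphs.
string sample = "line one\r\r\nline two\r\n\r\nnext paragraph";

// Split only on the two-character "\r\n" separator, then drop any
// stray '\r' left at the end of a piece.
string[] lines = sample
    .Split(new[] { "\r\n" }, StringSplitOptions.None)
    .Select(x => x.TrimEnd('\r'))
    .ToArray();

// Prints: [line one] [line two] [] [next paragraph]
Console.WriteLine(string.Join(" ", lines.Select(l => "[" + l + "]")));
```

The extra CR ends up as a trailing '\r' on "line one\r" instead of producing its own empty line, so TrimEnd removes it while the real blank line survives.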

Answer 2

You can always use a BinaryReader and read one byte at a time manually. Collect the bytes, and when you hit 0x0d 0x0d 0x0a, build a string from the bytes collected for the current line.

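Notes:

I'm assuming your encoding is Encoding.UTF8, but yours may differ. When working with the raw bytes, I don't know of a way to detect the encoding automatically.
If your file contains other data, such as a byte order mark, it will be returned as well.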

Here it is:

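// Requires: using System.Collections.Generic; using System.IO; using System.Linq; using System.Text;
public static IEnumerable<string> ReadLinesFromStream(string fileName)
{
    // Open the file passed in (File.Open also needs a FileMode, so use File.OpenRead)
    using ( var fileStream = File.OpenRead(fileName) )
    using ( BinaryReader binaryReader = new BinaryReader(fileStream) )
    {
        var bytes = new List<byte>();
        // Loop on the stream position: PeekChar() tries to decode a character
        // and can fail on byte sequences that aren't valid in the reader's encoding
        while ( binaryReader.BaseStream.Position < binaryReader.BaseStream.Length )
        {
            bytes.Add(binaryReader.ReadByte());

            // A line ends when the last three bytes are 0x0d 0x0d 0x0a
            bool newLine = bytes.Count > 2
                && bytes[bytes.Count - 3] == 0x0d
                && bytes[bytes.Count - 2] == 0x0d
                && bytes[bytes.Count - 1] == 0x0a;

            if ( newLine )
            {
                // Decode everything before the terminator as one line
                yield return Encoding.UTF8.GetString(bytes.Take(bytes.Count - 3).ToArray());
                bytes.Clear();
            }
        }

        // Whatever follows the last terminator is the final line
        if ( bytes.Count > 0 )
            yield return Encoding.UTF8.GetString(bytes.ToArray());
    }
}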
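To illustrate the byte order mark note: Encoding.UTF8.GetString decodes a UTF-8 BOM (EF BB BF) into the character U+FEFF rather than discarding it, whereas a StreamReader opened on a path detects and skips BOMs by default. A minimal sketch with hypothetical bytes, showing one possible way to strip it from the first line:

```csharp
using System;
using System.Text;

// Hypothetical file bytes: a UTF-8 byte order mark (EF BB BF) followed by "Hi".
byte[] bytes = { 0xEF, 0xBB, 0xBF, 0x48, 0x69 };

// GetString keeps the BOM as the character U+FEFF at the start of the result.
string first = Encoding.UTF8.GetString(bytes);

// One possible fix: strip a leading U+FEFF from the first decoded line only.
string cleaned = first.TrimStart('\uFEFF');

Console.WriteLine(first.Length);   // 3
Console.WriteLine(cleaned.Length); // 2
```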

Answer 3

This code works well... it reads every character.

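char[] acBuf = new char[100];
while (srFile.Peek() >= 0) {
    // Read returns how many chars were actually read, which can be fewer than requested
    int iReadLength = srFile.Read(acBuf, 0, acBuf.Length);
    // Use only those chars; new string(acBuf) would also include unused buffer slots on the last read
    string s = new string(acBuf, 0, iReadLength);
    // s keeps every character, including any 0x0D 0x0D 0x0A sequences
}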
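Reading in fixed-size blocks means a "\r\r\n" can be split across two blocks, so a simple way to work with the result is to collect the blocks first. A minimal sketch, assuming a hypothetical in-memory sample instead of the real file and an intentionally tiny buffer:

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical sample containing both CR CR LF and CR LF line endings.
string sample = "line one\r\r\nline two\r\nlast";

var all = new StringBuilder();
using (var srFile = new StreamReader(new MemoryStream(Encoding.UTF8.GetBytes(sample))))
{
    // Small buffer on purpose, so a line ending can straddle two blocks.
    char[] acBuf = new char[4];
    int read;
    while ((read = srFile.Read(acBuf, 0, acBuf.Length)) > 0)
    {
        // Append only the chars actually read in this call.
        all.Append(acBuf, 0, read);
    }
}

string text = all.ToString();
// The raw 0x0D 0x0D 0x0A sequence survives untouched.
Console.WriteLine(text.Contains("\r\r\n")); // True
```

For a 1.6 MB file, holding the whole text in memory like this is usually fine.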

Answer 4

A very simple solution (not optimized for memory consumption) could be:

var allLines = File.ReadAllText(gstPathFileName)
    .Split('\n');

If you need to remove the trailing carriage returns, do this:

for(var i = 0; i < allLines.Length; ++i)
    allLines[i] = allLines[i].TrimEnd('\r');

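You can put whatever further processing you need inside the for loop. Or, if you don't need to store the trimmed lines back into the array, use this instead of the for: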

foreach(var line in allLines.Select(x => x.TrimEnd('\r')))
{
    // use 'line' here ...
}
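One detail to check with this approach: when the file ends with a newline, Split('\n') returns an extra empty string as the last element. A minimal sketch with hypothetical content that trims the CRs and drops only that final artifact, so genuine blank lines in the middle are kept:

```csharp
using System;
using System.Linq;

// Hypothetical file content that ends with a line terminator.
string content = "line one\r\r\nline two\r\n";

// { "line one\r\r", "line two\r", "" }: the final '\n' yields an empty last element.
string[] allLines = content.Split('\n');

// TrimEnd('\r') removes every trailing CR, so the doubled CR is handled too.
string[] trimmed = allLines.Select(x => x.TrimEnd('\r')).ToArray();

// Drop only the trailing empty element, if present.
if (trimmed.Length > 0 && trimmed[trimmed.Length - 1].Length == 0)
    trimmed = trimmed.Take(trimmed.Length - 1).ToArray();

Console.WriteLine(string.Join("|", trimmed)); // line one|line two
```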


4 answers

Answer 1: score 1, 2015-03-01 07:13:04

Answer 2: score 0, 2015-02-28 20:25:58

Answer 3: score 0, accepted, 2015-02-28 20:31:35

Answer 4: score 0, 2015-03-01 07:54:03
