c# - StreamReader and seeking

Question

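Can you use a StreamReader to read a normal text file, save the current position partway through reading, close the StreamReader, and then open a StreamReader again and continue reading from that position?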

If not, what else can I use to accomplish the same thing without locking the file?

I tried this, but it doesn't work:

var fs = File.Open(@"C:\testfile.txt", FileMode.Open, FileAccess.Read);
var sr = new StreamReader(fs);

Debug.WriteLine(sr.ReadLine()); //Prints:firstline

var pos = fs.Position;

while (!sr.EndOfStream) 
{
    Debug.WriteLine(sr.ReadLine());
}

fs.Seek(pos, SeekOrigin.Begin);

Debug.WriteLine(sr.ReadLine());
//Prints nothing; I expect it to print SecondLine.

Here is other code I also tried:

var position = -1;
StreamReaderSE sr = new StreamReaderSE(@"c:\testfile.txt");

Debug.WriteLine(sr.ReadLine());
position = sr.BytesRead;

Debug.WriteLine(sr.ReadLine());
Debug.WriteLine(sr.ReadLine());
Debug.WriteLine(sr.ReadLine());
Debug.WriteLine(sr.ReadLine());

Debug.WriteLine("Wait");

sr.BaseStream.Seek(position, SeekOrigin.Begin);
Debug.WriteLine(sr.ReadLine());

Answer 1

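I realize this is really late, but I just stumbled onto this incredible flaw in StreamReader myself: the fact that you can't reliably seek when using StreamReader. Personally, my specific need is the ability to read characters but then "back up" if a certain condition is met; it's a side effect of one of the file formats I'm parsing.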

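Using ReadLine() isn't an option, because it's only useful for really trivial parsing jobs. I have to support configurable record/line delimiter sequences and support escaped delimiter sequences. Also, I don't want to implement my own buffer just so I can support "backing up" and escape sequences; that should be the StreamReader's job.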

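This method calculates the actual position in the underlying byte stream on demand. It works for UTF-8, UTF-16LE, UTF-16BE, UTF-32LE, UTF-32BE, and any single-byte encoding (e.g. code pages 1252, 437, 28591, etc.), regardless of whether a preamble/BOM is present. This version will not work for UTF-7, Shift-JIS, or other variable-byte encodings.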

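When I need to seek to an arbitrary position in the underlying stream, I set BaseStream.Position directly and then call DiscardBufferedData() to get the StreamReader back in sync for the next Read()/Peek() call.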
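A minimal sketch of that resync step, using an in-memory ASCII stream so the byte offset of each line is known in advance (the offsets are assumptions that hold only because every character is one byte and there is no BOM); ResyncDemo is a hypothetical name:

```csharp
using System.IO;
using System.Text;

public static class ResyncDemo
{
    // Reads one line, moves the underlying stream to byteOffset, optionally
    // discards the reader's buffers, and then reads the next line.
    public static string ReadLineAfterSeek(byte[] data, long byteOffset, bool discard)
    {
        using (var reader = new StreamReader(new MemoryStream(data), Encoding.ASCII))
        {
            reader.ReadLine();                        // the whole (short) input is now buffered

            reader.BaseStream.Position = byteOffset;  // move the underlying stream...
            if (discard)
                reader.DiscardBufferedData();         // ...and drop the stale decoded characters

            return reader.ReadLine();
        }
    }
}
```

With the bytes of "first\nsecond\nthird", ReadLineAfterSeek(data, 0, true) returns "first", while ReadLineAfterSeek(data, 0, false) returns "second": without the discard, the reader keeps serving characters it had already decoded, and the new BaseStream.Position is ignored until the buffer runs dry.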

And a friendly reminder: don't set BaseStream.Position arbitrarily. If you bisect a character, the next Read() returns garbage, and for UTF-16/-32 you also invalidate the result of this method.
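A small sketch of what bisecting a character looks like, assuming UTF-8 with a replacement (non-throwing) decoder: "ž" (U+017E) is the two bytes C5 BE, so seeking to offset 1 leaves an orphaned continuation byte. BisectDemo is a hypothetical name:

```csharp
using System.IO;
using System.Text;

public static class BisectDemo
{
    // Encodes "ž" (C5 BE in UTF-8), seeks to byteOffset, and decodes the rest.
    public static string ReadFrom(long byteOffset)
    {
        var utf8 = new UTF8Encoding(false); // no BOM; invalid bytes become U+FFFD
        byte[] data = utf8.GetBytes("\u017E");

        using (var reader = new StreamReader(new MemoryStream(data), utf8, false))
        {
            reader.BaseStream.Position = byteOffset;
            reader.DiscardBufferedData();
            // At offset 1 the decoder sees only the continuation byte BE,
            // which it replaces with U+FFFD instead of producing "ž".
            return reader.ReadToEnd();
        }
    }
}
```

ReadFrom(0) returns "ž", while ReadFrom(1) returns "\uFFFD" (the Unicode replacement character).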

public static long GetActualPosition(StreamReader reader)
{
    System.Reflection.BindingFlags flags = System.Reflection.BindingFlags.DeclaredOnly | System.Reflection.BindingFlags.NonPublic | System.Reflection.BindingFlags.Instance | System.Reflection.BindingFlags.GetField;

    // The current buffer of decoded characters
    char[] charBuffer = (char[])reader.GetType().InvokeMember("charBuffer", flags, null, reader, null);

    // The index of the next char to be read from charBuffer
    int charPos = (int)reader.GetType().InvokeMember("charPos", flags, null, reader, null);

    // The number of decoded chars presently used in charBuffer
    int charLen = (int)reader.GetType().InvokeMember("charLen", flags, null, reader, null);

    // The current buffer of read bytes (byteBuffer.Length = 1024; this is critical).
    byte[] byteBuffer = (byte[])reader.GetType().InvokeMember("byteBuffer", flags, null, reader, null);

    // The number of bytes read while advancing reader.BaseStream.Position to (re)fill charBuffer
    int byteLen = (int)reader.GetType().InvokeMember("byteLen", flags, null, reader, null);

    // The number of bytes the remaining chars use in the original encoding.
    int numBytesLeft = reader.CurrentEncoding.GetByteCount(charBuffer, charPos, charLen - charPos);

    // For variable-byte encodings, deal with partial chars at the end of the buffer
    int numFragments = 0;
    if (byteLen > 0 && !reader.CurrentEncoding.IsSingleByte)
    {
        if (reader.CurrentEncoding.CodePage == 65001) // UTF-8
        {
            byte byteCountMask = 0;
            while (numFragments < byteLen && (byteBuffer[byteLen - numFragments - 1] >> 6) == 2) // if the byte is "10xx xxxx", it's a continuation-byte
                byteCountMask |= (byte)(1 << ++numFragments); // count bytes & build the "complete char" mask
            if (numFragments < byteLen && (byteBuffer[byteLen - numFragments - 1] >> 6) == 3) // if the byte is "11xx xxxx", it starts a multi-byte char.
                byteCountMask |= (byte)(1 << ++numFragments); // count bytes & build the "complete char" mask
            // see if we found as many bytes as the leading-byte says to expect
            if (numFragments > 1 && ((byteBuffer[byteLen - numFragments] >> 7 - numFragments) == byteCountMask))
                numFragments = 0; // no partial-char in the byte-buffer to account for
        }
        else if (reader.CurrentEncoding.CodePage == 1200) // UTF-16LE
        {
            if ((byteBuffer[byteLen - 1] & 0xFC) == 0xD8) // high byte D8-DB: a high surrogate still held by the decoder
                numFragments = 2; // account for the partial character
        }
        else if (reader.CurrentEncoding.CodePage == 1201) // UTF-16BE
        {
            if (byteLen >= 2 && (byteBuffer[byteLen - 2] & 0xFC) == 0xD8) // high byte D8-DB: a high surrogate still held by the decoder
                numFragments = 2; // account for the partial character
        }
    }
    return reader.BaseStream.Position - numBytesLeft - numFragments;
}

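Of course, this uses reflection to get at private fields, so there is risk involved. However, this method works with .NET 2.0, 3.0, 3.5, 4.0, 4.0.3, 4.5, 4.5.1, 4.5.2, 4.6, and 4.6.1. Beyond that risk, the only other critical assumption is that the underlying byte buffer is a byte[1024]; if Microsoft changes that the wrong way, the method breaks for UTF-16/-32.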
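The InvokeMember calls above use the .NET Framework field names. As far as I know, .NET Core and .NET 5+ name the same private fields with a leading underscore (_charBuffer, _charPos, _charLen, _byteBuffer, _byteLen), so those lookups fail there. A hedged sketch of a lookup that tries both spellings; GetPrivateField is a hypothetical helper, and private field names can change in any release:

```csharp
using System;
using System.IO;
using System.Reflection;

public static class StreamReaderInternals
{
    // Hypothetical helper: reads a private instance field of StreamReader,
    // trying the .NET Framework name first and then the "_name" form.
    public static T GetPrivateField<T>(StreamReader reader, string name)
    {
        const BindingFlags flags = BindingFlags.NonPublic | BindingFlags.Instance;
        Type type = typeof(StreamReader);

        FieldInfo field = type.GetField(name, flags) ?? type.GetField("_" + name, flags);
        if (field == null)
            throw new MissingFieldException(type.FullName, name);

        return (T)field.GetValue(reader);
    }
}
```

For example, GetPrivateField<int>(reader, "charPos") and GetPrivateField<byte[]>(reader, "byteBuffer") could replace the corresponding InvokeMember calls; the rest of the position arithmetic is unchanged.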

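This has been tested against a UTF-8 file filled with Ažテ𣘺 (10 bytes: 0x41 C5 BE E3 83 86 F0 A3 98 BA) and a UTF-16 file filled with A𐐷 (6 bytes: 0x41 00 01 D8 37 DC). The point was to force characters to be split along the byte[1024] boundaries in all the different ways they could be.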

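Update (2013-07-03): I fixed the method, which originally used the broken code from the other answer. This version has been tested against data containing characters that require surrogate pairs. The data was put into 3 files, each with a different encoding: one UTF-8, one UTF-16LE, and one UTF-16BE.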

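Update (2016-02): The only correct way to handle bisected characters is to interpret the underlying bytes directly. UTF-8 is handled correctly, and UTF-16/-32 work (given the length of byteBuffer).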

Answer 2

Yes, you can. See this:

var sr = new StreamReader("test.txt");
sr.BaseStream.Seek(2, SeekOrigin.Begin); // Check sr.BaseStream.CanSeek first

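Update: Be aware that you can't necessarily use sr.BaseStream.Position for anything useful, because StreamReader uses buffers, so it won't reflect what you have actually read. I expect you'll have trouble finding the true position, because you can't just count characters (different encodings mean different character lengths). I think the best way is to work with the FileStream itself.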
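A small sketch of that effect, assuming the whole input fits in StreamReader's internal buffer so that a single ReadLine() pulls all of it from the underlying stream; BufferingDemo is a hypothetical name:

```csharp
using System.IO;
using System.Text;

public static class BufferingDemo
{
    // Returns the underlying stream position after a single ReadLine().
    public static long PositionAfterFirstLine(string text)
    {
        byte[] data = Encoding.ASCII.GetBytes(text);
        using (var reader = new StreamReader(new MemoryStream(data), Encoding.ASCII))
        {
            reader.ReadLine();                 // logically consumes only the first line...
            return reader.BaseStream.Position; // ...but the stream was read in bulk
        }
    }
}
```

For "1234\n56789" this returns 10 rather than 5. That is also why the question's first snippet prints nothing: for a file that fits in the buffer, the saved fs.Position is already the end of the file.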

Update: Use TGREER.myStreamReader, a class that adds BytesRead and similar members (it works with ReadLine() but apparently not with the other read methods). Then you can do this:

File.WriteAllText("test.txt", "1234\n56789");

long position = -1;

using (var sr = new myStreamReader("test.txt"))
{
    Console.WriteLine(sr.ReadLine());

    position = sr.BytesRead;
}

Console.WriteLine("Wait");

using (var sr = new myStreamReader("test.txt"))
{
    sr.BaseStream.Seek(position, SeekOrigin.Begin);
    Console.WriteLine(sr.ReadToEnd());
}
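The TGREER.myStreamReader source isn't reproduced in this answer. As a rough stand-in under stated assumptions (UTF-8 or ASCII text, \n or \r\n line endings, no BOM), here is a hypothetical ByteCountingLineReader that reads one byte at a time, so BytesRead always matches what the caller has consumed rather than what has been buffered:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical stand-in for a "StreamReader with BytesRead": it never reads
// ahead, so BytesRead is exactly the number of bytes consumed so far.
public sealed class ByteCountingLineReader
{
    private readonly Stream _stream;

    public ByteCountingLineReader(Stream stream) { _stream = stream; }

    public long BytesRead { get; private set; }

    // Returns the next line without its terminator, or null at end of stream.
    public string ReadLine()
    {
        var bytes = new List<byte>();
        int b;
        while ((b = _stream.ReadByte()) != -1)
        {
            BytesRead++;
            if (b == '\n') break;   // 0x0A never occurs inside a UTF-8 multi-byte sequence
            bytes.Add((byte)b);
        }

        if (b == -1 && bytes.Count == 0) return null;

        // Drop the '\r' of a "\r\n" terminator.
        if (bytes.Count > 0 && bytes[bytes.Count - 1] == '\r') bytes.RemoveAt(bytes.Count - 1);

        return Encoding.UTF8.GetString(bytes.ToArray());
    }
}
```

With the saved BytesRead you can reopen the file, Seek(position, SeekOrigin.Begin) on the FileStream, and wrap it in a fresh StreamReader, as in the snippet above. Byte-at-a-time reads lean on the stream's own buffering (FileStream buffers internally), so treat this as a sketch rather than a fast parser.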

Answer 3

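If you just want to search for a start position within a text stream, I added this extension to StreamReader so that I could determine where the stream should be edited. Granted, it uses characters as the unit of advance, but for my purposes it works very well for getting the position of a string pattern within a text/ASCII file. You can then use that position as the starting point for reading, and write a new file that excludes the data before it.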

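The returned position can be passed to Seek to start text reads from that point in the stream. It works; I've tested it. However, there may be problems if non-ASCII Unicode characters are involved in the match: the result is a character count, which equals the byte offset only for single-byte text without a BOM. This was written for US English and the associated code page.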

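The basics: it scans the text stream character by character, moving forward only, looking for the sequence of characters that matches the string parameter. As soon as the next character fails to continue the match, it starts over from the current position and tries again to build a character-by-character match. If no match is found, it eventually exits. If a match is found, it returns the current "character" position within the stream, not StreamReader.BaseStream.Position, because that position runs ahead due to the buffering StreamReader does.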

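As noted in the comments, this method changes the StreamReader's position and sets it back to the beginning (0) when it finishes. Use StreamReader.BaseStream.Seek to move to the position this extension returns.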

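Note: the position returned by this extension also works with BinaryReader.BaseStream.Seek as a start position when processing text files. I actually used this logic for that purpose: to rewrite a PostScript file back to disk after discarding the PJL header information, so that the file became a "proper" PostScript file that GhostScript could consume. :)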
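For the PJL-stripping case specifically, one alternative that sidesteps the character-versus-byte question is to search the raw bytes for the ASCII marker and copy everything from there on. This is a different technique from the extension below, sketched under the assumptions that the marker is plain ASCII and the input fits in memory; PjlStripper and StripBeforeMarker are hypothetical names:

```csharp
using System.IO;
using System.Text;

public static class PjlStripper
{
    // Copies everything from the first occurrence of `marker` onward into `output`.
    // Returns the byte offset of the marker, or -1 (copying nothing) if it is absent.
    public static long StripBeforeMarker(Stream input, Stream output, string marker)
    {
        byte[] pattern = Encoding.ASCII.GetBytes(marker);
        byte[] data;
        using (var buffer = new MemoryStream())
        {
            input.CopyTo(buffer);
            data = buffer.ToArray();
        }

        // Naive byte search; fine for short markers such as "%!PS-".
        for (int i = 0; i + pattern.Length <= data.Length; i++)
        {
            int j = 0;
            while (j < pattern.Length && data[i + j] == pattern[j]) j++;
            if (j == pattern.Length)
            {
                output.Write(data, i, data.Length - i);
                return i;
            }
        }
        return -1;
    }
}
```

Because the search runs over bytes, the returned offset is a true byte offset regardless of what precedes the marker.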

The string to search for in the PostScript (after the PJL header) is "%!PS-", followed by "Adobe" and the version.

using System.Collections.Generic;
using System.IO;
using System.Text;

public static class StreamReaderExtension
{
    /// <summary>
    /// Searches from the beginning of the stream for the indicated
    /// <paramref name="pattern"/>. Once found, returns the position within the stream
    /// that the pattern begins at.
    /// </summary>
    /// <param name="pattern">The <c>string</c> pattern to search for in the stream.</param>
    /// <returns>If <paramref name="pattern"/> is found in the stream, then the start position
    /// within the stream of the pattern; otherwise, -1.</returns>
    /// <remarks>Please note: this method will change the current stream position of this instance of
    /// <see cref="System.IO.StreamReader"/>. When it completes, the position of the reader will
    /// be set to 0.</remarks>
    public static long FindSeekPosition(this StreamReader reader, string pattern)
    {
        if (!string.IsNullOrEmpty(pattern) && reader.BaseStream.CanSeek)
        {
            try
            {
                reader.BaseStream.Position = 0;
                reader.DiscardBufferedData();
                StringBuilder buff = new StringBuilder(); // the partial match built so far
                long start = 0;                           // char index where the partial match began
                long charCount = 0;                       // char index of the current character
                bool startFound = false;

                while (!reader.EndOfStream)
                {
                    char chr = (char)reader.Read();

                    if (startFound)
                    {
                        if (chr == pattern[buff.Length])
                        {
                            // The character extends the partial match.
                            buff.Append(chr);
                        }
                        else
                        {
                            // Mismatch: discard the partial match and start over
                            // from the current character.
                            buff.Length = 0;
                            startFound = false;
                        }
                    }

                    if (!startFound && chr == pattern[0])
                    {
                        startFound = true;
                        start = charCount;
                        buff.Append(chr);
                    }

                    if (startFound && buff.Length == pattern.Length)
                    {
                        return start;
                    }

                    charCount++;
                }
            }
            finally
            {
                reader.BaseStream.Position = 0;
                reader.DiscardBufferedData();
            }
        }

        return -1;
    }
}

Answer 4

FileStream.Position (or equivalently, StreamReader.BaseStream.Position) will usually be ahead of the TextReader position, possibly far ahead, because of the underlying buffering.

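If you can determine how newlines are handled in your text files, you can add up the number of bytes read from the line lengths and the end-of-line characters.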

File.WriteAllText("test.txt", "1234" + System.Environment.NewLine + "56789");

long position = -1;
long bytesRead = 0;
int newLineBytes = System.Environment.NewLine.Length;

using (var sr = new StreamReader("test.txt"))
{
    string line = sr.ReadLine();
    bytesRead += line.Length + newLineBytes;

    Console.WriteLine(line);

    position = bytesRead;
}

Console.WriteLine("Wait");

using (var sr = new StreamReader("test.txt"))
{
    sr.BaseStream.Seek(position, SeekOrigin.Begin);
    Console.WriteLine(sr.ReadToEnd());
}

For more complex text file encodings you might need to get fancier than this, but it worked for me.
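One way to get fancier is to count bytes with Encoding.GetByteCount instead of string.Length. A sketch under stated assumptions: no BOM, one known newline sequence, and every counted line actually ends with it; LineOffsets and OffsetAfterLines are hypothetical names:

```csharp
using System.IO;
using System.Text;

public static class LineOffsets
{
    // Byte offset just past the first `lineCount` lines of a stream.
    public static long OffsetAfterLines(Stream stream, Encoding encoding, string newLine, int lineCount)
    {
        long offset = 0;
        int newLineBytes = encoding.GetByteCount(newLine);

        // leaveOpen: true so the caller can Seek on the same stream afterwards.
        using (var reader = new StreamReader(stream, encoding, false, 1024, leaveOpen: true))
        {
            for (int i = 0; i < lineCount; i++)
            {
                string line = reader.ReadLine();
                if (line == null) break;
                offset += encoding.GetByteCount(line) + newLineBytes;
            }
        }
        return offset;
    }
}
```

For "Až\n56789" the offset after one line is 4 in UTF-8 (1 + 2 bytes for "Až", plus 1 for \n) and 6 in UTF-16LE (2 bytes per character).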

Answer 5

From MSDN:

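StreamReader is designed for character input in a particular encoding, whereas the Stream class is designed for byte input and output. Use StreamReader for reading lines of information from a standard text file.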

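In most examples involving StreamReader, you will see it reading line by line with ReadLine(). The Seek method comes from the Stream class, which is essentially for reading and handling data as bytes.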

Answer 6

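I found the suggestions above didn't work for me; my use case was simply to back up one read position (I'm reading one character at a time with the default encoding). My simple solution was inspired by the comments above... your mileage may vary...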

I saved BaseStream.Position before reading, then determined whether I needed to back up... if so, I set the position and called DiscardBufferedData().
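If the only requirement is to look at the next character before deciding whether to consume it, StreamReader.Peek() does that without touching BaseStream.Position at all. This is an alternative to the approach above, not a description of it; ReadDigits is a hypothetical example that stops before the first non-digit and leaves it unread:

```csharp
using System.IO;
using System.Text;

public static class PeekDemo
{
    // Consumes a run of ASCII digits and leaves the first non-digit in the reader.
    public static string ReadDigits(TextReader reader)
    {
        var digits = new StringBuilder();
        int next;
        // Peek() returns -1 at end of input, which fails the range check.
        while ((next = reader.Peek()) >= '0' && next <= '9')
        {
            digits.Append((char)reader.Read());
        }
        return digits.ToString();
    }
}
```

Since StreamReader derives from TextReader, the same method works on a StreamReader over a file; after ReadDigits returns, the next Read() yields the character that stopped the scan.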

6 solutions

Solution 1
34 2013-07-03 20:01:14

Solution 2
16 accepted 2011-03-23 11:10:21

Solution 3
1 2017-08-18 04:44:40

Solution 4
0 2015-01-27 21:09:34

Solution 5
0 2011-03-23 11:19:17

Solution 6
0 2022-12-16 03:35:13
