
Reading a null-terminated string

I am reading strings from a binary file. Each string is null-terminated, and the encoding is UTF-8. In Python I simply read a byte, check whether it's 0, append it to a byte array if it isn't, and keep reading bytes until I see a 0. Then I convert the byte array into a string and move on. All of the strings were read correctly.

How can I read this in C#? I don't think I have the luxury of simply appending bytes to an array, since C# arrays are fixed-size.
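C# does have growable byte buffers, so the Python loop ports almost line for line. A minimal sketch, assuming the strings come from any `Stream` (an in-memory one stands in for the file here): `MemoryStream` plays the role of the growable byte array, and `ReadNullTerminated` is a hypothetical helper name, not a framework API.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper: reads bytes until a 0 terminator, mirroring the Python loop.
// Returns null when the stream is already at its end.
static string? ReadNullTerminated(Stream input)
{
    var buffer = new MemoryStream(); // grows as bytes are written, unlike byte[]
    int b;
    while ((b = input.ReadByte()) != -1)
    {
        if (b == 0)
            return Encoding.UTF8.GetString(buffer.ToArray());
        buffer.WriteByte((byte)b);
    }
    // End of stream: return any unterminated tail, or null if nothing was read
    return buffer.Length > 0 ? Encoding.UTF8.GetString(buffer.ToArray()) : null;
}

// Stand-in for the file: "abc\0héllo\0" encoded as UTF-8
var source = new MemoryStream(Encoding.UTF8.GetBytes("abc\0héllo\0"));
var strings = new List<string>();
string? s;
while ((s = ReadNullTerminated(source)) != null)
    strings.Add(s);

Console.WriteLine(string.Join(" | ", strings)); // abc | héllo
```

Returning `null` only at end of stream keeps a genuinely empty string (two consecutive 0 bytes) distinguishable from "no more data".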

The following should get you what you are looking for. All of the strings end up in the myText list.

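using System.Collections.Generic;
using System.IO;

var data = File.ReadAllBytes("myfile.bin");
List<string> myText = new List<string>();
int lastOffset = 0; // start of the string currently being scanned
for (int i = 0; i < data.Length; i++)
{
    if (data[i] == 0) // terminator: decode the bytes since lastOffset as UTF-8
    {
        myText.Add(System.Text.Encoding.UTF8.GetString(data, lastOffset, i - lastOffset));
        lastOffset = i + 1;
    }
}
// Note: any bytes after the last 0 (an unterminated final string) are not added.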
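If you'd rather not test each byte yourself, `Array.IndexOf` can locate each terminator. A minimal sketch of the same idea with one added assumption: bytes after the last 0 are returned as a final, unterminated string instead of being dropped (the loop above ignores them). `SplitNullTerminated` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: splits a byte buffer on 0 terminators and decodes each piece as UTF-8
static List<string> SplitNullTerminated(byte[] data)
{
    var result = new List<string>();
    int start = 0;
    while (start < data.Length)
    {
        int end = Array.IndexOf(data, (byte)0, start); // next terminator, or -1
        if (end < 0)
        {
            // Trailing bytes with no terminator: keep them as a final string
            result.Add(Encoding.UTF8.GetString(data, start, data.Length - start));
            break;
        }
        result.Add(Encoding.UTF8.GetString(data, start, end - start));
        start = end + 1;
    }
    return result;
}

// In real use: var data = File.ReadAllBytes("myfile.bin");
var data = Encoding.UTF8.GetBytes("one\0two\0tail");
Console.WriteLine(string.Join(", ", SplitNullTerminated(data))); // one, two, tail
```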

I assume you're using a StreamReader instance:

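List<string> strings = new List<string>();
StringBuilder sb = new StringBuilder();
using(StreamReader rdr = OpenReader(...)) {   // OpenReader(...) stands for however you open the file
    Int32 nc;
    while((nc = rdr.Read()) != -1) {
          Char c = (Char)nc;
          if( c != '\0' ) {
              sb.Append( c );
          } else {
              strings.Add( sb.ToString() );   // terminator reached: one complete string
              sb.Clear();
          }
    }
}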
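Reading through a StreamReader matters for UTF-8: the reader decodes multi-byte sequences into whole characters, whereas treating each raw byte as a char does not. A small demonstration with a made-up two-byte character, using an in-memory stream in place of the file:

```csharp
using System;
using System.IO;
using System.Text;

byte[] bytes = Encoding.UTF8.GetBytes("é\0"); // 0xC3 0xA9, then the 0 terminator

// Wrong: casting each raw byte to char turns "é" into "Ã©"
var cast = new StringBuilder();
foreach (byte b in bytes)
    if (b != 0)
        cast.Append((char)b);

// Right: StreamReader decodes UTF-8, so Read() returns the single char 'é'
var decoded = new StringBuilder();
using (var rdr = new StreamReader(new MemoryStream(bytes), Encoding.UTF8))
{
    int nc;
    while ((nc = rdr.Read()) > 0) // stops at the '\0' terminator (0) or at end of stream (-1)
        decoded.Append((char)nc);
}

Console.WriteLine($"{cast} vs {decoded}"); // Ã© vs é
```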

You can either use a List<byte>:

List<byte> list = new List<byte>();
int readByte;
while((readByte = stream.ReadByte()) > 0){ // stream: your input Stream; stops at the 0 terminator or at end (-1)
    list.Add((byte)readByte);
}

string output = Encoding.UTF8.GetString(list.ToArray());

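Or you could use a StringBuilder, as long as you feed it decoded characters (for example from a StreamReader) rather than raw bytes; StringBuilder.Append(byte) would append each byte's numeric value as text:

StringBuilder builder = new StringBuilder();
int readChar;
while((readChar = reader.Read()) > 0){ // reader: a StreamReader; stops at '\0' or at end (-1)
    builder.Append((char)readChar);
}

string output = builder.ToString();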
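In a real binary file the strings are often mixed with other fields, so the List<byte> approach pairs naturally with `BinaryReader`: read the string, then keep reading the fields that follow. A minimal sketch; the record layout (a null-terminated name followed by a 4-byte little-endian int) and the helper name `ReadCString` are made-up assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper: collect bytes until the 0 terminator, then decode as UTF-8
static string ReadCString(BinaryReader br)
{
    var bytes = new List<byte>();
    byte b;
    while ((b = br.ReadByte()) != 0) // throws EndOfStreamException if no terminator
        bytes.Add(b);
    return Encoding.UTF8.GetString(bytes.ToArray());
}

// Build an example record in memory: name "widget", its terminator, then the int 42
var ms = new MemoryStream();
using (var writer = new BinaryWriter(ms, Encoding.UTF8, leaveOpen: true))
{
    writer.Write(Encoding.UTF8.GetBytes("widget"));
    writer.Write((byte)0);
    writer.Write(42);
}
ms.Position = 0;

using var reader = new BinaryReader(ms);
string name = ReadCString(reader);
int value = reader.ReadInt32(); // the reader is positioned right after the terminator
Console.WriteLine($"{name} = {value}"); // widget = 42
```

BinaryReader reads exactly the bytes you ask for, so the stream position stays in sync with the record layout.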

If your "binary file" only contains null-terminated UTF-8 strings, then to .NET it isn't really a "binary file" but just a text file, because null characters are characters too. So you could simply use a StreamReader to read the text and split it on the null characters. (Six years later, "you" is presumably some new reader and not the OP.)
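Treating the file as text is safe with UTF-8 in particular: every byte of a multi-byte UTF-8 sequence has its high bit set, so a 0x00 byte only ever encodes U+0000 itself. Splitting the decoded text on `'\0'` therefore gives the same strings as splitting the raw bytes on 0. A quick check of that claim on made-up sample text:

```csharp
using System;
using System.Linq;
using System.Text;

// Sample text with 2- and 3-byte characters plus 0 terminators (made-up data)
string text = "größe\0日本語\0";
byte[] bytes = Encoding.UTF8.GetBytes(text);

// Zero bytes appear only where the text has '\0'
int zeroBytes = bytes.Count(b => b == 0);
int nullChars = text.Count(ch => ch == '\0');
Console.WriteLine($"{zeroBytes} zero bytes, {nullChars} null chars"); // 2 zero bytes, 2 null chars

// Decode first, then split on '\0': 3 pieces, the last one empty because of the final terminator
string[] pieces = Encoding.UTF8.GetString(bytes).Split('\0');
Console.WriteLine(pieces.Length); // 3
```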

A one-line(ish) solution would be:

using (var rdr = new StreamReader(path))
    return rdr.ReadToEnd().Split(new char[] { '\0' });

But that will give you a trailing empty string if the last string in the file was "properly" terminated.
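One way to handle that trailing empty string without discarding legitimately empty strings in the middle (which `StringSplitOptions.RemoveEmptyEntries` would also remove) is to drop only the final element when the text ends with a terminator. A sketch; `SplitTerminated` is a hypothetical helper name:

```csharp
using System;

// Hypothetical helper: split on '\0' and drop only the empty piece produced by a final terminator
static string[] SplitTerminated(string input)
{
    if (input.Length == 0)
        return new string[0]; // an empty file holds no strings
    string[] parts = input.Split('\0');
    if (input[input.Length - 1] == '\0')
        Array.Resize(ref parts, parts.Length - 1);
    return parts;
}

// In real use: text = rdr.ReadToEnd();
string text = "first\0\0third\0"; // the middle string is genuinely empty
string[] strings = SplitTerminated(text);
Console.WriteLine(strings.Length); // 3
```

Checking the last character directly avoids `string.EndsWith(string)`, whose default culture-sensitive comparison can treat `'\0'` as ignorable.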

A more verbose solution, expressed as an extension method on StreamReader, reads one character at a time instead of first building a single string for the whole file, which may matter for very large files:

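// Extension methods must be declared inside a non-nested static class
public static System.Collections.Generic.List<string> ReadAllNullTerminated(this System.IO.StreamReader rdr)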
{
    var stringsRead = new System.Collections.Generic.List<string>();
    var bldr = new System.Text.StringBuilder();
    int nc;
    while ((nc = rdr.Read()) != -1)
    {
        char c = (char)nc;
        if (c == '\0')
        {
            stringsRead.Add(bldr.ToString());
            bldr.Length = 0;
        }
        else
            bldr.Append(c);
    }

    // Optionally return any trailing unterminated string
    if (bldr.Length != 0)
        stringsRead.Add(bldr.ToString());

    return stringsRead;
}
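If holding every string in a list is itself the concern for very large files, the same loop can be written as an iterator, so strings are produced one at a time and only the current one is kept in memory. A sketch under the same assumptions as the extension method above (UTF-8 text, `'\0'` separators), written as a plain local function over `TextReader` rather than an extension method; the name `EnumerateNullTerminated` is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical iterator: yields each null-terminated string as soon as its terminator is read
static IEnumerable<string> EnumerateNullTerminated(TextReader rdr)
{
    var bldr = new StringBuilder();
    int nc;
    while ((nc = rdr.Read()) != -1)
    {
        if (nc == 0)
        {
            yield return bldr.ToString();
            bldr.Clear();
        }
        else
        {
            bldr.Append((char)nc);
        }
    }
    if (bldr.Length != 0)
        yield return bldr.ToString(); // trailing unterminated string
}

// In real use: new StreamReader(path) instead of a StringReader
using var reader = new StringReader("alpha\0beta\0gamma");
foreach (string s in EnumerateNullTerminated(reader))
    Console.WriteLine(s);
```

Because the method is lazy, nothing is read until the caller starts enumerating, and the reader must stay open until enumeration finishes.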

Or, for reading just one string at a time (like ReadLine):

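// Extension methods must be declared inside a non-nested static class
public static string ReadNullTerminated(this System.IO.StreamReader rdr)
{
    var bldr = new System.Text.StringBuilder();
    int nc;
    while ((nc = rdr.Read()) > 0)
        bldr.Append((char)nc);

    // Like ReadLine, return null once the end of the stream is reached with nothing read
    if (nc == -1 && bldr.Length == 0)
        return null;

    return bldr.ToString();
}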
