
C# BinaryWriter Write Method String Size

When writing a string to a binary file using C#, the length (in bytes) is automatically prepended to the output. According to the MSDN documentation this is an unsigned integer, but is also a single byte. The example they give is that a single UTF-8 character would be three written bytes: 1 size byte and 2 bytes for the character. This is fine for strings up to length 255, and matches the behaviour I've observed.

However, if your string is longer than 255 bytes, the size of the unsigned integer grows as necessary. As a simple example, consider 1024 characters as:

string header = "ABCDEFGHIJKLMNOP";
for (int ii = 0; ii < 63; ii++)
{
  header += "ABCDEFGHIJKLMNOP";
}
fileObject.Write(header);

results in 2 bytes being prepended to the string. Creating a string of length 2^17 results in a somewhat maddening 3-byte prefix.

The question, therefore, is: when reading, how do I know how many bytes to read to get the size of what follows? I wouldn't necessarily know the header size a priori. Ultimately, can I force the Write(string) method to always use a consistent size (say 2 bytes)?

A possible workaround is to write my own write(string) method, but I would like to avoid that for obvious reasons (similar questions here and here accept this as an answer). Another, more palatable workaround is to have the reader look for a specific character that starts the ASCII string information (maybe an unprintable character?), but that is not infallible. A final workaround (that I can think of) would be to force the string to be within the range of sizes for a particular number of size bytes; again, that is not ideal.

While forcing the size of the length prefix to be consistent is the easiest, I have control over the reader, so any clever reader solutions are also welcome.

BinaryWriter and BinaryReader aren't the only way of writing binary data; they simply provide a convention that is shared between that specific reader and writer. No, you can't tell them to use another convention, unless of course you subclass both of them and override the ReadString and Write(string) methods completely.
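The subclassing route could look roughly like the sketch below. The class names FixedPrefixWriter and FixedPrefixReader are made up for illustration, and the choice of UTF-8 with a fixed 2-byte (ushort) byte count is an assumption, not something the framework provides.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical writer: replaces the variable-length prefix with a fixed ushort.
public class FixedPrefixWriter : BinaryWriter
{
    public FixedPrefixWriter(Stream output) : base(output, Encoding.UTF8) { }

    public override void Write(string value)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(value);
        if (bytes.Length > ushort.MaxValue)
            throw new ArgumentException("String is too long for a 2-byte length prefix.", nameof(value));

        Write((ushort)bytes.Length); // always 2 bytes; BinaryWriter emits them little-endian
        Write(bytes);                // the encoded text itself
    }
}

// Hypothetical reader: must be used together with FixedPrefixWriter.
public class FixedPrefixReader : BinaryReader
{
    public FixedPrefixReader(Stream input) : base(input, Encoding.UTF8) { }

    public override string ReadString()
    {
        int byteCount = ReadUInt16();
        byte[] bytes = ReadBytes(byteCount);
        if (bytes.Length != byteCount)
            throw new EndOfStreamException("Stream ended inside a string payload.");
        return Encoding.UTF8.GetString(bytes);
    }
}
```

Files produced this way are no longer readable by a plain BinaryReader.ReadString, so both ends have to agree on the custom pair.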

If you want to use a different convention, then simply: don't use BinaryReader and BinaryWriter. It is pretty easy to talk to a Stream directly, using any text Encoding you want to get hold of the bytes and the byte count. Then you can use whatever convention you want. If you only ever need to write strings of up to 65,535 bytes then sure: use a fixed 2 bytes (unsigned short). You'll also need to decide which byte comes first, of course (the "endianness").
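As a minimal sketch of that approach, the helper below writes a fixed 2-byte, big-endian byte count followed by the encoded text, straight to a Stream. The class and method names (LengthPrefixedText, WriteString, ReadString) are hypothetical, and big-endian is an arbitrary choice made only to show that the byte order is yours to decide.

```csharp
using System;
using System.IO;
using System.Text;

public static class LengthPrefixedText
{
    // Writes: [high byte of count][low byte of count][encoded text bytes].
    public static void WriteString(Stream stream, string value, Encoding encoding)
    {
        byte[] payload = encoding.GetBytes(value);
        if (payload.Length > ushort.MaxValue)
            throw new ArgumentException("String is too long for a 2-byte length prefix.", nameof(value));

        stream.WriteByte((byte)(payload.Length >> 8));   // most significant byte first
        stream.WriteByte((byte)(payload.Length & 0xFF));
        stream.Write(payload, 0, payload.Length);
    }

    public static string ReadString(Stream stream, Encoding encoding)
    {
        int high = stream.ReadByte();
        int low = stream.ReadByte();
        if (high < 0 || low < 0)
            throw new EndOfStreamException("Stream ended inside the length prefix.");

        int length = (high << 8) | low;
        byte[] payload = new byte[length];
        int offset = 0;
        // Stream.Read may return fewer bytes than requested, so loop until done.
        while (offset < length)
        {
            int read = stream.Read(payload, offset, length - offset);
            if (read == 0)
                throw new EndOfStreamException("Stream ended inside the string payload.");
            offset += read;
        }
        return encoding.GetString(payload);
    }
}
```

The reader always consumes exactly two bytes for the length, whatever the string size, at the cost of a hard 65,535-byte limit.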

As for the size of the prefix: it is essentially using:

int byteCount = this._encoding.GetByteCount(value);
this.Write7BitEncodedInt(byteCount);

with:

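protected void Write7BitEncodedInt(int value)
{
    uint num = (uint) value;
    // Emit 7 bits at a time, lowest group first; the high bit marks "more bytes follow".
    while (num >= 0x80)
    {
        this.Write((byte) (num | 0x80));
        num = num >> 7;
    }
    // Final byte: high bit clear, so the reader knows the prefix ends here.
    this.Write((byte) num);
}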

This type of length encoding is pretty common. It is the same idea as the "varint" that "protobuf" uses, for example (base-128, least significant group first, retaining bit order in 7-bit groups, 8th bit as continuation).
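Because the high bit of each prefix byte says whether another byte follows, a reader never needs to know the prefix size in advance: it reads bytes until it sees one with the high bit clear. BinaryReader.ReadString does this for you. If you are reading the stream by hand, a sketch along these lines should work; the SevenBitPrefix class and its method names are hypothetical.

```csharp
using System;
using System.IO;

public static class SevenBitPrefix
{
    // How many prefix bytes the writer emits for a given payload byte count.
    public static int PrefixSize(int byteCount)
    {
        uint num = (uint)byteCount;
        int size = 1;
        while (num >= 0x80)
        {
            num >>= 7;
            size++;
        }
        return size;
    }

    // Decodes the prefix: 7 payload bits per byte, lowest group first.
    public static int Read(Stream stream)
    {
        int result = 0;
        for (int shift = 0; shift < 35; shift += 7)
        {
            int b = stream.ReadByte();
            if (b < 0)
                throw new EndOfStreamException("Stream ended inside the length prefix.");

            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0)
                return result; // high bit clear: this was the last prefix byte
        }
        throw new FormatException("Length prefix is longer than 5 bytes.");
    }
}
```

With this encoding the prefix is 1 byte for up to 127 bytes of text, 2 bytes for up to 16,383, and 3 bytes for up to 2,097,151. That is why 1024 characters get a 2-byte prefix and 2^17 characters get 3 bytes; it also means the second prefix byte already appears at 128 bytes, not 256.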

If you want to write the length yourself:

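// fs is assumed to be an already opened, writable Stream.
byte[] bytes = Encoding.Unicode.GetBytes("Your string");
using (var bw = new BinaryWriter(fs))
{
  // Fixed-size prefix: use a byte, a ushort, an int... (a ushort caps the payload at 65,535 bytes)
  bw.Write((ushort)bytes.Length);
  bw.Write(bytes);
}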
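The matching read side might look like the following sketch. It assumes the length was written as a 2-byte ushort holding the byte count (not the character count) and that the text is UTF-16 (Encoding.Unicode); the PrefixedStringReader name is hypothetical.

```csharp
using System.IO;
using System.Text;

public static class PrefixedStringReader
{
    public static string Read(Stream input)
    {
        // leaveOpen: true so the caller keeps ownership of the stream.
        using (var br = new BinaryReader(input, Encoding.Unicode, leaveOpen: true))
        {
            ushort byteCount = br.ReadUInt16();      // the fixed 2-byte prefix
            byte[] bytes = br.ReadBytes(byteCount);  // may return fewer bytes at end of stream
            if (bytes.Length != byteCount)
                throw new EndOfStreamException("Stream ended inside the string payload.");
            return Encoding.Unicode.GetString(bytes);
        }
    }
}
```

If the writer used a different prefix type, such as a byte or an int, the ReadUInt16 call has to change to match.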
