BinaryWriter writes funny characters

Below is the code:

using (FileStream fs = File.Create("data.txt"))
using (BinaryWriter bw = new BinaryWriter(fs))
{
   int num = 2019;
   bw.Write(num);
}

When I open data.txt with my editor, I only see a funny character. So my questions are:

Q1: Is this because my editor's encoding is UTF-8, which is incompatible with the BinaryWriter format? Which encoding scheme should I use to be able to see the actual 2019 in the text file?

Q2: What are the practical uses of BinaryWriter over other stream adapters such as StreamWriter? To me, BinaryWriter does some weird things. For example, if you use a BinaryWriter to write an int first and then a string, then when you read the file with BinaryReader you have to call ReadInt32() and then ReadString(). You can't mess up the sequence: if you call ReadString() first, you get a funny character. But who will 'remember' or know the sequence to read?

OK, let's start with what your code does (see my added comments):

// create a FileStream to data.txt (a file with a .txt extension - not necessarily a text file) 
using (FileStream fs = File.Create("data.txt"))

// wrap the stream in the BinaryWriter class, which assists in writing binary files
using (BinaryWriter bw = new BinaryWriter(fs))
{
   // create a 32-bit integer
   int num = 2019;
   // write a 32-bit integer as 4 bytes
   bw.Write(num);
}

The first thing you'll note is that you're not writing a text file, you're writing a binary file. File extensions are a convention, and perhaps tell us what we should expect to find in a file, but they're not the gospel truth. I could take a copy of Chrome.exe and rename it to Chrome.txt, but that doesn't make it a text file.

Which encoding scheme should I use to be able to see the actual 2019 in the text file?

When we talk about encoding, such as UTF-8, we're talking about text encoding: how to convert text to bytes. Your code isn't writing text, so there is no text encoding that will make this binary file readable.
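To make the difference concrete, here is a minimal sketch that produces the bytes for the same value twice: once through BinaryWriter and once as UTF-8 text. The class and method names (`EncodingDemo`, `AsBinary`, `AsText`) are made up for illustration and are not part of the original code.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Text;

public static class EncodingDemo
{
    // Bytes produced by BinaryWriter.Write(int): the raw 32-bit value, 4 bytes.
    public static byte[] AsBinary(int value)
    {
        using (var ms = new MemoryStream())
        {
            using (var bw = new BinaryWriter(ms))
            {
                bw.Write(value);
            }
            // ToArray still works after the writer has closed the stream.
            return ms.ToArray();
        }
    }

    // Bytes produced by writing the decimal digits of the value as UTF-8 text.
    public static byte[] AsText(int value)
    {
        string digits = value.ToString(CultureInfo.InvariantCulture);
        return Encoding.UTF8.GetBytes(digits);
    }
}
```

For 2019, `BitConverter.ToString(EncodingDemo.AsBinary(2019))` gives `E3-07-00-00`, while `BitConverter.ToString(EncodingDemo.AsText(2019))` gives `32-30-31-39`, the character codes of '2', '0', '1' and '9'. A text editor can only show "2019" for the second form.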

What are the practical uses of BinaryWriter over other stream adapters such as StreamWriter?

It allows you to quickly create a binary format from values in .NET. For example, instead of having to manually convert an int value to 4 bytes, you can call bw.Write(num); and likewise you can read that data back using BinaryReader and br.ReadInt32().
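A round trip might look like the following sketch. It works on an in-memory byte array rather than a file, and the names `RoundTripDemo`, `Write` and `Read` are hypothetical helpers introduced only for this example.

```csharp
using System;
using System.IO;

public static class RoundTripDemo
{
    // Writes an int followed by a string, in that order.
    public static byte[] Write(int number, string text)
    {
        using (var ms = new MemoryStream())
        {
            using (var bw = new BinaryWriter(ms))
            {
                bw.Write(number);
                bw.Write(text);
            }
            return ms.ToArray();
        }
    }

    // Reads the values back in the same order they were written.
    public static (int Number, string Text) Read(byte[] data)
    {
        using (var ms = new MemoryStream(data))
        using (var br = new BinaryReader(ms))
        {
            int number = br.ReadInt32();
            string text = br.ReadString();
            return (number, text);
        }
    }
}
```

Calling `RoundTripDemo.Read(RoundTripDemo.Write(2019, "hello"))` returns the original pair, because the reader mirrors the writer call for call.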

You can't mess up the sequence: if you call ReadString() first, you get a funny character. But who will 'remember' or know the sequence to read?

When we talk about "file formats", we usually mean the conventions we follow for reading the file. The reason we can start an application, read a ZIP file, listen to an MP3 file, or view a bitmap is that the software we use has been written to understand these binary formats.

If we take bitmap as an example, there are numerous documents which describe the format of the file; a quick Google search turns up several. You could take any one of these and create a program to write an image file using BinaryWriter.

Now, if you were creating your own format, you would probably write the writer and reader at the same time, or at least look at the writer's code when it comes to writing the reader (unless you have a spec to follow, in which case you could use that).

But what I don't get is: the int I inserted is displayed as a funny character, while the string I inserted is actually readable. So why is the string readable but not the int?

When you call Write(string), you're actually writing two things: information about the length of the string, and then the string itself. To do this, BinaryWriter must convert the string to bytes, which it does for you behind the scenes. The documentation for Write(string) describes this.

So why can you read the string in your file? Well, it's because the text encoding used here (UTF-8 by default) is the same encoding you could use to write a text file. Your text editor will make a best-effort attempt to render the contents of the entire file. You can see this if you drag any kind of binary file (e.g. Chrome.exe) into a text editor.

So how do you view the contents of your file? Well, you can use a hex editor. A hex editor allows you to view and edit binary files. It will typically show your file as hexadecimal on one side, and an attempt at rendering it as text on the other.

So, imagine your code is this:

using (FileStream fs = File.Create("data.txt"))
using (BinaryWriter bw = new BinaryWriter(fs))
{
   int num = 2019;
   bw.Write(num);
   bw.Write("hello");
}

If we open it up in a hex editor, we see the following. Note that the spaces between hexadecimal values are just there to make it easier to read, and don't represent anything in the file:

E3 07 00 00 05 68 65 6C 6C 6F

There are three parts here:

E3 07 00 00    - the hexadecimal expression of little endian 2019
05             - indicating that the string is 5 _bytes_ long
68 65 6C 6C 6F - the hexadecimal representations of each character of the string "hello"
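If no hex editor is at hand, the two columns it shows can be approximated in a few lines. This is only a sketch; `HexDump`, `Hex` and `Text` are names invented for the example.

```csharp
using System;
using System.Text;

public static class HexDump
{
    // Hex column: each byte as two uppercase hex digits, separated by spaces.
    public static string Hex(byte[] data)
    {
        return BitConverter.ToString(data).Replace("-", " ");
    }

    // Text column: printable ASCII is shown as-is, everything else as '.'.
    public static string Text(byte[] data)
    {
        var sb = new StringBuilder(data.Length);
        foreach (byte b in data)
        {
            sb.Append(b >= 0x20 && b < 0x7F ? (char)b : '.');
        }
        return sb.ToString();
    }
}
```

Passing the result of `System.IO.File.ReadAllBytes("data.txt")` to both methods should give `E3 07 00 00 05 68 65 6C 6C 6F` and `.....hello` for the file written above: the int and the length byte are not printable, the string is.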

Endianness is the order in which the bytes of a multi-byte value are stored. Think of it as whether a computer writes numbers "left to right" or "right to left".

So, looking at the int value as stored above, we can reverse the byte order and write it in big-endian binary (least significant bit on the right) as:

<  00   >  <  00   >  <  07   >  <  E3   >
0000 0000  0000 0000  0000 0111  1110 0011

We can then calculate this back to 2019, your original value.
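The arithmetic is 0x07E3 = 7 × 256 + 227 = 2019. The same calculation can be done in code by shifting each stored byte into its position; `LittleEndian.ToInt32` below is a hypothetical helper written for this illustration, not a framework method.

```csharp
using System;

public static class LittleEndian
{
    // Combines 4 bytes stored least-significant-first into one 32-bit integer.
    public static int ToInt32(byte[] b)
    {
        return b[0] | (b[1] << 8) | (b[2] << 16) | (b[3] << 24);
    }
}
```

`LittleEndian.ToInt32(new byte[] { 0xE3, 0x07, 0x00, 0x00 })` returns 2019.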

Note that the string length information can take more than one byte: the length is stored as a 7-bit encoded integer, so a string longer than 127 bytes needs a longer prefix.
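A quick way to see the prefix grow is to write strings of different lengths and look at the first bytes. The sketch below uses an invented helper, `LengthPrefixDemo.Write`, and relies on BinaryWriter's default UTF-8 encoding.

```csharp
using System;
using System.IO;

public static class LengthPrefixDemo
{
    // Returns the bytes BinaryWriter produces for a single string:
    // a length prefix followed by the encoded characters.
    public static byte[] Write(string text)
    {
        using (var ms = new MemoryStream())
        {
            using (var bw = new BinaryWriter(ms))
            {
                bw.Write(text);
            }
            return ms.ToArray();
        }
    }
}
```

For "hello" the result is 6 bytes and starts with `05`. For `new string('a', 200)` it is 202 bytes and starts with `C8 01`: the low seven bits of 200 with the continuation bit set, followed by the remaining bits.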

It's all a matter of file format.

When you use a StreamWriter, your output is readable text, which means you can see what is inside with an editor. For instance, you may write a bool as "true" or "false". When using the binary writer, the value is stored in its binary representation, which is 0 or 1 for a boolean. Note that in a text file you could write "0" for true if you wish.
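The sketch below writes the same bool through both writers so the resulting bytes can be compared. `BoolDemo` and its methods are hypothetical names; note that .NET's text writers spell the value with a capital letter ("True"/"False").

```csharp
using System;
using System.IO;

public static class BoolDemo
{
    // BinaryWriter stores a bool as a single byte: 0 or 1.
    public static byte[] AsBinary(bool value)
    {
        using (var ms = new MemoryStream())
        {
            using (var bw = new BinaryWriter(ms))
            {
                bw.Write(value);
            }
            return ms.ToArray();
        }
    }

    // StreamWriter stores the text form of the bool, encoded as UTF-8.
    public static byte[] AsText(bool value)
    {
        using (var ms = new MemoryStream())
        {
            using (var sw = new StreamWriter(ms))
            {
                sw.Write(value);
            }
            return ms.ToArray();
        }
    }
}
```

`BoolDemo.AsBinary(true)` is the single byte `01`, while `BoolDemo.AsText(true)` is the four bytes `54 72 75 65`, which an editor displays as "True".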

When it comes to remembering what is inside, either you use a self-describing file format (such as a CSV with headers), or you use a standard format (such as MP3, for which you can find a description online), or you write both reader and writer at the same time to make sure they match (even with a text format).

For instance, by looking at "0,0" you can't tell whether it's two booleans separated by a comma or the number 0 in French format with one digit of precision.
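The ambiguity is easy to reproduce: the same text parses cleanly under both readings. In this sketch the names are invented, and a hand-built NumberFormatInfo with a comma as the decimal separator stands in for a French-style number format so the example does not depend on the culture data installed on the machine.

```csharp
using System;
using System.Globalization;

public static class AmbiguityDemo
{
    // Reading 1: the comma separates two fields.
    public static string[] AsTwoFields(string text)
    {
        return text.Split(',');
    }

    // Reading 2: the comma is a decimal separator (as in French-style numbers).
    public static double AsDecimalComma(string text)
    {
        var format = new NumberFormatInfo
        {
            NumberDecimalSeparator = ",",
            NumberGroupSeparator = " "
        };
        return double.Parse(text, NumberStyles.Float, format);
    }
}
```

With "1,5" as input, the first method returns the two fields "1" and "5" and the second returns 1.5. Nothing in the text itself says which reader is right; only the agreed format does.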

Files are strings of numbers - like 13, 59, 93. To understand the contents of a file, you need a format - essentially a description of what the contents mean. To look at the bytes of a file, you can use a hex editor (instead of a text editor).

One such format is the text file. Mind, there's no one text file format - as you've already noticed, your text editor allows you to select the encoding it will use when interpreting the text file. If you choose the wrong encoding, the text will be different (though you might not notice with English text, since many characters are identical across most modern encodings). Encoding is what translates the number 65 (actually stored in the file) to the character 'A'. There are many other complications beyond encoding, which I'll leave for later.
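As a small illustration, the sketch below decodes the same bytes under two encodings. `DecodeDemo.Decode` is a made-up helper; "utf-8" and "iso-8859-1" are standard encoding names that `Encoding.GetEncoding` accepts.

```csharp
using System;
using System.Text;

public static class DecodeDemo
{
    // Interprets the given bytes as text using the named encoding.
    public static string Decode(byte[] bytes, string encodingName)
    {
        return Encoding.GetEncoding(encodingName).GetString(bytes);
    }
}
```

The single byte 65 decodes to "A" under both encodings, which is why English text usually survives a wrong guess. The two bytes `C3 A9` decode to "é" as UTF-8 but to "Ã©" as ISO-8859-1, which is the kind of garbling a wrong encoding produces.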

You're using BinaryWriter. As the name implies, it's designed to write binary files rather than text files. If you want to write plain text files, use StreamWriter instead. A binary file is typically more compact than a text file, and is designed to be used by specific applications rather than directly read or modified by users. You can still write text inside a binary file - that's exactly what bw.Write("Hello") does; and since it uses the same encoding (by default) as your text editor, you actually see the word "Hello" in your editor. Mind, there are also "funny characters" before the "Hello", but for such a short string they're not visible (some might be displayed as a space, others as control characters like "end of line" or "tab"; you can even write a beep that gets executed if you print out the file). These represent the length of the following string, which allows you to quickly read the string, and only the string (or skip it while you're reading the file).

Now, reading and writing files needs a certain symmetry. As you noticed, if you write the file as "number first, then string", you also need to read it as "number first, then string". It doesn't matter if the file is a text file or a binary file - for example, say you want to write down GPS coördinates to a file. If you write the latitude first, and then the longitude, another program (or user) reading the file as longitude first will get the wrong result. A simple file format like this is order dependent, and completely intolerant of any kind of error - skip one line when reading or writing, and the whole thing becomes completely unreadable.
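The sketch below shows how silently this goes wrong. The writer stores two doubles; one reader follows the same order and the other does not. All names here (`GpsDemo`, `ReadCorrect`, `ReadSwapped`) are invented for the example.

```csharp
using System;
using System.IO;

public static class GpsDemo
{
    // The "format": latitude first, then longitude, each as an 8-byte double.
    public static byte[] Write(double latitude, double longitude)
    {
        using (var ms = new MemoryStream())
        {
            using (var bw = new BinaryWriter(ms))
            {
                bw.Write(latitude);
                bw.Write(longitude);
            }
            return ms.ToArray();
        }
    }

    // Reader that follows the same order as the writer.
    public static (double Latitude, double Longitude) ReadCorrect(byte[] data)
    {
        using (var br = new BinaryReader(new MemoryStream(data)))
        {
            double latitude = br.ReadDouble();
            double longitude = br.ReadDouble();
            return (latitude, longitude);
        }
    }

    // Reader that wrongly assumes longitude comes first.
    public static (double Latitude, double Longitude) ReadSwapped(byte[] data)
    {
        using (var br = new BinaryReader(new MemoryStream(data)))
        {
            double longitude = br.ReadDouble();
            double latitude = br.ReadDouble();
            return (latitude, longitude);
        }
    }
}
```

Both readers run without any error on the bytes from `GpsDemo.Write(32.131, 12.365)`; the second one simply reports a latitude of 12.365. Nothing in the file can tell it that it is wrong.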

But of course, that's not the only way you can design a file format (though it's certainly very common). There are formats explicitly designed to be less strict. For example, instead of a set of lines or comma-separated values, you could save your data in a JSON file:

{
  "longitude": 12.365,
  "latitude": 32.131
}

The main benefit is that the format is more self-descriptive and human readable (and writable); you can see at a glance that the latitude is 32.131. An application still needs to understand what "latitude" is, but you can see there's definitely progress here. It's also more tolerant of some kinds of changes - for example, the reader application doesn't have to care if some of the fields are missing (and can show incomplete information, rather than a complete mess), or if new fields are added. It doesn't care about the order of the fields.
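A reader with this kind of tolerance could be sketched as follows, using System.Text.Json (available in .NET Core 3.0 and later). `JsonDemo.ReadNumber` is a hypothetical helper: it looks a field up by name and returns null when the field is absent, regardless of where in the object it appears.

```csharp
using System;
using System.Text.Json;

public static class JsonDemo
{
    // Returns the numeric value of the named field, or null if it is missing
    // or is not a number. Field order and extra fields are ignored.
    public static double? ReadNumber(string json, string fieldName)
    {
        using (JsonDocument doc = JsonDocument.Parse(json))
        {
            if (doc.RootElement.TryGetProperty(fieldName, out JsonElement value)
                && value.ValueKind == JsonValueKind.Number)
            {
                return value.GetDouble();
            }
            return null;
        }
    }
}
```

Asking for "longitude" gives 12.365 whether that field is written first or last, and asking for a field the file does not contain gives null instead of garbage.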

It comes at a cost. The file is much larger (a simple binary file could be 8 bytes or less, compared to the ~40 bytes or so for the sample JSON; this gets even more pronounced if there are arrays etc. involved). It's much harder for a program to parse, which might make loading the file slow. Not being strict about the format also has its benefits and curses - it can be very hard to ensure the program handles all the potential inputs correctly, especially if there are multiple different readers and writers.
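The size difference can be measured directly. The sketch below assumes the two coordinates are stored as 4-byte floats (that is where "8 bytes" comes from; doubles would take 16) and compares that with the UTF-8 size of a JSON string. `SizeDemo` and its methods are names made up for this example.

```csharp
using System;
using System.IO;
using System.Text;

public static class SizeDemo
{
    // Size in bytes of two floats written with BinaryWriter.
    public static int BinarySize(float latitude, float longitude)
    {
        using (var ms = new MemoryStream())
        {
            using (var bw = new BinaryWriter(ms))
            {
                bw.Write(latitude);
                bw.Write(longitude);
            }
            return ms.ToArray().Length;
        }
    }

    // Size in bytes of a JSON document when stored as UTF-8 text.
    public static int JsonSize(string json)
    {
        return Encoding.UTF8.GetByteCount(json);
    }
}
```

`SizeDemo.BinarySize(32.131f, 12.365f)` is 8. The JSON size depends on key names and whitespace, so it is best measured on the actual file rather than assumed.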

There are equivalent file formats in binary as well, one of the most popular nowadays being Protobuf. It's not quite as self-descriptive, and cannot easily be read by a human, but it's also a lot stricter, much more space efficient, and faster to read and write.

In the end, you need to make a choice about the format you want to use for saving stuff. Each has its own set of advantages and disadvantages. Some are very simple, like just using BinaryWriter to write a well-known sequence. Some support version compatibility, so a newer application can read or write the old application's files or vice versa. Some are specifically optimized for certain uses, like enabling quick search in the file's contents, or storing images efficiently. Some are designed mainly to be easy to use (like JSON and Protobuf, or .NET's BinaryFormatter).

But in the end, the file is just a string of numbers. You need rules to interpret those numbers for them to be useful. Pick the rules to suit your needs.
