
BinaryWriter writes funny characters

Below is the code:

using (FileStream fs = File.Create("data.txt"))
using (BinaryWriter bw = new BinaryWriter(fs))
{
   int num = 2019;
   bw.Write(num);
}

When I open data.txt with my editor, I only see a funny character. So my questions are:

Q1 - Is this because the encoding of my editor is UTF-8, which is incompatible with the BinaryWriter format? Which encoding scheme should I use to be able to see the actual 2019 in the text file?

Q2 - What are the practical uses of BinaryWriter over other stream adapters such as StreamWriter? To me, BinaryWriter does some weird things. For example, if you use a BinaryWriter to write an int first and then a string, then when you read the file with a BinaryReader you have to call ReadInt32() and then ReadString(). You can't mix up the sequence; if you call ReadString() first, you get a funny character. But who will 'remember' or know the sequence to read?

OK, let's start with what your code does (see my added comments):

// create a FileStream to data.txt (a file with a .txt extension - not necessarily a text file) 
using (FileStream fs = File.Create("data.txt"))

// wrap the stream in the BinaryWriter class, which assists in writing binary files
using (BinaryWriter bw = new BinaryWriter(fs))
{
   // create a 32-bit integer
   int num = 2019;
   // write a 32-bit integer as 4 bytes
   bw.Write(num);
}

The first thing you'll notice is that you're not writing a text file; you're writing a binary file. File extensions are a convention, and perhaps tell us what we should expect to find in a file, but they're not the gospel truth. I could take a copy of Chrome.exe and rename it to Chrome.txt, but that doesn't make it a text file.

Which encoding scheme should I use to be able to see the actual 2019 in the text file?

When we talk about encoding, such as UTF-8, we're talking about text encoding: how to convert text to bytes. Your code isn't dealing with text, so there is no text encoding that will make your editor display the stored int as 2019.
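To make this concrete, here is a small sketch (assuming .NET 6 or later; the class name `EncodingDemo` is made up for illustration) comparing the bytes `BinaryWriter` produces for 2019 with the bytes of the text "2019" encoded as UTF-8. Only the second is something a text editor can show as 2019.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Text;

// Hypothetical helper class, for illustration only.
static class EncodingDemo
{
    // What BinaryWriter.Write(int) produces: 4 raw bytes, little-endian.
    public static byte[] BinaryBytes(int value)
    {
        using var ms = new MemoryStream();
        using (var bw = new BinaryWriter(ms))
        {
            bw.Write(value);
        }
        return ms.ToArray(); // ToArray still works after the writer has closed the stream
    }

    // What a text writer would produce: the digits as UTF-8 characters.
    public static byte[] TextBytes(int value) =>
        Encoding.UTF8.GetBytes(value.ToString(CultureInfo.InvariantCulture));
}

class Program
{
    static void Main()
    {
        byte[] bin = EncodingDemo.BinaryBytes(2019);
        byte[] txt = EncodingDemo.TextBytes(2019);

        Console.WriteLine(BitConverter.ToString(bin)); // E3-07-00-00
        Console.WriteLine(BitConverter.ToString(txt)); // 32-30-31-39

        // Decoding the binary bytes as UTF-8 is roughly what the editor does;
        // the result is not the text "2019".
        Console.WriteLine(Encoding.UTF8.GetString(bin) == "2019"); // False
    }
}
```

If you really want to see 2019 in an editor, you have to write the number as text (for example with a StreamWriter), which stores the four characters '2', '0', '1', '9'.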

What are the practical uses of BinaryWriter over other stream adapters such as StreamWriter?

It allows you to quickly create a binary format from .NET values. For example, instead of having to manually convert an int value to 4 bytes, you can call bw.Write(num);, and likewise you can read that data back using a BinaryReader and br.ReadInt32().
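A minimal round-trip sketch, assuming .NET 6 or later (`RoundTrip` is a hypothetical name). `ManualBytes` shows roughly what `bw.Write(num)` saves you from doing by hand; it is an illustration, not how `BinaryWriter` is implemented internally.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper class, for illustration only.
static class RoundTrip
{
    // Write an int with BinaryWriter, then read it back with BinaryReader.
    public static int WriteThenRead(int value)
    {
        using var ms = new MemoryStream();
        // leaveOpen: true so the MemoryStream survives the writer being disposed
        using (var bw = new BinaryWriter(ms, Encoding.UTF8, leaveOpen: true))
        {
            bw.Write(value);        // 4 bytes, little-endian
        }

        ms.Position = 0;            // rewind before reading
        using var br = new BinaryReader(ms);
        return br.ReadInt32();      // consumes the same 4 bytes
    }

    // Roughly the manual work bw.Write(num) saves you.
    public static byte[] ManualBytes(int value)
    {
        byte[] bytes = BitConverter.GetBytes(value); // uses the machine's native byte order
        if (!BitConverter.IsLittleEndian)
            Array.Reverse(bytes);                     // BinaryWriter always writes little-endian
        return bytes;
    }
}

class Program
{
    static void Main()
    {
        Console.WriteLine(RoundTrip.WriteThenRead(2019));                      // 2019
        Console.WriteLine(BitConverter.ToString(RoundTrip.ManualBytes(2019))); // E3-07-00-00
    }
}
```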

You can't mix up the sequence; if you call ReadString(), you get a funny character. But who will 'remember' or know the sequence to read?

When we talk about "file formats", we usually mean the conventions we follow for reading the file. The reason we can start an application, read a ZIP file, listen to an MP3 file, or view a bitmap is because the software we use has been written to understand these binary formats.

If we take bitmap as an example, there are numerous documents that describe the file format. A quick Google search reveals this one, this one and this one. You could take any one of these and create a program that writes an image file using BinaryWriter.

Now, if you were creating your own format, you would probably write the writer and the reader at the same time, or at least refer to the writer's code when it comes to writing the reader (unless you have a spec to follow, in which case you could use that).
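One way to keep a home-made format consistent is to put the writer and the reader side by side so their order can't drift apart. This is a sketch only: `Person` and `PersonFormat` are hypothetical names, and the "format" is nothing more than the order of the calls.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical record, for illustration only.
record Person(int Id, string Name);

// Hypothetical format class, for illustration only.
static class PersonFormat
{
    // Field order here IS the file format: Id (Int32), then Name (length-prefixed string).
    public static void Write(BinaryWriter bw, Person p)
    {
        bw.Write(p.Id);
        bw.Write(p.Name);
    }

    // The reader mirrors the writer exactly, in the same order.
    public static Person Read(BinaryReader br)
    {
        int id = br.ReadInt32();
        string name = br.ReadString();
        return new Person(id, name);
    }
}

class Program
{
    static void Main()
    {
        using var ms = new MemoryStream();
        using (var bw = new BinaryWriter(ms, Encoding.UTF8, leaveOpen: true))
        {
            PersonFormat.Write(bw, new Person(2019, "hello"));
        }

        ms.Position = 0;
        using var br = new BinaryReader(ms);
        Console.WriteLine(PersonFormat.Read(br)); // Person { Id = 2019, Name = hello }
    }
}
```

If the format ever changes, both methods change together in the same place, which is much harder to get wrong than two separate programs.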

But what I don't get is: the int I inserted is displayed as a funny character, while the string I inserted is actually readable. Why is the string readable but not the int?

When you call Write(string), you're actually writing two things: information about the length of the string, followed by the string itself. To do this, BinaryWriter must convert the string to bytes, which it does for you behind the scenes. You can read about that here and in the docs.

So why can you read the string in your file? Well, it's because the text encoding used here (UTF-8 by default) is the same encoding you could use to write a text file. Your text editor makes a best-effort attempt to render the contents of the entire file as text. You can see this if you drag any kind of binary file (e.g. Chrome.exe) into a text editor.

So how do you view the contents of your file? You can use a hex editor, which allows you to view and edit binary files. A hex editor typically shows your file as hexadecimal on one side, and an attempt at rendering it as text on the other.
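If you don't have a hex editor at hand, a few lines of C# can produce a similar view. This is only a sketch: `HexDump` is a made-up name, and the program assumes the file to dump is `data.txt` unless a path is passed on the command line. It prints the offset on the left, hex in the middle, and printable ASCII on the right, with every other byte shown as '.'.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper class, for illustration only.
static class HexDump
{
    // Formats bytes the way a hex editor typically does.
    public static string Format(byte[] data, int bytesPerLine = 16)
    {
        var sb = new StringBuilder();
        for (int offset = 0; offset < data.Length; offset += bytesPerLine)
        {
            int count = Math.Min(bytesPerLine, data.Length - offset);
            sb.Append(offset.ToString("X8")).Append("  ");

            // Hex column, padded so the text column lines up on the last row.
            for (int i = 0; i < bytesPerLine; i++)
                sb.Append(i < count ? data[offset + i].ToString("X2") + " " : "   ");
            sb.Append(' ');

            // Text column: printable ASCII as-is, anything else as '.'.
            for (int i = 0; i < count; i++)
            {
                byte b = data[offset + i];
                sb.Append(b >= 0x20 && b < 0x7F ? (char)b : '.');
            }
            sb.AppendLine();
        }
        return sb.ToString();
    }
}

class Program
{
    static void Main(string[] args)
    {
        // Assumption: dump data.txt (from the question) unless another path is given.
        string path = args.Length > 0 ? args[0] : "data.txt";
        if (!File.Exists(path))
        {
            Console.WriteLine($"File not found: {path}");
            return;
        }
        Console.Write(HexDump.Format(File.ReadAllBytes(path)));
    }
}
```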

So, imagine your code is this:

using (FileStream fs = File.Create("data.txt"))
using (BinaryWriter bw = new BinaryWriter(fs))
{
   int num = 2019;
   bw.Write(num);
   bw.Write("hello");
}

If we open it up in a hex editor, we see the following. Note that the spaces between hexadecimal values are just to make it easier to read, and are not a representation of anything in the file:

E3 07 00 00 05 68 65 6C 6C 6F

There are three parts here:

E3 07 00 00    - 2019 as a 32-bit little-endian integer, in hexadecimal
05             - indicating that the string is 5 _bytes_ long
68 65 6C 6C 6F - the UTF-8 bytes of each character of the string "hello", in hexadecimal
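You can confirm this layout without a hex editor by writing the same values to a MemoryStream and printing the bytes. A sketch, assuming .NET 6 or later (`LayoutDemo` is a hypothetical name):

```csharp
using System;
using System.IO;

// Hypothetical helper class, for illustration only.
static class LayoutDemo
{
    public static byte[] Write2019AndHello()
    {
        using var ms = new MemoryStream();
        using (var bw = new BinaryWriter(ms)) // default string encoding is UTF-8
        {
            bw.Write(2019);     // E3 07 00 00
            bw.Write("hello");  // 05, then 68 65 6C 6C 6F
        }
        return ms.ToArray();    // still valid after the writer closed the stream
    }
}

class Program
{
    static void Main() =>
        Console.WriteLine(BitConverter.ToString(LayoutDemo.Write2019AndHello()));
        // E3-07-00-00-05-68-65-6C-6C-6F
}
```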

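You can read about endianness here. Think of it as whether a computer stores the bytes of a number "left to right" (most significant byte first, big-endian) or "right to left" (least significant byte first, little-endian).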
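A sketch of the two byte orders side by side, using `BinaryPrimitives` from `System.Buffers.Binary` (available since .NET Core 2.1); `EndianDemo` is a hypothetical name. `BinaryWriter` always uses the little-endian order, whatever the machine's native order is.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical helper class, for illustration only.
static class EndianDemo
{
    public static byte[] Little(int value)
    {
        var buf = new byte[4];
        BinaryPrimitives.WriteInt32LittleEndian(buf, value); // least significant byte first
        return buf;
    }

    public static byte[] Big(int value)
    {
        var buf = new byte[4];
        BinaryPrimitives.WriteInt32BigEndian(buf, value);    // most significant byte first
        return buf;
    }
}

class Program
{
    static void Main()
    {
        Console.WriteLine(BitConverter.IsLittleEndian);                    // typically True (x86/x64, most ARM setups)
        Console.WriteLine(BitConverter.ToString(EndianDemo.Little(2019))); // E3-07-00-00, what BinaryWriter writes
        Console.WriteLine(BitConverter.ToString(EndianDemo.Big(2019)));    // 00-00-07-E3
    }
}
```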

So, taking the int value stored above, we could write it in big-endian binary (most significant byte first, with the 1s place on the right) as:

<  00   >  <  00   >  <  07   >  <  E3   >
0000 0000  0000 0000  0000 0111  1110 0011

We can then calculate this back to 2019, your original value.
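The same calculation in code: each byte is shifted 8 bits further than the previous one, because little-endian stores the least significant byte first. A sketch (`ManualDecode` is a hypothetical name):

```csharp
using System;

// Hypothetical helper class, for illustration only.
static class ManualDecode
{
    // Little-endian: byte 0 is worth 1, byte 1 is worth 256, byte 2 is worth 65536, ...
    public static int FromLittleEndian(byte[] b) =>
        b[0] | (b[1] << 8) | (b[2] << 16) | (b[3] << 24);
}

class Program
{
    static void Main()
    {
        byte[] stored = { 0xE3, 0x07, 0x00, 0x00 };
        // 0xE3 + 0x07 * 256 = 227 + 1792 = 2019
        Console.WriteLine(ManualDecode.FromLittleEndian(stored)); // 2019
    }
}
```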

Note that the string length information can take more than one byte (as per this answer).
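The length prefix is a variable-length "7-bit encoded" integer: 7 bits of the length per byte, least significant group first, with the top bit set when another byte follows. Below is a sketch of that encoding, checked against what `BinaryWriter` actually writes. The `LengthPrefix` class and its methods are hypothetical; since .NET 5, `BinaryWriter.Write7BitEncodedInt` is also public. The prefix counts encoded bytes, not characters; the example uses 'a', which is one byte in UTF-8, so the two are equal.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper class, for illustration only.
static class LengthPrefix
{
    // 7 bits per byte, low bits first, high bit set when more bytes follow.
    public static byte[] Encode7Bit(uint value)
    {
        var bytes = new List<byte>();
        while (value >= 0x80)
        {
            bytes.Add((byte)(value | 0x80)); // low 7 bits + "more follows" flag
            value >>= 7;
        }
        bytes.Add((byte)value);
        return bytes.ToArray();
    }

    // The prefix bytes BinaryWriter actually writes for a string of 'length' ASCII characters.
    public static byte[] ActualPrefix(int length)
    {
        using var ms = new MemoryStream();
        using (var bw = new BinaryWriter(ms))
        {
            bw.Write(new string('a', length));
        }
        byte[] all = ms.ToArray();
        return all[..(all.Length - length)]; // drop the string bytes, keep the prefix
    }
}

class Program
{
    static void Main()
    {
        foreach (int n in new[] { 5, 127, 128, 200 })
        {
            string mine = BitConverter.ToString(LengthPrefix.Encode7Bit((uint)n));
            string actual = BitConverter.ToString(LengthPrefix.ActualPrefix(n));
            Console.WriteLine($"{n}: {mine} / {actual}"); // e.g. 200: C8-01 / C8-01
        }
    }
}
```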

It's all a matter of file format.

When you use a StreamWriter, your output will be readable text, which means you can see what is inside in an editor. For instance, a bool is written as the text "True" or "False". When using a BinaryWriter, the value is stored in its binary representation, which for a boolean is a single byte, 0 or 1. Note that nothing stops you from writing "0" for true in a text file if you wish.
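A sketch of the difference for a single bool, assuming .NET 6 or later (`BoolDemo` is a hypothetical name). The StreamWriter is given UTF-8 without a byte-order mark so only the text itself ends up in the stream.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper class, for illustration only.
static class BoolDemo
{
    // StreamWriter (via TextWriter.Write(bool)) writes the text "True" or "False".
    public static byte[] AsText(bool value)
    {
        using var ms = new MemoryStream();
        using (var sw = new StreamWriter(ms, new UTF8Encoding(false))) // UTF-8, no BOM
        {
            sw.Write(value);
        }
        return ms.ToArray();
    }

    // BinaryWriter writes a single byte: 1 for true, 0 for false.
    public static byte[] AsBinary(bool value)
    {
        using var ms = new MemoryStream();
        using (var bw = new BinaryWriter(ms))
        {
            bw.Write(value);
        }
        return ms.ToArray();
    }
}

class Program
{
    static void Main()
    {
        Console.WriteLine(BitConverter.ToString(BoolDemo.AsText(true)));   // 54-72-75-65 ("True")
        Console.WriteLine(BitConverter.ToString(BoolDemo.AsBinary(true))); // 01
    }
}
```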

When it comes to remembering what is inside, you either use a self-describing file format (such as CSV with headers), use a standard format (such as MP3, whose description you can find online), or write the reader and writer at the same time to make sure they match (even with a text format).

For instance, by looking at "0,0" you can't tell whether it's two booleans separated by a comma or the number 0 in French format with one digit of precision.

Files are sequences of numbers (bytes), like 13, 59, 93. To understand the contents of a file, you need a format: essentially a description of what the contents mean. To look at the bytes of a file, you can use a hex editor (instead of a text editor).

One such format is the text file. Mind you, there's no single text file format: as you've already noticed, your text editor lets you select the encoding it uses when interpreting a text file. If you choose the wrong encoding, the text will be different (though you might not notice with most English text, since many characters are identical across most modern encodings). Encoding is what translates the number 65 (actually stored in the file) to the character 'A'. There are many other complications beyond encoding, which I'll leave for later.
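A sketch of "same bytes, different encoding" (`DecodeDemo` is a hypothetical name; UTF-8 and ISO-8859-1 are both built into .NET). Plain ASCII text decodes identically in both, while "café" does not. The expected strings in the comments assume your console can display these characters.

```csharp
using System;
using System.Text;

// Hypothetical helper class, for illustration only.
static class DecodeDemo
{
    // Interpret the same bytes using a named encoding.
    public static string Decode(byte[] bytes, string encodingName) =>
        Encoding.GetEncoding(encodingName).GetString(bytes);
}

class Program
{
    static void Main()
    {
        Console.WriteLine(DecodeDemo.Decode(new byte[] { 65 }, "utf-8")); // A

        byte[] cafe = Encoding.UTF8.GetBytes("caf\u00E9"); // "café" -> 63 61 66 C3 A9
        Console.WriteLine(DecodeDemo.Decode(cafe, "utf-8"));      // café
        Console.WriteLine(DecodeDemo.Decode(cafe, "iso-8859-1")); // cafÃ© (wrong encoding)
    }
}
```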

You're using BinaryWriter. As the name implies, it's designed to write binary files rather than text files. If you want to write plain text files, use StreamWriter instead. A binary file is typically more compact than a text file, and is designed to be used by specific applications rather than read or modified directly by users. You can still write text inside a binary file; that's exactly what bw.Write("Hello") does, and since it uses (by default) the same encoding as your text editor, you actually see the word "Hello" in your editor. Mind you, there are also "funny characters" before the "Hello", but for such a short string they may not be visible (some might be displayed as a space, others as control characters like "end of line" or "tab"; you can even write a BEL character that makes a terminal beep when the file is printed to it). These bytes represent the length of the following string, which allows a reader to quickly read the string, and only the string (or skip it while reading the file).
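A small sketch of what the editor is doing: decode the bytes `BinaryWriter` produced for "Hello" as UTF-8, and you get the length prefix (here the single byte 5, which becomes the control character U+0005) followed by the readable text. `EditorView` is a hypothetical name.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper class, for illustration only.
static class EditorView
{
    // Bytes produced by BinaryWriter.Write(string): length prefix, then UTF-8 text.
    public static byte[] Written(string s)
    {
        using var ms = new MemoryStream();
        using (var bw = new BinaryWriter(ms))
        {
            bw.Write(s);
        }
        return ms.ToArray();
    }

    // Roughly what a text editor shows when it decodes the whole file as UTF-8.
    public static string AsEditorSeesIt(string s) => Encoding.UTF8.GetString(Written(s));
}

class Program
{
    static void Main()
    {
        Console.WriteLine(BitConverter.ToString(EditorView.Written("Hello"))); // 05-48-65-6C-6C-6F
        Console.WriteLine(EditorView.AsEditorSeesIt("Hello") == "\u0005Hello"); // True: one control character, then the text
    }
}
```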

Now, reading and writing files needs a certain symmetry. As you noticed, if you write the file as "number first, then string", you also need to read it as "number first, then string". It doesn't matter whether the file is a text file or a binary file. For example, say you want to write GPS coördinates to a file. If you write the latitude first and then the longitude, another program (or user) that reads the file as longitude first will get the wrong result. A simple file format like this is order-dependent and completely intolerant of errors: skip one line when reading or writing, and the whole thing becomes unreadable.
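A sketch of the GPS example in binary form, assuming a hypothetical format of latitude then longitude, each stored as an 8-byte double (`GpsFile` is a made-up name). The wrong-order reader doesn't fail; it silently returns swapped values, because both orders are equally valid bytes.

```csharp
using System;
using System.IO;

// Hypothetical format, for illustration only: latitude then longitude, each an 8-byte double.
static class GpsFile
{
    public static byte[] Save(double latitude, double longitude)
    {
        using var ms = new MemoryStream();
        using (var bw = new BinaryWriter(ms))
        {
            bw.Write(latitude);
            bw.Write(longitude);
        }
        return ms.ToArray();
    }

    // Matches Save: same order.
    public static (double Latitude, double Longitude) Load(byte[] data)
    {
        using var br = new BinaryReader(new MemoryStream(data));
        double latitude = br.ReadDouble();
        double longitude = br.ReadDouble();
        return (latitude, longitude);
    }

    // Gets the order wrong. Nothing throws; the values are just swapped.
    public static (double Latitude, double Longitude) LoadWrongOrder(byte[] data)
    {
        using var br = new BinaryReader(new MemoryStream(data));
        double longitude = br.ReadDouble(); // really the latitude
        double latitude = br.ReadDouble();  // really the longitude
        return (latitude, longitude);
    }
}

class Program
{
    static void Main()
    {
        byte[] data = GpsFile.Save(latitude: 32.131, longitude: 12.365);
        Console.WriteLine(GpsFile.Load(data));
        Console.WriteLine(GpsFile.LoadWrongOrder(data)); // latitude and longitude swapped
    }
}
```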

But of course, that's not the only way you can design a file format (though it's certainly very common). There are formats explicitly designed to be less strict. For example, instead of a set of lines or comma-separated values, you could save your data in a JSON file:

{
  "longitude": 12.365,
  "latitude": 32.131
}

The main benefit is that the format is more self-descriptive and human-readable (and writable); you can see at a glance that the latitude is 32.131. An application still needs to understand what "latitude" means, but this is definitely progress. It's also more tolerant of some kinds of changes: for example, the reader application doesn't have to care if some of the fields are missing (it can show incomplete information rather than a complete mess), or if new fields are added. It doesn't care about the order of the fields either.
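A sketch of that tolerance using System.Text.Json (built into .NET Core 3.0 and later); the `Coordinates` and `CoordinatesJson` classes are hypothetical. Field order doesn't matter, a missing field simply keeps its default value, and an unknown field is ignored by default.

```csharp
using System;
using System.Text.Json;

// Hypothetical type, for illustration only.
class Coordinates
{
    public double Longitude { get; set; }
    public double Latitude { get; set; }
}

// Hypothetical helper class, for illustration only.
static class CoordinatesJson
{
    // Match "latitude" in the file to the Latitude property regardless of case.
    static readonly JsonSerializerOptions Options = new() { PropertyNameCaseInsensitive = true };

    public static Coordinates Parse(string json) =>
        JsonSerializer.Deserialize<Coordinates>(json, Options)!;
}

class Program
{
    static void Main()
    {
        // Same data, fields in a different order, plus an extra field the reader doesn't know.
        var a = CoordinatesJson.Parse("{\"longitude\": 12.365, \"latitude\": 32.131}");
        var b = CoordinatesJson.Parse("{\"latitude\": 32.131, \"altitude\": 5, \"longitude\": 12.365}");
        Console.WriteLine(a.Latitude == b.Latitude && a.Longitude == b.Longitude); // True

        // A missing field keeps its default value instead of corrupting the rest.
        var c = CoordinatesJson.Parse("{\"latitude\": 32.131}");
        Console.WriteLine(c.Longitude == 0); // True
    }
}
```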

It comes at a cost. The file is much larger (a simple binary file could be 8 bytes or less, compared to ~40 bytes or so for the sample JSON; this gets even more pronounced if arrays etc. are involved). It's also harder for a program to parse, which might make loading the file slower. Not being strict about the format is both a blessing and a curse: it can be very hard to ensure the program handles all the potential inputs correctly, especially if there are multiple different readers and writers.
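A rough way to see the size difference, assuming .NET 6 or later (`SizeDemo` is a made-up name): two doubles take 16 bytes with BinaryWriter, two floats take 8, while the same values serialized as compact JSON take noticeably more. The exact JSON size depends on property names and number formatting, so treat it as illustrative.

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.Json;

// Hypothetical helper class, for illustration only.
static class SizeDemo
{
    public static int BinaryDoubles(double lon, double lat)
    {
        using var ms = new MemoryStream();
        using (var bw = new BinaryWriter(ms))
        {
            bw.Write(lon); // 8 bytes
            bw.Write(lat); // 8 bytes
        }
        return ms.ToArray().Length;
    }

    public static int BinaryFloats(float lon, float lat)
    {
        using var ms = new MemoryStream();
        using (var bw = new BinaryWriter(ms))
        {
            bw.Write(lon); // 4 bytes
            bw.Write(lat); // 4 bytes
        }
        return ms.ToArray().Length;
    }

    // Compact (unindented) JSON, measured in UTF-8 bytes.
    public static int Json(double lon, double lat) =>
        Encoding.UTF8.GetByteCount(JsonSerializer.Serialize(new { longitude = lon, latitude = lat }));
}

class Program
{
    static void Main()
    {
        Console.WriteLine(SizeDemo.BinaryDoubles(12.365, 32.131));  // 16
        Console.WriteLine(SizeDemo.BinaryFloats(12.365f, 32.131f)); // 8
        Console.WriteLine(SizeDemo.Json(12.365, 32.131));           // noticeably larger
    }
}
```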

There are equivalent file formats in binary as well, one of the most popular nowadays being Protobuf. It's not quite as self-descriptive and can't easily be read by humans, but it's also a lot stricter, much more space-efficient, and faster to read and write.

In the end, you need to make a choice about the format you want to use for saving stuff. Each has its own set of advantages and disadvantages. Some are very simple, like just using BinaryWriter to write a well-known sequence. Some support version compatibility, so a newer application can read or write an older application's files or vice versa. Some are specifically optimized for certain uses, like enabling quick search in the file's contents, or storing images efficiently. Some are designed mainly to be easy to use (like JSON and Protobuf, or .NET's BinaryFormatter, which is now obsolete for security reasons).

But in the end, the file is just a string of numbers. You need rules to interpret those numbers to be useful. Pick the rules to suit your needs.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM