
How do fread and fwrite distinguish between different data (types) in C?

Original post: 2020-02-14 06:44:18 · Tags: c, struct

I am working on a C program (on Ubuntu, using bash) that manipulates binary data files. First of all, when I use fopen(filename, "w") it creates a file without any extension. However, when I open it with vim filename, it shows up in some binary form.

For this question, when I use fwrite(array, sizeof(some struct), # of structs, filePointer) it writes to the file (in binary, though I am not sure exactly how). When I use fread(anotherArray, sizeof(same struct), same # of structs, anotherFilePointer) it somehow magically knows how to read each struct in binary form and puts it into the array, just by knowing its size and how many to read. What happens if I pass a number smaller than the number of structs there are as the # of structs parameter? How would fread know what to read correctly? How can it read data just by looking at the sizes, without knowing what type of data it is?

2 Answers

fwrite writes the bytes of the memory where the object is stored to the output stream, and fread reads bytes from the input stream into the memory whose address it receives as an argument. Neither function makes any assumption about the types or representations of the C objects stored in this memory.
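To make the byte-copying behaviour concrete, here is a minimal sketch of a round trip. The struct record type, the helper names save_records and load_records, and the file path are all hypothetical, chosen only for illustration. It also shows what happens when fread is asked for fewer elements than the file holds: it simply copies that many elements' worth of bytes and returns the count of complete elements read.

```c
#include <stdio.h>

/* Hypothetical record type, used only for illustration. */
struct record {
    int id;
    double value;
};

/* Writes `count` records to `path` as raw bytes.
   Returns the number of records written, or 0 on failure. */
size_t save_records(const char *path, const struct record *recs, size_t count)
{
    FILE *fp = fopen(path, "wb");          /* binary mode: no translation */
    if (fp == NULL)
        return 0;
    size_t written = fwrite(recs, sizeof *recs, count, fp);
    if (fclose(fp) != 0)
        return 0;
    return written;
}

/* Reads up to `max` records from `path` into `recs`.
   Returns the number of complete records actually read. */
size_t load_records(const char *path, struct record *recs, size_t max)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return 0;
    /* fread copies max * sizeof *recs bytes at most; it never inspects
       the struct layout, it only needs the element size and the count. */
    size_t got = fread(recs, sizeof *recs, max, fp);
    fclose(fp);
    return got;
}
```

If three records were saved, calling load_records with max set to 2 fills only the first two array elements and returns 2; the remaining bytes stay unread in the file. Always check the return value, since a short file yields a smaller count than requested.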

Hence a number of problems can occur:因此，可能会出现许多问题：

the representation of basic types can differ from one compiler to another, one machine to another, one OS to another, and possibly even with compiler switches. Writing the bytes of the memory representation of basic types makes sense only if you know you will be reading the file back into byte-compatible structures.
the mode for accessing the input and output files matters: as you mention, files must be opened in binary mode to avoid any translation between memory representation and file contents, such as what happens for text files on legacy systems. For example, text mode on MS-Windows causes 0A bytes to convert to 0D 0A sequences on output and 0D bytes to be stripped on input, resulting in different contents for isolated 0D bytes in the initial content.
if the C structure contains pointers, the bytes written to the output represent the values of these pointers, not what they point to. Reading these values back into memory is highly likely to create invalid pointers and very unlikely to make any sense.
if the C structure has a flexible array member at the end, its contents are not included in the sizeof(T) bytes written by fwrite or read by fread.
the C structure may contain padding between members, causing the output file to contain non-deterministic bytes, which might be a problem in some circumstances.
if the C structure has arrays with only partially meaningful contents, such as char arrays containing C strings, beware that fwrite will write the bytes beyond the null terminator. These bytes should not be meaningful, but they might hold sensitive information such as password fragments or other meaningful data. Carefully erasing such arrays may avoid this issue, but padding bytes cannot be erased reliably, so this solution is not perfect.
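One way to keep padding bytes out of the file, sketched here under assumptions, is to write the members one at a time instead of the whole struct. The struct item type and the helper names write_item_fields and read_item_fields are hypothetical. This does not address the other problems listed above: the file still depends on the size and byte order of int on the machine that wrote it.

```c
#include <stdio.h>

/* Hypothetical struct; most ABIs insert padding after `tag`
   so that `count` is suitably aligned. */
struct item {
    char tag;
    int count;
};

/* Writes the members one by one, so no padding bytes reach the file.
   Returns 1 on success, 0 on failure. */
int write_item_fields(FILE *fp, const struct item *it)
{
    return fwrite(&it->tag, sizeof it->tag, 1, fp) == 1
        && fwrite(&it->count, sizeof it->count, 1, fp) == 1;
}

/* Reads the members back in the same order they were written.
   Returns 1 on success, 0 on failure. */
int read_item_fields(FILE *fp, struct item *it)
{
    return fread(&it->tag, sizeof it->tag, 1, fp) == 1
        && fread(&it->count, sizeof it->count, 1, fp) == 1;
}
```

With this approach each item occupies sizeof(char) + sizeof(int) bytes in the file, which is typically less than sizeof(struct item), and every byte in the file comes from a member the program explicitly set.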

For all the above reasons, and others, reading and writing binary data should be reserved for very specific cases where the programmer knows exactly what is happening. For other purposes, saving as text files in human-readable form is much preferred.
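As an illustration of the text alternative, here is a minimal sketch that stores a struct as one line of decimal numbers using fprintf and reads it back with fscanf. The struct point type and the helper names write_point_text and read_point_text are hypothetical; a real format would also need to decide how to handle strings, malformed input, and versioning.

```c
#include <stdio.h>

/* Hypothetical struct, used only for illustration. */
struct point {
    int x;
    int y;
};

/* Writes the point as a human-readable line such as "3 -7".
   Returns 1 on success, 0 on failure. */
int write_point_text(FILE *fp, const struct point *p)
{
    return fprintf(fp, "%d %d\n", p->x, p->y) > 0;
}

/* Parses two decimal integers into the point.
   Returns 1 if both fields were read, 0 otherwise. */
int read_point_text(FILE *fp, struct point *p)
{
    return fscanf(fp, "%d %d", &p->x, &p->y) == 2;
}
```

A file written this way can be inspected in any editor and read back on a machine with a different int size or byte order, as long as the values fit in the destination type.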

From the comments on the question, by @David C. Rankin:

"Well, fread/fwrite read and write bytes (binary data: if you write out, then read in the same number of bytes, you get the same thing back). If you want to read and write text where you need to worry about line breaks, etc., use fgets/fputs or fprintf."

So I guess I can never know what I read in with fread unless I know what I wrote with fwrite?

"Right, look at the type of the buffer in the fwrite(3) Linux man page: it is void *. It's just a starting address for fwrite to use in writing however many bytes you told it to write (obviously you know what it is writing). The same goes for fread: it just reads bytes, and you have to know what you are reading (or at least its format). That's what binary I/O is about, it's all just bytes. It's up to you, the programmer, to know what you are writing and reading and how to unpack it. Otherwise, use formatted I/O and lines, words, etc."