What does a gcc output file look like and what exactly does it contain?

Asked 2021-11-14 13:50:50. Tags: gcc, object-files

When compiling a C file, gcc by default compiles it to a file called "a.out". My professor said that the output file contains the binaries, but when I open it I usually encounter unreadable text (VS Code says something like "This file contains unsupported text encoding").

I assumed that 'binaries' meant I would be able to see literal zeroes and ones in the file, but that does not seem to be the case. So what exactly does the output file look like, what exactly does it contain, and what is 'text encoding'? Why can I not read it? What special characters might it contain?

I'm aware that gcc first preprocesses the source, which means it removes all comments, expands all macros and copies in the contents of any included header files. You get the preprocessed file by running gcc -E <file_name>.c, and this preprocessed file is then compiled into assembly. Up to this point the output files are readable, i.e., I can open them with VS Code, but after this the assembled code and the object file thereafter are not human-readable.

For reference, I have no prior experience with programming or any programming language, and this is my first CS-related course in my first semester of college. I apologize if this question is too trivial to ask.

1 Answer

I actually had the same confusion early on. Not about that file type specifically, but about binary vs. text files.

After all, aren't all files, even text ones, binary, in the sense that all information is 1s and 0s? Well, yes, all information can be stored and transmitted as 1s and 0s, but that's not what the terms "binary file" and "text file" refer to.

They refer to what that information, the content of the file, those 1s and 0s, represents.

In a text file the bytes encode characters. In a binary file the bits encode some information that is not text. The format and semantics of that information are completely free: it can mean anything and use whatever encoding scheme. It's up to the application that writes or reads the file to interpret the bit patterns properly.
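To make the "same bits, different meaning" point concrete, here is a minimal Python sketch. The four bytes are arbitrary values picked for illustration, and the two interpretations (ASCII text and a 32-bit little-endian integer) are just two of many possible readings; nothing here is specific to what gcc writes.

```python
import struct

# Four arbitrary bytes chosen for illustration (not taken from any real file).
raw = bytes([0x41, 0x42, 0x43, 0x44])

# Reading 1: treat each byte as an ASCII character code.
as_text = raw.decode("ascii")

# Reading 2: treat the same four bytes as one unsigned 32-bit
# little-endian integer.
(as_int,) = struct.unpack("<I", raw)

print(as_text)      # ABCD
print(hex(as_int))  # 0x44434241
```

The bytes never change; only the rule used to interpret them does. A text editor always applies a rule of the first kind, while a program loader or image viewer applies some other rule.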

Most text editors (like VS Code) treat any file they open as a text file, i.e., they try to interpret the bit patterns using a text encoding scheme (e.g. ASCII or UTF-8). But not all bit patterns are valid ASCII/UTF-8, which is why you get "unsupported text encoding".
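A rough way to see what the editor runs into is to try decoding some bytes as UTF-8 yourself. The sketch below uses a hypothetical helper, `is_valid_utf8`, and a made-up byte sequence standing in for binary content; how VS Code actually detects encodings is not something this example claims to reproduce.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes can be decoded as UTF-8 text."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Bytes produced by encoding real text: always decodable.
text_bytes = "int main(void) { return 0; }".encode("utf-8")

# Made-up bytes for illustration; 0xFF can never appear in valid UTF-8.
binary_bytes = bytes([0x7F, 0x45, 0x4C, 0x46, 0xFF, 0xFE, 0x80, 0x00])

print(is_valid_utf8(text_bytes))    # True
print(is_valid_utf8(binary_bytes))  # False
```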

If you want to inspect the actual 1s and 0s of a file, text or binary, you need a utility that shows you that, e.g. a hex viewer/editor.
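As a small illustration of what such a viewer does, the following Python sketch prints bytes in hexadecimal and as literal bits. The helper names `to_hex` and `to_bits` are made up for this example, and real hex viewers show more (offsets, a text column), so treat this as a simplified stand-in.

```python
def to_hex(data: bytes) -> str:
    """Render each byte as two hexadecimal digits."""
    return " ".join(f"{b:02x}" for b in data)


def to_bits(data: bytes) -> str:
    """Render each byte as eight binary digits."""
    return " ".join(f"{b:08b}" for b in data)


sample = "Hi".encode("ascii")
print(to_hex(sample))   # 48 69
print(to_bits(sample))  # 01001000 01101001
```

To look at a real file the same way, you could read it in binary mode, for example `open("a.out", "rb").read()`, and pass the result to these functions. On Unix-like systems, command-line tools such as `xxd` or `hexdump -C` do the same job without any code.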