Binary files and OS

I'm currently learning C++, and there are some (basic) things I don't really understand and about which I couldn't find anything useful with various search engines.

  • Since every operating system (Windows/Linux/Mac) has its own "binary format" for executables, what are the differences? I mean, all of them are binary, but is there anything (besides the OS APIs) that really differs?

  • (Windows) This is a dumb question, but are all the applications there really just binary (and I mean just 0's and 1's)? In which format are they stored? (When you open them in a text editor you don't see 0's and 1's, mostly non-displayable characters.)

Best regards, lamas

Executables for Windows and Linux differ in:

  • the format of the file headers, i.e. the part of the file that records what is where in the rest of the file;
  • the instructions required for system calls (which interrupt or syscall instruction to use, which register holds what, etc.);
  • the actual format in which binary code is linked together; there are several different ones for Linux, and I think also for Windows.

Applications are data and machine-language opcodes crammed into a file. Most bytes in an executable file aren't text, so they can take any value from 0 to 255 inclusive, i.e. every possible byte value. That is what people mean by "binary". A byte has 8 bits, so each byte can be said to hold 8 binary digits, each of which is either 0 or 1.
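To make the "8 binary digits per byte" point concrete, here is a small C++ sketch that reads the first few bytes of a file and renders each one as its 8 bits. The helper names `readPrefix` and `toBinaryDigits` are made up for this example.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Read up to n bytes from the start of a file, in binary mode so that no
// newline translation is applied. Returns fewer bytes if the file is shorter,
// and an empty vector if the file cannot be opened.
std::vector<std::uint8_t> readPrefix(const std::string& path, std::size_t n) {
    std::ifstream in(path, std::ios::binary);
    std::vector<std::uint8_t> buf(n);
    in.read(reinterpret_cast<char*>(buf.data()), static_cast<std::streamsize>(n));
    buf.resize(static_cast<std::size_t>(in.gcount()));
    return buf;
}

// Render each byte as its 8 binary digits, separated by spaces,
// e.g. 0x4D -> "01001101". Every value 0..255 has exactly one 8-digit pattern.
std::string toBinaryDigits(const std::vector<std::uint8_t>& bytes) {
    std::string out;
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        if (i != 0) out += ' ';
        out += std::bitset<8>(bytes[i]).to_string();
    }
    return out;
}
```

For example, `toBinaryDigits(readPrefix(path, 2))` on a Windows .exe would typically give `01001101 01011010`, the bits of the ASCII letters "MZ" that start its DOS header.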

Executable file formats for Windows (PE), Linux (ELF), OS X etc. (Mach-O) are designed to solve the same problems, so they share many features. However, each platform specifies its own standard, so the files are not compatible across platforms, even when the platforms use the same type of CPU.
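One visible difference between these formats is the "magic number" at the start of the file. The sketch below guesses the format from those bytes: ELF files begin with 0x7F 'E' 'L' 'F'; PE files begin with an "MZ" DOS header whose 32-bit little-endian field at offset 0x3C points to a "PE\0\0" signature; thin Mach-O files begin with 0xFEEDFACE or 0xFEEDFACF in either byte order. It is a rough check, not a validator, and it deliberately ignores universal ("fat") Mach-O files. `ExeFormat` and `detectFormat` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class ExeFormat { Elf, Pe, MachO, Unknown };

// Read a 32-bit big-endian value at off (caller guarantees 4 bytes exist).
static std::uint32_t readBe32(const std::vector<std::uint8_t>& b, std::size_t off) {
    assert(off + 4 <= b.size());
    return (std::uint32_t(b[off]) << 24) | (std::uint32_t(b[off + 1]) << 16) |
           (std::uint32_t(b[off + 2]) << 8) | std::uint32_t(b[off + 3]);
}

// Read a 32-bit little-endian value at off (caller guarantees 4 bytes exist).
static std::uint32_t readLe32(const std::vector<std::uint8_t>& b, std::size_t off) {
    assert(off + 4 <= b.size());
    return std::uint32_t(b[off]) | (std::uint32_t(b[off + 1]) << 8) |
           (std::uint32_t(b[off + 2]) << 16) | (std::uint32_t(b[off + 3]) << 24);
}

// Guess the executable format from the leading bytes of a file.
ExeFormat detectFormat(const std::vector<std::uint8_t>& b) {
    if (b.size() >= 4 && b[0] == 0x7F && b[1] == 'E' && b[2] == 'L' && b[3] == 'F')
        return ExeFormat::Elf;

    // PE: "MZ" DOS stub, then the offset of the "PE\0\0" signature at 0x3C.
    if (b.size() >= 0x40 && b[0] == 'M' && b[1] == 'Z') {
        std::uint32_t peOff = readLe32(b, 0x3C);
        if (peOff <= b.size() - 4 && b[peOff] == 'P' && b[peOff + 1] == 'E' &&
            b[peOff + 2] == 0 && b[peOff + 3] == 0)
            return ExeFormat::Pe;
    }

    // Mach-O: 32/64-bit magic, stored in either big- or little-endian order.
    if (b.size() >= 4) {
        std::uint32_t m = readBe32(b, 0);
        if (m == 0xFEEDFACE || m == 0xFEEDFACF || m == 0xCEFAEDFE || m == 0xCFFAEDFE)
            return ExeFormat::MachO;
    }
    return ExeFormat::Unknown;
}
```

Feeding it the first few hundred bytes of a file (for instance from `std::ifstream` opened in binary mode) is enough, since all three signatures sit near the start.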

Executable file formats are used not only for executable files but also for libraries, which also contain code but are never run directly by the user; they are only loaded into memory to serve the needs of directly executable binaries.

Common features of an executable file format:

  • One or more blocks of executable code
  • One or more blocks of read-only data such as text and numbers
  • One or more blocks of read/write data
  • Instructions on where to place these blocks in memory when the application is run
  • Instructions on which libraries (which are also in an 'executable file format') need to be loaded as well, and how they connect (link) to this executable file
  • One or more tables mapping code and data locations to strings or IDs that describe them, useful for linking and debugging
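The last item, mapping locations to names, is what lets a debugger turn a raw address into a function name. The following is a simplified model of that lookup, assuming a table of (start address, size, name) entries; the `Symbol` struct, `SymbolTable` alias and `symbolize` function are hypothetical and far simpler than real ELF symbol tables or Windows debug information.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>

// Hypothetical, simplified symbol-table entry. Real formats store more
// fields (type, binding, section index, ...) and encode them differently.
struct Symbol {
    std::string name;
    std::uint64_t size;  // number of bytes the symbol covers
};

// Symbols keyed by their start address.
using SymbolTable = std::map<std::uint64_t, Symbol>;

// Return the name of the symbol whose [start, start + size) range
// contains addr, or nothing if addr falls outside every symbol.
std::optional<std::string> symbolize(const SymbolTable& table, std::uint64_t addr) {
    auto it = table.upper_bound(addr);  // first symbol starting after addr
    if (it == table.begin()) return std::nullopt;
    --it;                               // last symbol starting at or before addr
    if (addr - it->first < it->second.size) return it->second.name;
    return std::nullopt;
}
```

A sorted map keeps the lookup at O(log n), which matters when a debugger symbolizes thousands of stack addresses.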

It's interesting to compare such formats with more basic ones, such as the venerable DOS .com file, which is simply up to 64K of assorted 'stuff' to be loaded at the next available location, and has few of the features listed above.

"Binary" in this sense is used to contrast these files with "source" files, which are written as text. "Binary format" simply means that they are encoded in a non-text way; it doesn't really relate to the 0-and-1 sense of binary.

When you get down to it, every single file in a computer is "binary" in the sense that it is stored on disk as a sequence of 1s and 0s (even text files). When you open a file in a text editor, the editor groups these bits into characters according to some character encoding. If the file really is a text file, this gives you readable text. If it is not, the editor will still faithfully try to decode the stream of bits, but will most likely end up with lots of non-displayable characters, because those bytes are not encoded characters at all but CPU instructions and other data.
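Programs that must decide whether to display a file as text generally rely on heuristics of this kind. The sketch below is one such heuristic, assuming that a NUL byte or a high share of ASCII control characters means "not text"; the 10% threshold and the name `looksLikeText` are arbitrary choices for illustration, not a standard rule.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Rough heuristic, NOT a reliable classifier: data counts as "text" if it
// contains no NUL bytes and few ASCII control characters. Bytes >= 0x80 are
// accepted, since they may belong to UTF-8 or another multi-byte encoding.
bool looksLikeText(const std::vector<std::uint8_t>& bytes) {
    std::size_t control = 0;
    for (std::uint8_t b : bytes) {
        if (b == 0) return false;  // NUL bytes are common in machine code, rare in text
        bool whitespace = (b == '\t' || b == '\n' || b == '\r' || b == '\f');
        if ((b < 0x20 && !whitespace) || b == 0x7F) ++control;
    }
    // Assumed threshold: more than 10% control bytes suggests non-text data.
    return control * 10 <= bytes.size();
}
```

An ELF header, for instance, is rejected almost immediately, because its padding bytes are zeros.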

As for the other part of your question, about "binary formats": there are multiple formats for laying out the various parts of an executable, such as ELF or the Windows DLL/EXE format (PE). Each of them specifies exactly where in the file the various parts of the executable are (i.e. where the metadata is, where the symbol table is, where the entry point is, where the static data and resources are, etc.).
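As an example of a header that says where everything is, the 64-byte ELF64 header stores the entry point address and the file offsets of the program header table (which describes how to map the file into memory) and of the section header table. The sketch below extracts a few of these fields from a byte buffer, assuming a little-endian 64-bit ELF file; `Elf64Summary` and `parseElf64` are hypothetical names, and on Linux the real header definition is `Elf64_Ehdr` in `<elf.h>`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// A few fields of the ELF64 header (offsets follow the ELF64 layout).
struct Elf64Summary {
    std::uint16_t type;     // e_type (offset 16): 2 = executable, 3 = shared object/PIE
    std::uint16_t machine;  // e_machine (offset 18): e.g. 62 = x86-64
    std::uint64_t entry;    // e_entry (offset 24): address where execution starts
    std::uint64_t phoff;    // e_phoff (offset 32): file offset of program headers
    std::uint64_t shoff;    // e_shoff (offset 40): file offset of section headers
    std::uint16_t phnum;    // e_phnum (offset 56): number of program headers
    std::uint16_t shnum;    // e_shnum (offset 60): number of section headers
};

// Little-endian read of n bytes (n <= 8) starting at off.
static std::uint64_t readLe(const std::vector<std::uint8_t>& b, std::size_t off, int n) {
    assert(n <= 8 && off + static_cast<std::size_t>(n) <= b.size());
    std::uint64_t v = 0;
    for (int i = n - 1; i >= 0; --i) v = (v << 8) | b[off + static_cast<std::size_t>(i)];
    return v;
}

// Parse the header, or return nothing if this is not a little-endian ELF64 file.
std::optional<Elf64Summary> parseElf64(const std::vector<std::uint8_t>& b) {
    if (b.size() < 64 || b[0] != 0x7F || b[1] != 'E' || b[2] != 'L' || b[3] != 'F')
        return std::nullopt;
    if (b[4] != 2 || b[5] != 1)  // EI_CLASS must be 64-bit, EI_DATA little-endian
        return std::nullopt;
    Elf64Summary s;
    s.type    = static_cast<std::uint16_t>(readLe(b, 16, 2));
    s.machine = static_cast<std::uint16_t>(readLe(b, 18, 2));
    s.entry   = readLe(b, 24, 8);
    s.phoff   = readLe(b, 32, 8);
    s.shoff   = readLe(b, 40, 8);
    s.phnum   = static_cast<std::uint16_t>(readLe(b, 56, 2));
    s.shnum   = static_cast<std::uint16_t>(readLe(b, 60, 2));
    return s;
}
```

A PE file carries the same kind of information (entry point, section table), just at different offsets and in differently named structures.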

The most common file format for Windows is PE; for Linux it is ELF. They both contain mostly the same things (data segment, code segment, etc.) and differ mainly because they were designed separately.

Note that even if Windows and Linux used the same file format, they still could not run each other's binaries, because the system APIs and the available DLLs/SOs are completely different.
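Source code can hide these API differences behind conditional compilation, but each platform still needs its own binary. The sketch below, assuming either a Windows or a POSIX target, asks for the current process ID through `GetCurrentProcessId` (Win32) or `getpid` (POSIX); the wrapper name `currentProcessId` is hypothetical.

```cpp
#include <cassert>

#ifdef _WIN32
#include <windows.h>
#else
#include <unistd.h>
#endif

// Same C++ source, different OS API per build: the Windows build calls
// the Win32 API, the POSIX build calls getpid() from the C library.
unsigned long currentProcessId() {
#ifdef _WIN32
    return static_cast<unsigned long>(GetCurrentProcessId());
#else
    return static_cast<unsigned long>(getpid());
#endif
}
```

Built on Windows, this typically yields a PE file importing from kernel32.dll; built on Linux, an ELF file that calls into libc. Neither build runs on the other system.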
