
What makes a system little-endian or big-endian?

I'm confused about the byte order of a system/CPU/program.
So I have to ask some questions to get this clear in my mind.

Question 1

If I only use type char in my C++ program:

void main()
{
    char c = 'A';
    char* s = "XYZ";    
}

Then I compile this program to an executable binary file called a.out.
Can a.out run on both little-endian and big-endian systems?

Question 2

If my Windows XP system is little-endian, can I install a big-endian Linux system in VMWare/VirtualBox? What makes a system little-endian or big-endian?

Question 3

If I want to write a byte-order-independent C++ program, what do I need to take into account?

Can a.out run on both little-endian and big-endian systems?

No, because pretty much any two CPUs that are different enough to have different endianness will not run the same instruction set. C++ isn't Java; you don't compile to something that gets compiled or interpreted later. You compile to machine code for a specific CPU. And endianness is part of the CPU.

But that's outside of endian issues. You can compile that program for different CPUs and those executables will work fine on their respective CPUs.

What makes a system little-endian or big-endian?

As far as C or C++ is concerned, the CPU. Different processing units in a computer can actually have different endianness (the GPU could be big-endian while the CPU is little-endian), but that's somewhat uncommon.

If I want to write a byte-order-independent C++ program, what do I need to take into account?

As long as you play by the rules of C or C++, you don't have to care about endian issues.

Of course, you also won't be able to load files directly into POD structs. Or read a series of bytes, pretend it is a series of unsigned shorts, and then process it as a UTF-16-encoded string. All of those things step into the realm of implementation-defined behavior.

There's a difference between "undefined" and "implementation-defined" behavior. When the C and C++ specs say something is "undefined", it basically means all manner of brokenness can ensue. If you keep doing it (and your program doesn't crash), you could get inconsistent results. When they say that something is defined by the implementation, you will get consistent results for that implementation.

If you compile for x86 in VC2010, what happens when you pretend a byte array is an unsigned short array (i.e. unsigned char *byteArray = ...; unsigned short *usArray = (unsigned short*)byteArray) is defined by the implementation. When compiling for big-endian CPUs, you'll get a different answer than when compiling for little-endian CPUs.
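As a rough sketch of the portable alternative, you can build each 16-bit unit from its two bytes yourself instead of casting the pointer. The function name decode_utf16_units and its bool parameter are made up for this illustration, and it assumes you already know which byte order the input uses:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: decode UTF-16 code units from raw bytes whose byte
// order is known (true = big-endian input, false = little-endian input).
// A trailing odd byte, if any, is ignored.
std::vector<std::uint16_t> decode_utf16_units(const unsigned char* bytes,
                                              std::size_t size,
                                              bool input_is_big_endian)
{
    std::vector<std::uint16_t> units;
    units.reserve(size / 2);
    for (std::size_t i = 0; i + 1 < size; i += 2) {
        const std::uint16_t first  = bytes[i];
        const std::uint16_t second = bytes[i + 1];
        // Shifts work on values, not on memory layout, so the result is
        // the same on any host CPU.
        units.push_back(input_is_big_endian
            ? static_cast<std::uint16_t>((first << 8) | second)
            : static_cast<std::uint16_t>((second << 8) | first));
    }
    return units;
}
```

Called with the bytes 41 00 AC 20 and input_is_big_endian set to false, this should yield the units 0x0041 and 0x20AC whether the host is little-endian or big-endian.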

In general, endian issues are things you can localize to input/output systems: networking, file reading, etc. They should be taken care of at the edges of your codebase.
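One way to keep this at the edges is to have a small pair of functions that convert between integers and bytes in the format's byte order, and use only those when reading or writing. This is a minimal sketch assuming a file or wire format that stores 32-bit integers little-endian; the names store_u32_le and load_u32_le are hypothetical:

```cpp
#include <array>
#include <cstdint>

// Hypothetical helpers for a format that stores 32-bit integers in
// little-endian order, whatever the host CPU uses.
std::array<unsigned char, 4> store_u32_le(std::uint32_t value)
{
    return {{
        static_cast<unsigned char>(value & 0xFF),          // least significant byte first
        static_cast<unsigned char>((value >> 8) & 0xFF),
        static_cast<unsigned char>((value >> 16) & 0xFF),
        static_cast<unsigned char>((value >> 24) & 0xFF),  // most significant byte last
    }};
}

std::uint32_t load_u32_le(const unsigned char* bytes)
{
    return  static_cast<std::uint32_t>(bytes[0])
         | (static_cast<std::uint32_t>(bytes[1]) << 8)
         | (static_cast<std::uint32_t>(bytes[2]) << 16)
         | (static_cast<std::uint32_t>(bytes[3]) << 24);
}
```

The rest of the program then only ever sees plain std::uint32_t values and never needs to know the byte order of either the host or the file.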

Question 1:

Can a.out run on both little-endian and big-endian systems?

No, because a.out is already compiled for whatever architecture it is targeting. It will not run on another architecture that it is incompatible with.

However, the source code for that simple program has nothing that could possibly break on different endian machines.

So yes, it (the source) will work properly. (Well... aside from void main(), which should be int main() instead.)

Question 2:

If my Windows XP system is little-endian, can I install a big-endian Linux system in VMWare/VirtualBox?

Endianness is determined by the hardware, not the OS. So whatever (native) VM you install on it will have the same endianness as the host (since x86 is always little-endian).

What makes a system little-endian or big-endian?

Here's an example of something that will behave differently on little-endian vs. big-endian:

uint64_t a = 0x0123456789abcdefull;
uint32_t b = *(uint32_t*)&a;   // reads the four bytes at the lowest addresses of a
printf("b is %x\n", b);

*Note that this violates strict aliasing, and is only for demonstration purposes.

Little Endian : b is 89abcdef
Big Endian    : b is 1234567

On little-endian, the lower bits of a are stored at the lowest address. So when you access a as a 32-bit integer, you will read the lower 32 bits of it. On big-endian, you will read the upper 32 bits.
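If you want to see the same effect without the aliasing violation, a sketch along these lines should do it by copying the bytes with memcpy rather than casting the pointer. The function names first_four_bytes and host_is_little_endian are invented for this example:

```cpp
#include <cstdint>
#include <cstring>

// Copies the four bytes at the lowest addresses of a 64-bit value into a
// 32-bit value. memcpy is used in place of a pointer cast, so there is no
// strict-aliasing violation.
std::uint32_t first_four_bytes(std::uint64_t a)
{
    std::uint32_t b;
    std::memcpy(&b, &a, sizeof b);
    return b;
}

// Hypothetical helper: true when the least significant bytes come first.
bool host_is_little_endian()
{
    return first_four_bytes(0x0123456789abcdefull) == 0x89abcdefu;
}
```

The result still depends on the host, of course: first_four_bytes(0x0123456789abcdefull) gives 0x89abcdef on little-endian and 0x01234567 on big-endian, matching the output shown above.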

Question 3:

If I want to write a byte-order-independent C++ program, what do I need to take into account?

Just follow the standard C++ rules and don't do anything ugly like the example I've shown above. Avoid undefined behavior, avoid type-punning...

Little-endian / big-endian is a property of the hardware. In general, binary code compiled for one kind of hardware cannot run on another, except in virtualization environments that interpret the machine code and emulate the target hardware for it. There are bi-endian CPUs (e.g. ARM, IA-64) that feature a switch to change endianness.

As far as byte-order-independent programming goes, the only case where you really need to do it is when dealing with networking. There are functions such as ntohl and htonl to help you convert between your hardware's byte order and the network's byte order.
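For illustration, here is roughly how htonl and ntohl might be used to put a 32-bit value into a buffer and get it back. This assumes a POSIX system, where they are declared in <arpa/inet.h>; the wrapper names put_u32_network and get_u32_network are made up for the example:

```cpp
#include <arpa/inet.h>  // htonl, ntohl (POSIX; on Windows they are declared in <winsock2.h>)
#include <cstdint>
#include <cstring>

// Writes a 32-bit value into a buffer in network (big-endian) byte order.
void put_u32_network(unsigned char* out, std::uint32_t host_value)
{
    const std::uint32_t net = htonl(host_value);  // host order -> network order
    std::memcpy(out, &net, sizeof net);
}

// Reads a 32-bit value that was stored in network byte order.
std::uint32_t get_u32_network(const unsigned char* in)
{
    std::uint32_t net;
    std::memcpy(&net, in, sizeof net);
    return ntohl(net);                            // network order -> host order
}
```

On a big-endian host both conversions do nothing, and on a little-endian host they swap the bytes, so the buffer ends up as 01 23 45 67 for the value 0x01234567 either way.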

The first thing to clarify is that endianness is a hardware attribute, not a software/OS attribute, so WinXP and Linux are not big-endian or little-endian; rather, the hardware on which they run is either big-endian or little-endian.

Endianness is a description of the order in which the bytes of a data type are stored. A big-endian system stores the most significant byte (read: the one with the biggest value) first, and a little-endian system stores the least significant byte first. It is not mandatory for every data type on a system to use the same order, so you can have mixed-endian systems.

A program compiled for a little-endian system would not run on a big-endian system, but that has more to do with the instruction set available than with the endianness of the system on which it was compiled.

If you want to write a byte-order-independent program, you simply need to not depend on the byte order of your data.

1: The output of the compiler will depend on the options you give it and on whether you use a cross-compiler. By default, it should run on the operating system you are compiling it on and not on others (perhaps not even others of the same type; not all Linux binaries run on all Linux installs, for example). In large projects, this will be the least of your concerns, as libraries, etc. will need to be built and linked differently on each system. Using a proper build system (like make) will take care of most of this without you needing to worry.

2: Virtual machines abstract the hardware in such a way as to allow essentially anything to run within anything else. How the operating systems manage their memory is unimportant as long as they both run on the same hardware and support whatever virtualization model is in use. Endianness means the byte order: whether it is read left-to-right or right-to-left (or in some other format). Some hardware supports both, and virtualization allows both to coexist in that case (although I am not aware of how this would be useful, except that it is possible in theory). However, Linux works on many different architectures (and Windows on some other than Ixxx), so the situation is more complicated.

3: If you monkey with raw memory, such as with binary operators, you might put yourself in a position of depending on endianness. However, most modern programming is at a higher level than this. As such, you are likely to notice if you get into something which may impose endianness-based limitations. If such is ever required, you can always implement options for both endiannesses using the preprocessor.
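A preprocessor-based version might look something like the sketch below. It assumes the compiler predefines __BYTE_ORDER__, __ORDER_LITTLE_ENDIAN__ and __ORDER_BIG_ENDIAN__ (GCC and Clang do; other compilers may need their own check). The macro HOST_IS_BIG_ENDIAN and the functions swap_bytes32 and host_to_le32 are names invented for this example:

```cpp
#include <cstdint>

// Assumption: the compiler predefines __BYTE_ORDER__ (GCC and Clang do).
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#  define HOST_IS_BIG_ENDIAN 1
#elif defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#  define HOST_IS_BIG_ENDIAN 0
#else
#  error "Unknown byte order: define HOST_IS_BIG_ENDIAN for this compiler"
#endif

// Reverses the order of the four bytes in a 32-bit value.
inline std::uint32_t swap_bytes32(std::uint32_t v)
{
    return  (v >> 24)
         | ((v >> 8) & 0x0000FF00u)
         | ((v << 8) & 0x00FF0000u)
         |  (v << 24);
}

// Returns a value whose in-memory bytes are in little-endian order.
inline std::uint32_t host_to_le32(std::uint32_t v)
{
#if HOST_IS_BIG_ENDIAN
    return swap_bytes32(v);  // big-endian host: bytes must be reversed
#else
    return v;                // little-endian host: already in the right order
#endif
}
```

The shift-based approach that avoids looking at memory at all is usually simpler, so this kind of conditional compilation is mostly worth it when you need the no-op fast path on the matching architecture.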

The endianness of a system determines how the bytes of a value are interpreted, that is, which byte is considered the "first" and which is considered the "last".

You need to care about it only when loading from or saving to some source external to your program, like a disk or a network.
