
How does a processor handle different data types?

What does a machine do to differentiate between interpretation protocols between data types, for example: between 1-byte, 2-byte, 4-byte integers, floats, doubles, chars, and Unicode formats?

I hypothesize that there is some sort of heading notation to distinguish between configurations of sets of operations valid for the data type that makes up a portion of the final executable, but I would like a precise answer if anyone is willing to explain.

Processors/logic in general don't know one bit from another; that distinction exists primarily in the eyes of the user or programmer. When computing an address, for example

*ptr++;

the pointer is just bits stored at some address, which is itself just bits. You put some bits in a register, then you do a read from those bits; for the one clock cycle that those bits move toward the main bus(es), they are an address. Some bits come back; they are just bits, and they go into a register. Those bits are sent to the alu along with a constant: that pointer could be to a structure, so ++ could add some number other than one, and in either case the constant is sent along as well (in some architectures you have to load it into a register first). The alu does some math on those bits; for those few clock cycles those bits are operands. Addition and subtraction don't know about unsigned vs signed, so it is just bits. The result falls out into a register, and now we write those bits to the address we used before, which is only an address for the one or few clock cycles it is sampled from a register and sent on its way.

float is just bits sent to the fpu.

byte, halfword, word, doubleword, etc. transfers are driven by the instruction or other logic (a fetch is normally not done with instructions). On a read, with modern busses of 32 or 64 or so bits in width, the read is of the whole width, and as it gets to the processor core the byte lanes of interest are stripped off; this depends on the architecture and bus of course. Writes to targets that accept bytes, halfwords, and words will have some scheme: either a bytemask indicating which lanes are valid (vs which lanes are don't-cares), or a size in units of bytes where both sides agree to align those bytes on the bus in a certain way (lanes 0 to n-1 would make the most sense, but whatever they design).

Now computer languages, which are generally not computer friendly, have lots of opinions about bits: this variable is this type and that variable is that type, and in some languages if I want to "convert" 8 bits from one type to another, an ascii character to a data byte, I have to do some kind of conversion, which is of course dead code, as there is no such thing as ascii nor a data byte in the processor. Sometimes there are conversions that do matter, perhaps even that one if the byte is not stored as a byte but is in a register; then if signed, a sign extension has to happen.

Most of this is trivial to see if you disassemble your code for whatever language you are using. You will eventually find, for example, that on a 32 or 64 bit computer it is inefficient to try to save space by using 8 bit variables rather than 32 or 64, as the compiler has to add masking and sign extension for the folks who tend to use signed integers.

Note the processor doesn't know instructions from data either; whatever is fed in is what it consumes as instructions. It is up to the programmer and compiler to not ask it to consume bits as instructions that are not instructions. There are no magic bits in the patterns to indicate data from instruction. There is the academic "harvard architecture", but those really don't work well in the real world; modern processors are generally modified harvard, as they use one main bus but tag transactions as instruction fetches or data cycles (so if you have an i cache vs d cache you can sort them).

bits is bits; they have no meaning to the computer, only to the humans.

EDIT

char *ptr;
float f;
void more_fun ( float );
void fun ( void )
{
    ptr=(char *)0x3000;
    *ptr++=5;
    *ptr++=4;
    *ptr++=6;
    f=1.0F;
    more_fun(f);
}

with one compiler for one processor gives

00000000 <fun>:
   0:   e3a02a03    mov r2, #12288  ; 0x3000
   4:   e3a00005    mov r0, #5
   8:   e3a01004    mov r1, #4
   c:   e59f304c    ldr r3, [pc, #76]   ; 60 <fun+0x60>
  10:   e59fc04c    ldr r12, [pc, #76]  ; 64 <fun+0x64>
  14:   e92d4010    push    {r4, lr}
  18:   e583c000    str r12, [r3]
  1c:   e5c20000    strb    r0, [r2]
  20:   e5932000    ldr r2, [r3]
  24:   e2820001    add r0, r2, #1
  28:   e5830000    str r0, [r3]
  2c:   e3a0e006    mov lr, #6
  30:   e5c21000    strb    r1, [r2]
  34:   e3a025fe    mov r2, #1065353216 ; 0x3f800000
  38:   e5931000    ldr r1, [r3]
  3c:   e59fc024    ldr r12, [pc, #36]  ; 68 <fun+0x68>
  40:   e2810001    add r0, r1, #1
  44:   e5830000    str r0, [r3]
  48:   e5c1e000    strb    lr, [r1]
  4c:   e1a00002    mov r0, r2
  50:   e58c2000    str r2, [r12]
  54:   ebfffffe    bl  0 <more_fun>
  58:   e8bd4010    pop {r4, lr}
  5c:   e12fff1e    bx  lr
  60:   00000000    andeq   r0, r0, r0
  64:   00003001    andeq   r3, r0, r1
  68:   00000000    andeq   r0, r0, r0

this is unlinked.

it puts the address of ptr in a register (this is also optimized), and it preps some of the constants. It gets the address of ptr (a global variable, so at compile time the compiler doesn't know where it is; it has to leave a place it can reach for the linker to fill in. In other instruction sets it is the same problem, but the "location" is an immediate to the instruction and the instruction is incomplete until link time; either way the space is left). For each ptr++ we have to save back to the global, since it is global, so r3 holds the address of ptr for the duration.

ldr r12, [pc, #76]  ; 64 <fun+0x64>

ahh, a missed optimization opportunity: add r12,r2,#1 would have been much cheaper.

so ptr+1 is saved to ptr in memory (these are all just bits to the processor; notice how it doesn't know this is a pointer, nor a signed pointer). Some of these bits are an address, but only when the register is used as an address is it an address.

add r0, r2, #1

here the bits we think of as an address are just bits being added.

str r0, [r3]

and just bits being stored.

mov r2, #1065353216 ; 0x3f800000

Floating point 1.0 single precision, just bits.

as far as the notion that cisc vs risc treat things differently: cisc offers operations on smaller registers, while risc generally uses 32 or 64 bit operations.

unsigned short fun ( unsigned short x )
{
    return(x+0x1000);
}

00000000 <fun>:
   0:   e2800a01    add r0, r0, #4096   ; 0x1000
   4:   e1a00800    lsl r0, r0, #16
   8:   e1a00820    lsr r0, r0, #16
   c:   e12fff1e    bx  lr


0000000000000000 <fun>:
   0:   8d 87 00 10 00 00       lea    0x1000(%rdi),%eax
   6:   c3                      retq   

the arm is at least honoring the 16 bit boundary; hopefully x86 takes care of that somewhere else.

unsigned short fun ( void  )
{
    return(sizeof(unsigned short));
}

00000000 <fun>:
   0:   e3a00002    mov r0, #2
   4:   e12fff1e    bx  lr

0000000000000000 <fun>:
   0:   b8 02 00 00 00          mov    $0x2,%eax
   5:   c3                      retq   

EDIT 2

taking some code from opencores: this processor offers an 8 or 16 bit addition, and this is a not unexpected solution.

wire [16:0] alu_add        = op_src_in_jmp + op_dst_in;

wire    V           = inst_bw ? ((~op_src_in[7]  & ~op_dst_in[7]  &  alu_out[7])  |
                                 ( op_src_in[7]  &  op_dst_in[7]  & ~alu_out[7])) :
                                ((~op_src_in[15] & ~op_dst_in[15] &  alu_out[15]) |
                                 ( op_src_in[15] &  op_dst_in[15] & ~alu_out[15]));

wire    N           = inst_bw ?  alu_out[7]       : alu_out[15];
wire    Z           = inst_bw ? (alu_out[7:0]==0) : (alu_out==0);
wire    C           = inst_bw ?  alu_out[8]       : alu_out_nxt[16];

It used a single 16 bit adder but muxed the flags based on 8 or 16 bit operation. Now what does the verilog compiler actually produce? It is quite possible that it daisy chains two 8 bit adders together so it can tap off the flag results, or maybe there is a 16 bit adder in the library with these flags already tapped.

EDIT

As far as the size of the data, the width: that is encoded indirectly or directly in the instruction. As shown above, this becomes a control signal or signals for how the data is handled. Take an ARM and do a byte wide read: you get a 32 or 64 bit read because there is no reason not to. Then as the data hits the core the byte lane is selected, and if the instruction is designed to sign extend then it will, saving the data read, with its zero padding or sign extension, into the register named in the instruction. It is similar on other architectures with busses wider than a byte, although it is possible to have the target isolate the byte rather than the processor core; it just depends on the bus design, and I bet there is at least one example of each out there. Since memory tends to be in bus widths or multiples, it doesn't cost you more to read a whole row of the ram, or at least a bus width, and move it. Fractions of the width have a (generally minimal) cost; no need to burn that cost on both ends, so pick one. Writes are the painful ones: anything less than the width of the ram causes a read-modify-write by the controller. The instruction directly or indirectly indicates the width, which is encoded on the bus in some way, and the memory controller right near the ram has to deal with the fraction. This is one of the benefits you get from a cache: even though most dram dimms are made from either 8 bit wide or 16 bit wide parts, they are accessed in 64 bit widths, and the cache line is one or more widths of the dram, so you don't have to do read-modify-writes against such a slow memory; the read-modify-writes happen against the sram in the cache, which is relatively much faster.

Performance comes from alignment, as you can reduce the logic and be more efficient. Do an stm of four registers on an arm: if it is 32 bit aligned but not 64 bit aligned against a 64 bit wide bus, you have to have three transfers, one for the first word, one for the next two, and one for the last, and the first and last require a bytemask on the bus as they don't fill it; but if the same four register stm is at a 64 bit aligned address, it is a single transfer. Each transfer requires a few to many clocks of overhead. Likewise on the ram end: if you are aligned and a full width or multiple of the memory width, then it just writes; if any part of the transaction is a fraction, then it has to read-modify-write. If your cache ram happened to be 64 bits wide, in the above stm combinations one would not only be slower for the bus overhead but also at the ram, where you have three writes, two of them read-modify-writes, rather than two clean writes in as few as two clock cycles. This is likely one of the reasons why the arm eabi changed to request that the stack be aligned on 64 bit boundaries, and behind the countless questions at SO as to why an extra register is being pushed when it isn't used.

As shown above, in order to mask/pad the upper bits the compiler chose to shift twice, knowing the second shift zero pads. Had this been a signed type, the compiler may very well have chosen an arithmetic shift to sign extend the result to 32 bits.

00000000 <fun>:
   0:   e2800a01    add r0, r0, #4096   ; 0x1000
   4:   e1a00800    lsl r0, r0, #16
   8:   e1a00820    lsr r0, r0, #16
   c:   e12fff1e    bx  lr

Ideally you want to convert, on the way into or out of the registers, between the size the compiler/human wanted and the native size of the register for the architecture. For x86 you can pull flags at the various points, so you don't need to sign extend or zero pad; later operations can do the math ignoring the upper bits, pulling the flags from the middle as needed.

Now mips could have anded off the upper bits to zero them in one instruction, based on how their immediates work. Intel burns a ton of instruction space on immediates, and can do any size (compensated by other instructions that are very small). Had it been an 8 bit data type

unsigned char fun ( unsigned char x )
{
    return(x+0x10);
}

00000000 <fun>:
   0:   e2800010    add r0, r0, #16
   4:   e20000ff    and r0, r0, #255    ; 0xff
   8:   e12fff1e    bx  lr

the compiler knew better than to do the two shifts to zero pad.

but as one would expect

signed char fun ( signed char x )
{
    return(x+0x10);
}

00000000 <fun>:
   0:   e2800010    add r0, r0, #16
   4:   e1a00c00    lsl r0, r0, #24
   8:   e1a00c40    asr r0, r0, #24
   c:   e12fff1e    bx  lr

I can take a really big container of assorted lego blocks, and with those blocks I could build a house, I could build a bridge, etc. The blocks don't know a house from a bridge; only the human does. The processor doesn't know a bool from an int from an unsigned int; some know some different widths, in that they can mask, pad, or sign extend, but the human and the compiler know all, and assemble the correct mixture of lego blocks in the correct order to implement their vision.

It doesn't.

On x86, the operand size is encoded as part of the instruction. Any time two differently sized operands are used, the smaller one needs to be extended to the size of the larger one first. Depending on the specific operation, this is done either using zero-extension (upper bits filled with zero) or sign-extension (upper bit of the smaller value copied to the upper bits of the new value). For example, a program might use the movsx instruction to sign-extend a value before using it for some other operation.

On RISC architectures, operations usually apply to entire words (e.g., 32 or 64 bits) at a time, and it's up to the software to know how to interpret the resulting bits.

Floating point values are handled by different circuitry than integer values and as such are stored in a separate set of registers.
