
How does a processor handle different data types?

What does a machine do to differentiate between the interpretation rules for different data types, for example between 1-byte, 2-byte, and 4-byte integers, floats, doubles, chars, and Unicode formats?

I hypothesize that there is some sort of header notation, making up a portion of the final executable, that distinguishes which sets of operations are valid for each data type, but I would like a precise answer if anyone is willing to explain.

Processors/logic in general don't know one bit from another; that meaning exists primarily in the eyes of the user or programmer. When computing an address, for example

*ptr++;

the pointer is just bits at some address, which is itself just bits. You put some bits in a register, then you do a read from those bits; for the one clock cycle those bits move toward the main bus(es) they are an address. Some bits come back; they are just bits, and they go in a register. Those bits are sent to the ALU along with a constant (the pointer could be to a structure, so ++ could be some number other than one; either way the constant is sent along as well, or in some architectures you have to load it into a register first). The ALU does some math on those bits; for those few clock cycles the bits are operands. Addition and subtraction don't care about unsigned vs signed, so it is just bits. The result falls out, say into a register, and now we write those bits to the address we used before, which is only an address for the one/few clock cycles it is sampled from a register and sent on its way.
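
As a rough C-level illustration of the "just bits" point (my own example, not part of the original answer): the same bit pattern can be treated as an integer or as an address, and only its use in a given moment gives it meaning.

#include <stdint.h>
#include <stdio.h>

int main ( void )
{
    char buf[4] = {1, 2, 3, 4};
    char *p = buf;                   /* these bits currently name an address */
    uintptr_t bits = (uintptr_t)p;   /* same bits, now just an integer       */
    bits += 1;                       /* a plain add, the ALU neither knows nor cares */
    p = (char *)bits;                /* same bits, used as an address again  */
    printf("%d\n", *p);              /* prints 2 */
    return(0);
}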

float is just bits sent to the fpu.

Byte, halfword, word, doubleword, etc. transfers are driven by the instruction or by other logic (a fetch is not normally done with instructions). On a read, with modern 32 or 64 bit wide busses, the read is of the whole width, and as the data gets to the processor core the byte lanes of interest are stripped off; that depends on the architecture and bus of course. Writes to targets that accept bytes, halfwords, and words will have some scheme: either a byte mask indicating which lanes are valid (vs which lanes are don't-cares), or a size in units of bytes where both sides agree how to align those bytes on the bus (byte lanes 0 to n-1 would make the most sense, but whatever they design).
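
A very rough software model of the byte-lane idea (illustrative only; in real hardware these are control signals, not code, and the names here are made up):

#include <stdint.h>

/* a 32 bit wide bus returns a whole word; the core keeps only the lane
   selected by the low address bits (little endian assumed here) */
static uint8_t read_byte_lane ( uint32_t bus_word, uint32_t addr )
{
    unsigned lane = addr & 3u;                /* which of the four byte lanes */
    return (uint8_t)(bus_word >> (lane * 8));
}

/* for a byte write, a byte enable mask tells the target which lane is valid */
static uint32_t byte_enable ( uint32_t addr )
{
    return 1u << (addr & 3u);                 /* one enable bit per byte lane */
}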

Now computer languages, which are generally not computer friendly, have lots of opinions about bits: this variable is this type and that variable is that type, and in some languages if I want to "convert" 8 bits from one type to another, say an ASCII character to a data byte, I have to do some kind of conversion, which is of course dead code, as there is no such thing as ASCII nor a data byte in the processor. Sometimes there are conversions that do matter, perhaps even that one if the byte is not stored as a byte but is in a register; then if the type is signed a sign extension has to happen.
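
A small example of a conversion that does matter (mine, not from the answer): widening the same 8 bit pattern gives different 32 bit results depending on whether the type is signed.

#include <stdio.h>

int main ( void )
{
    signed char   sc = -10;     /* stored bit pattern 0xF6           */
    unsigned char uc = 0xF6;    /* the very same bit pattern         */
    int a = sc;                 /* widened with sign extension: -10  */
    int b = uc;                 /* widened with zero padding:   246  */
    printf("%d %d\n", a, b);
    return(0);
}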

Most of this is trivial to see if you disassemble your code for whatever language you are using. You will eventually find, for example, that on a 32 or 64 bit computer it is inefficient to try to save space and use 8 bit variables rather than 32 or 64 bit ones, as the compiler has to add masking and sign extension for the folks who tend to use signed integers.

Note the processor doesn't know instructions from data either; whatever is fed in is what it consumes as instructions. It is up to the programmer and compiler not to ask it to consume bits as instructions that are not instructions. There are no magic bits in the patterns to separate data from instruction. There is the academic "Harvard architecture", but those really don't work well in the real world; modern processors are generally modified Harvard, as they use one main bus but tag transactions as instruction fetches or data cycles (so if you have an I cache vs D cache you can sort them).

bits is bits, they have no meaning to the computer, only to the humans.

EDIT

char *ptr;
float f;
void more_fun ( float );
void fun ( void )
{
    ptr=(char *)0x3000;
    *ptr++=5;
    *ptr++=4;
    *ptr++=6;
    f=1.0F;
    more_fun(f);
}

with one compiler for one processor gives

00000000 <fun>:
   0:   e3a02a03    mov r2, #12288  ; 0x3000
   4:   e3a00005    mov r0, #5
   8:   e3a01004    mov r1, #4
   c:   e59f304c    ldr r3, [pc, #76]   ; 60 <fun+0x60>
  10:   e59fc04c    ldr r12, [pc, #76]  ; 64 <fun+0x64>
  14:   e92d4010    push    {r4, lr}
  18:   e583c000    str r12, [r3]
  1c:   e5c20000    strb    r0, [r2]
  20:   e5932000    ldr r2, [r3]
  24:   e2820001    add r0, r2, #1
  28:   e5830000    str r0, [r3]
  2c:   e3a0e006    mov lr, #6
  30:   e5c21000    strb    r1, [r2]
  34:   e3a025fe    mov r2, #1065353216 ; 0x3f800000
  38:   e5931000    ldr r1, [r3]
  3c:   e59fc024    ldr r12, [pc, #36]  ; 68 <fun+0x68>
  40:   e2810001    add r0, r1, #1
  44:   e5830000    str r0, [r3]
  48:   e5c1e000    strb    lr, [r1]
  4c:   e1a00002    mov r0, r2
  50:   e58c2000    str r2, [r12]
  54:   ebfffffe    bl  0 <more_fun>
  58:   e8bd4010    pop {r4, lr}
  5c:   e12fff1e    bx  lr
  60:   00000000    andeq   r0, r0, r0
  64:   00003001    andeq   r3, r0, r1
  68:   00000000    andeq   r0, r0, r0

this is unlinked.

It puts the address of ptr in a register (this is also optimized) and preps some of the constants. It gets the address of ptr; since ptr is a global variable the compiler doesn't know at compile time where it is, so it has to leave a place it can reach for the linker to fill in. Other instruction sets have the same problem, but the "location" is an immediate to the instruction and the instruction is incomplete until link time; the space is left either way. For each ptr++ we have to save the value back to the global, as it is global, so r3 holds the address of ptr for the duration.

ldr r12, [pc, #76]  ; 64 <fun+0x64>

Ahh, a missed optimization opportunity: add r12,r2,#1 would have been much cheaper.

So ptr+1 is saved back to ptr in memory (these are all bits to the processor; notice how it doesn't know this is a pointer, nor whether it is signed). Some of these bits are an address, but only for the clock cycle when the register is used as an address is it an address.

add r0, r2, #1

here the bits we think of as an address are just bits being added.

str r0, [r3]

and just bits being stored.

mov r2, #1065353216 ; 0x3f800000

Floating point 1.0 single precision, just bits
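
You can watch that constant fall out of plain C (a sketch of mine, not part of the answer): copying the bytes of 1.0F into an integer shows the same 0x3F800000 pattern the compiler loaded above.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main ( void )
{
    float f = 1.0F;
    uint32_t bits;
    memcpy(&bits, &f, sizeof(bits));          /* reinterpret the same 32 bits */
    printf("0x%08X\n", (unsigned)bits);       /* prints 0x3F800000 on IEEE-754 targets */
    return(0);
}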

As far as the notion that CISC vs RISC treat things differently: CISC instruction sets can often address the smaller register sizes directly, while RISC generally works on the full 32 or 64 bit register.

unsigned short fun ( unsigned short x )
{
    return(x+0x1000);
}

00000000 <fun>:
   0:   e2800a01    add r0, r0, #4096   ; 0x1000
   4:   e1a00800    lsl r0, r0, #16
   8:   e1a00820    lsr r0, r0, #16
   c:   e12fff1e    bx  lr


0000000000000000 <fun>:
   0:   8d 87 00 10 00 00       lea    0x1000(%rdi),%eax
   6:   c3                      retq   

The ARM code is at least honoring the 16 bit boundary; hopefully the x86 side takes care of that somewhere else.

unsigned short fun ( void  )
{
    return(sizeof(unsigned short));
}

00000000 <fun>:
   0:   e3a00002    mov r0, #2
   4:   e12fff1e    bx  lr

0000000000000000 <fun>:
   0:   b8 02 00 00 00          mov    $0x2,%eax
   5:   c3                      retq   

EDIT 2

Taking some code from opencores: this processor offers an 8 or 16 bit addition, and this is a not-unexpected solution.

wire [16:0] alu_add        = op_src_in_jmp + op_dst_in;

wire    V           = inst_bw ? ((~op_src_in[7]  & ~op_dst_in[7]  &  alu_out[7])  |
                                 ( op_src_in[7]  &  op_dst_in[7]  & ~alu_out[7])) :
                                ((~op_src_in[15] & ~op_dst_in[15] &  alu_out[15]) |
                                 ( op_src_in[15] &  op_dst_in[15] & ~alu_out[15]));

wire    N           = inst_bw ?  alu_out[7]       : alu_out[15];
wire    Z           = inst_bw ? (alu_out[7:0]==0) : (alu_out==0);
wire    C           = inst_bw ?  alu_out[8]       : alu_out_nxt[16];

It used a single 16 bit adder but muxed the flags based on an 8 or 16 bit operation. Now what does the verilog compiler actually produce? It is quite possible that it daisy chains two 8 bit adders together so it can tap off the flag results, or maybe there is a 16 bit adder in the library with these flags already tapped into.
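
The same selection can be sketched in C (my own model of the idea, not the opencores source): one addition wide enough to keep the carry, with the flags tapped from bit 7/8 for a byte operation or from bit 15/16 for a word operation.

#include <stdint.h>

struct flags { unsigned n, z, c, v; };

/* for a byte op the upper bytes of a and b are assumed to be zero */
static struct flags add_flags ( uint16_t a, uint16_t b, unsigned byte_op )
{
    uint32_t sum = (uint32_t)a + b;           /* wide enough to hold the carry    */
    unsigned msb = byte_op ? 7 : 15;          /* sign bit position for this width */
    unsigned am = (a   >> msb) & 1;
    unsigned bm = (b   >> msb) & 1;
    unsigned sm = (sum >> msb) & 1;
    struct flags f;
    f.n = sm;                                                      /* negative        */
    f.z = byte_op ? ((sum & 0xFF) == 0) : ((sum & 0xFFFF) == 0);   /* zero            */
    f.c = (sum >> (msb + 1)) & 1;                                  /* carry out       */
    f.v = (!am && !bm && sm) || (am && bm && !sm);                 /* signed overflow */
    return(f);
}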

EDIT

As far as the size of the data, the width: that is encoded indirectly or directly in the instruction. As shown above, this becomes a control signal or signals in how the data is handled. Take an ARM and do a byte wide read: you get a 32 or 64 bit read, because there is no reason not to. Then as the data hits the core the byte lane is selected, and if the instruction is designed to sign extend then it will, and the data read is saved with its zero padding or sign extension into the register defined in the instruction. It is similar on other architectures with busses wider than a byte, although it is possible to have the target isolate the byte rather than the processor core; it just depends on the bus design, and I bet there is at least one example of each out there. Since memory tends to be in bus widths or multiples, it doesn't cost you more to read a whole row of the ram and move it, or at least a bus width. Fractions of the width have a (generally minimal) cost; no need to burn that cost on both ends, so pick one. Writes are the painful one: anything less than the width of the ram causes a read-modify-write by the controller. The instruction directly or indirectly indicates the width, which is encoded on the bus in some way, and the memory controller right near the ram has to deal with the fraction. This is one of the benefits you get from a cache: even though most dram dimms are made from either 8 bit wide or 16 bit wide parts, they are accessed in 64 bit widths, and the cache line is sized to be one or more widths of the dram, so you don't have to do read-modify-writes against such a slow memory; the read-modify-writes happen against the sram in the cache, which is relatively much faster.
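
A crude software model of that read-modify-write (names made up for illustration; a real controller does this in logic): the ram here can only be accessed a full 32 bit word at a time, so a byte write costs a read plus a write.

#include <stdint.h>

static uint32_t ram[256];   /* modeled as word-only accessible; addr assumed in range */

static void write_byte ( uint32_t addr, uint8_t value )
{
    uint32_t word  = ram[addr / 4];          /* read                          */
    unsigned shift = (addr & 3u) * 8;        /* little endian byte lane       */
    word &= ~(0xFFu << shift);               /* modify: clear the target lane */
    word |=  (uint32_t)value << shift;       /*         drop in the new byte  */
    ram[addr / 4] = word;                    /* write                         */
}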

Performance comes from alignment, as you can reduce the logic and be more efficient. Do an stm of four registers on an ARM: if it is 32 bit aligned but not 64 bit aligned against a 64 bit wide bus, you have to have three transfers, one for the first word, one for the next two, and one for the last, and the first and last require a byte mask on the bus as they don't fill it. But if the same four register stm is at a 64 bit aligned address it is a single transfer. Each transfer requires a few to many clocks of overhead. Likewise on the ram end: if you are aligned and a full width or multiple of the memory width, then it just writes; if any part of the transaction is a fraction, then it has to read-modify-write. If your cache ram happened to be 64 bits wide, in the above stm combinations one would not only be slower for the bus overhead but also at the ram, where you have three writes, two of them read-modify-writes, rather than two clean writes in as few as two clock cycles. This is likely one of the reasons why the ARM EABI changed to request the stack be aligned on 64 bit boundaries, and behind the countless questions at SO as to why an extra register is being pushed when it isn't used.
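
Back-of-the-envelope version of that count (my sketch, not from the answer): a 16 byte store (four 32 bit registers) touches two 64 bit bus beats when 64 bit aligned, but three when only 32 bit aligned.

#include <stdint.h>
#include <stdio.h>

/* how many 64 bit bus beats does a store of this many bytes at this address touch */
static unsigned bus_beats ( uint64_t addr, uint64_t bytes )
{
    uint64_t first = addr / 8;                /* first 64 bit word touched */
    uint64_t last  = (addr + bytes - 1) / 8;  /* last 64 bit word touched  */
    return (unsigned)(last - first + 1);
}

int main ( void )
{
    printf("%u\n", bus_beats(0x1000, 16));    /* 64 bit aligned:      2 */
    printf("%u\n", bus_beats(0x1004, 16));    /* 32 bit aligned only: 3 */
    return(0);
}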

As shown above, in order to mask/pad the upper bits the compiler chose to shift twice, knowing the second shift zero pads. Had this been a signed int, the compiler may very well have chosen an arithmetic (signed) shift to sign extend the result to 32 bits.

00000000 <fun>:
   0:   e2800a01    add r0, r0, #4096   ; 0x1000
   4:   e1a00800    lsl r0, r0, #16
   8:   e1a00820    lsr r0, r0, #16
   c:   e12fff1e    bx  lr

Ideally you want to convert, on the way into or out of the registers, between the size the compiler/human wanted and the native size of the register dictated by the architecture. For x86 you can pull flags at the various operand sizes, so you don't need to sign extend or zero pad; later operations can do the math ignoring the upper bits and pulling the flags from the middle as needed.

Now MIPS could have ANDed off the upper bits to zero them in one instruction, based on how its immediates work. Intel burns a ton of instruction space on immediates and can do any size (compensated for by other instructions that are very small). Had it been an 8 bit data type

unsigned char fun ( unsigned char x )
{
    return(x+0x10);
}

00000000 <fun>:
   0:   e2800010    add r0, r0, #16
   4:   e20000ff    and r0, r0, #255    ; 0xff
   8:   e12fff1e    bx  lr

the compiler knew better than to do the two shifts to zero pad (0xFF fits in an ARM immediate, where 0xFFFF does not).

but as one would expect

signed char fun ( signed char x )
{
    return(x+0x10);
}

00000000 <fun>:
   0:   e2800010    add r0, r0, #16
   4:   e1a00c00    lsl r0, r0, #24
   8:   e1a00c40    asr r0, r0, #24
   c:   e12fff1e    bx  lr
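
In C, the lsl #24 / asr #24 pair above amounts to something like the sketch below (it leans on arithmetic right shift of negative signed values, which is implementation-defined in C but what these compilers generate anyway):

#include <stdint.h>

/* take the low 8 bits and sign extend them to 32, mirroring the shift pair */
static int32_t sign_extend_8 ( int32_t x )
{
    return ((int32_t)((uint32_t)x << 24)) >> 24;
}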

I can take a really big container of assorted Lego blocks, and with those blocks I could build a house, I could build a bridge, etc. The blocks don't know a house from a bridge; only the human does. The processor doesn't know a bool from an int from an unsigned int; some know a few different widths, in that they can mask, pad, or sign extend, but the human and the compiler know all and assemble the correct mixture of Lego blocks in the correct order to implement their vision.

It doesn't.

On x86, the operand size is encoded as part of the instruction. Any time two differently sized operands are used, the smaller one needs to be extended to the size of the larger one first. Depending on the specific operation, this is either done using zero-extension (upper bits filled with zero) or sign-extension (upper bit of smaller value copied to upper bits of new value). For example, a program might use the movsx instruction to sign-extend a value before using it for some other operation.
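
For instance (an illustrative snippet, not taken from the answer), widening a 16-bit value to 32 bits is typically where these instructions show up:

/* x86-64 compilers typically emit movsx (movswl) for the signed case
   and movzx (movzwl) for the unsigned one */
int widen_signed ( short s ) { return s; }
unsigned widen_unsigned ( unsigned short u ) { return u; }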

On RISC architectures, operations usually apply to entire words (e.g., 32 or 64 bits) at a time, and it's up to the software to know how to interpret the resulting bits.

Floating point values are handled by different circuitry than integer values and as such are stored in a separate set of registers.
