Int16 - byte capacity in .NET?

Why does this:

short a=0;
Console.Write(Marshal.SizeOf(a));

print 2?

But if I look at the IL code, I see:

/*1*/   IL_0000:  ldc.i4.0    
/*2*/   IL_0001:  stloc.0     
/*3*/   IL_0002:  ldloc.0     
/*4*/   IL_0003:  box         System.Int16
/*5*/   IL_0008:  call        System.Runtime.InteropServices.Marshal.SizeOf
/*6*/   IL_000D:  call        System.Console.Write

The ldc.i4.0 at line #1 means:

Push 0 onto the stack as int32.

So 4 bytes must have been occupied.

But SizeOf shows 2 bytes...

What am I missing here? How many bytes does a short actually take in memory?

I've heard about situations where values are padded to 4 bytes so that they are faster to deal with. Is that also the case here?

(Please ignore the syncRoot and the GC root flag byte; I'm just asking about 2 vs 4.)

The CLI specification is very explicit about the data types that are allowed on the stack. The short 16-bit integer is not one of them, so integers of that type are converted to 32-bit integers (4 bytes) when they are loaded onto the stack.

Partition III.1.1 contains all of the details:

1.1 Data types

While the CTS defines a rich type system and the CLS specifies a subset that can be used for language interoperability, the CLI itself deals with a much simpler set of types. These types include user-defined value types and a subset of the built-in types. The subset, collectively called the "basic CLI types", contains the following types:

  • A subset of the full numeric types (int32, int64, native int, and F).
  • Object references (O) without distinction between the type of object referenced.
  • Pointer types (native unsigned int and &) without distinction as to the type pointed to.

Note that object references and pointer types can be assigned the value null. This is defined throughout the CLI to be zero (a bit pattern of all-bits-zero).

1.1.1 Numeric data types

The CLI only operates on the numeric types int32 (4-byte signed integers), int64 (8-byte signed integers), native int (native-size integers), and F (native-size floating-point numbers). However, the CIL instruction set allows additional data types to be implemented:

  • Short integers: The evaluation stack only holds 4- or 8-byte integers, but other locations (arguments, local variables, statics, array elements, fields) can hold 1- or 2-byte integers. For the purpose of stack operations the bool and char types are treated as unsigned 1-byte and 2-byte integers respectively. Loading from these locations onto the stack converts them to 4-byte values by:

    • zero-extending for types unsigned int8, unsigned int16, bool and char;
    • sign-extending for types int8 and int16;
    • zero-extends for unsigned indirect and element loads (ldind.u*, ldelem.u*, etc.); and
    • sign-extends for signed indirect and element loads (ldind.i*, ldelem.i*, etc.)

Storing to integers, booleans, and characters (stloc, stfld, stind.i1, stelem.i2, etc.) truncates. Use the conv.ovf.* instructions to detect when this truncation results in a value that doesn't correctly represent the original value.

[Note: Short (i.e., 1- and 2-byte) integers are loaded as 4-byte numbers on all architectures and these 4-byte numbers are always tracked as distinct from 8-byte numbers. This helps portability of code by ensuring that the default arithmetic behavior (i.e., when no conv or conv.ovf instruction is executed) will have identical results on all implementations.]

Convert instructions that yield short integer values actually leave an int32 (32-bit) value on the stack, but it is guaranteed that only the low bits have meaning (i.e., the more significant bits are all zero for the unsigned conversions or a sign extension for the signed conversions). To correctly simulate the full set of short integer operations a conversion to a short integer is required before the div, rem, shr, comparison and conditional branch instructions.

…and so on.
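The widening and truncation rules quoted above can also be observed from C#. The following is only a minimal sketch: the class name ShortWidening and the helper AddShorts are made up for illustration and are not part of any library.

```csharp
using System;

// Hypothetical demo class; the names are illustrative only.
public static class ShortWidening
{
    // x + y is evaluated as a 32-bit int, so narrowing back to short
    // must be written explicitly; unchecked keeps only the low 16 bits.
    public static short AddShorts(short x, short y)
    {
        int wide = x + y;
        return unchecked((short)wide);
    }

    public static void Main()
    {
        short negative = -1;
        ushort largest = ushort.MaxValue;

        int fromSigned = negative;    // sign-extended, still -1
        int fromUnsigned = largest;   // zero-extended, 65535

        Console.WriteLine(fromSigned);
        Console.WriteLine(fromUnsigned);

        // 32767 + 1 does not fit in 16 bits, so the truncated result wraps.
        Console.WriteLine(AddShorts(short.MaxValue, 1));

        try
        {
            int big = 70000;
            short s = checked((short)big);   // overflow-checked conversion
            Console.WriteLine(s);
        }
        catch (OverflowException)
        {
            Console.WriteLine("overflow detected");
        }
    }
}
```

The checked conversion is the C# counterpart of the conv.ovf.* instructions mentioned in the quote: it throws instead of silently truncating.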

Speaking speculatively, this decision was probably made for architectural simplicity, for speed, or possibly both. Modern 32-bit and 64-bit processors work more efficiently with 32-bit integers than with 16-bit integers, and since every integer that can be represented in 2 bytes can also be represented in 4 bytes, this behavior is reasonable.

The only time it would really make sense to use a 2-byte integer instead of a 4-byte one is when you are more concerned with memory usage than with execution speed. And in that case, you'd need to have a whole bunch of those values, probably packed into a structure. That is when you'd care about the result of Marshal.SizeOf.
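As a rough illustration of when the 2-byte size pays off, the sketch below compares the marshaled size of two structs. The types ShortTriple and IntTriple and the class PackedSizes are invented for this example.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical structs used only for this comparison.
[StructLayout(LayoutKind.Sequential)]
public struct ShortTriple { public short X; public short Y; public short Z; }

[StructLayout(LayoutKind.Sequential)]
public struct IntTriple { public int X; public int Y; public int Z; }

public static class PackedSizes
{
    public static int SizeOfShortTriple() => Marshal.SizeOf<ShortTriple>();
    public static int SizeOfIntTriple() => Marshal.SizeOf<IntTriple>();

    public static void Main()
    {
        Console.WriteLine(SizeOfShortTriple());  // three 2-byte fields: 6
        Console.WriteLine(SizeOfIntTriple());    // three 4-byte fields: 12
    }
}
```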

It is pretty easy to tell what's going on by taking a look at the available LDC instructions. Note the limited set of operand types: there is no version that loads a constant of type short. Just int, long, float and double. These limitations are visible elsewhere too. The OpCodes.Add instruction, for example, is similarly limited and has no support for adding variables of one of the smaller types.
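One way to check this is to list the constant-loading opcodes exposed by System.Reflection.Emit.OpCodes. This is just a sketch; the class name LdcOpcodes and the method LdcFieldNames are made up for the example.

```csharp
using System;
using System.Linq;
using System.Reflection;
using System.Reflection.Emit;

// Hypothetical helper class for inspecting the OpCodes fields.
public static class LdcOpcodes
{
    // Returns the names of the public static OpCodes fields that start with "Ldc_".
    public static string[] LdcFieldNames() =>
        typeof(OpCodes)
            .GetFields(BindingFlags.Public | BindingFlags.Static)
            .Select(f => f.Name)
            .Where(n => n.StartsWith("Ldc_", StringComparison.Ordinal))
            .ToArray();

    public static void Main()
    {
        foreach (string name in LdcFieldNames())
            Console.WriteLine(name);
    }
}
```

The listing contains only I4, I8, R4 and R8 variants. Ldc_I4_S merely encodes its operand in one byte; the value it pushes is still an int32.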

The IL instruction set was very much designed this way intentionally; it reflects the capabilities of a simple 32-bit processor. The kind of processor to think of is the RISC kind, which had its heyday in the nineties: lots of 32-bit CPU registers that can only manipulate 32-bit integers and IEEE-754 floating point types. The Intel x86 core is not a good example. While very commonly used, it is a CISC design that actually supports loading and doing arithmetic on 8-bit and 16-bit operands. But that's more of a historical accident: it made mechanical translation easy for programs that started out on the 8-bit 8080 and 16-bit 8086 processors. Such capability doesn't come for free, though; manipulating 16-bit values actually costs an extra CPU cycle.

Making IL a good match for 32-bit processor capabilities clearly makes the job of whoever implements a jitter much simpler. Storage locations can still be smaller, but only loads, stores and conversions need to be supported, and only when needed. Your 'a' variable is a local variable, one that occupies 32 bits in the stack frame or a CPU register anyway. Only stores to memory need to be truncated to the right size.

There is otherwise no ambiguity in the code snippet. The variable value needs to be boxed because Marshal.SizeOf() takes an argument of type object. The boxed value identifies its type through the type handle, which points to System.Int16. Marshal.SizeOf() has the built-in knowledge that this type takes 2 bytes.
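A small sketch of that boxing step, written out by hand. The class name BoxedSize and its helper methods are invented for illustration.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical demo class; the names are illustrative only.
public static class BoxedSize
{
    // Boxes the short explicitly, then asks Marshal for the size of the boxed value.
    public static int MarshalSizeOfBoxedShort()
    {
        short a = 0;
        object boxed = a;
        return Marshal.SizeOf(boxed);
    }

    // The boxed object still knows its exact type.
    public static Type BoxedType()
    {
        short a = 0;
        object boxed = a;
        return boxed.GetType();
    }

    public static void Main()
    {
        Console.WriteLine(BoxedType());                     // System.Int16
        Console.WriteLine(MarshalSizeOfBoxedShort());       // 2
        Console.WriteLine(Marshal.SizeOf(typeof(short)));   // 2
        Console.WriteLine(sizeof(short));                   // 2
    }
}
```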

These restrictions do reflect on the C# language and cause inconsistency. This kind of compile error forever befuddles and annoys C# programmers:

    byte b1 = 127;
    b1 += 1;            // no error
    b1 = b1 + 1;        // error CS0266

This is a result of the IL restrictions: there is no add operator that takes byte operands. They need to be converted to the next larger compatible type, int in this case, so that it works on a 32-bit RISC processor. Now there's a problem: the 32-bit int result needs to be hammered back into a variable that can store only 8 bits. The C# language applies that hammer itself in the 1st assignment but illogically requires a cast hammer in the 2nd assignment.
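For reference, a version of the snippet that compiles, with the cast written out. This is a sketch only; the class name ByteArithmetic and the helper Increment are made up for the example.

```csharp
using System;

// Hypothetical demo class; the names are illustrative only.
public static class ByteArithmetic
{
    // value + 1 has type int, so the narrowing cast back to byte is explicit.
    // unchecked makes the wrap-around on overflow deliberate.
    public static byte Increment(byte value)
    {
        return unchecked((byte)(value + 1));
    }

    public static void Main()
    {
        byte b1 = 127;
        b1 += 1;                 // compound assignment inserts the cast for you
        b1 = (byte)(b1 + 1);     // plain assignment needs the cast spelled out
        Console.WriteLine(b1);   // 129

        Console.WriteLine(Increment(255));   // wraps around to 0
    }
}
```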

The C# language specification defines how a program should behave. It doesn't say how to implement this, as long as the behavior is correct. If you ask for the size of a short, you always get 2.

In practice C# compiles to CIL, where integral types smaller than 32 bits are represented as 32-bit integers on the stack [1].

Then the JITer remaps it again to whatever is appropriate for the target hardware, typically a piece of memory on the stack or a register.

As long as none of these transformations changes observable behavior, they're legal.

In practice the size of local variables is largely irrelevant; what matters is the size of arrays. An array of one million shorts will usually occupy 2 MB.
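The array figure can be checked with Buffer.ByteLength, which reports the number of bytes of element data in an array of primitives (the array object's own header is not included). The class and method names below are invented for this sketch.

```csharp
using System;

// Hypothetical demo class; the names are illustrative only.
public static class ArrayFootprint
{
    public static int ShortArrayBytes(int count) => Buffer.ByteLength(new short[count]);
    public static int IntArrayBytes(int count) => Buffer.ByteLength(new int[count]);

    public static void Main()
    {
        Console.WriteLine(ShortArrayBytes(1_000_000));  // 2000000
        Console.WriteLine(IntArrayBytes(1_000_000));    // 4000000
    }
}
```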


[1] This is the virtual stack that IL operates on, which is different from the stack that the machine code operates on.

The CLR works natively only with 32-bit and 64-bit integers on the stack. The answer lies in this instruction:

box System.Int16

That means that the value is boxed as Int16. The C# compiler emits this boxing automatically in order to call Marshal.SizeOf(object), which in turn calls GetType() on the boxed value, which returns typeof(System.Int16).
