Why is boxing a primitive value-type in .NET uncached, unlike Java?

Consider:

int a = 42;

// Reference equality on two boxed ints with the same value
Console.WriteLine( (object)a == (object)a ); // False

// Same thing - listed only for clarity
Console.WriteLine(ReferenceEquals(a, a));  // False

Clearly, each boxing instruction allocates a separate instance of a boxed Int32, which is why reference-equality between them fails. This page appears to indicate that this is specified behaviour:

The box instruction converts the 'raw' (unboxed) value type into an object reference (type O). This is accomplished by creating a new object and copying the data from the value type into the newly allocated object.

But why does this have to be the case? Is there any compelling reason why the CLR does not choose to hold a "cache" of boxed Int32s, or even stronger, of common values for all primitive value-types (which are all immutable)? I know Java has something like this.

In the days of no-generics, wouldn't it have helped a lot with reducing the memory requirements as well as the GC workload for a large ArrayList consisting mainly of small integers? I'm also sure that there exist several modern .NET applications that do use generics but, for whatever reason (reflection, interface assignments etc.), run up large boxing-allocations that could be massively reduced with (what appears to be) a simple optimization.

So what's the reason? Some performance implication I haven't considered (I doubt that testing whether the item is in the cache etc. would result in a net performance loss, but what do I know)? Implementation difficulties? Issues with unsafe code? Breaking backwards compatibility (I can't think of any good reason why a well-written program should rely on the existing behaviour)? Or something else?

EDIT: What I was really suggesting was a static cache of "commonly-occurring" primitives, much like what Java does. For an example implementation, see Jon Skeet's answer. I understand that doing this for arbitrary, possibly mutable, value-types or dynamically "memoizing" instances at run-time is a completely different matter.

EDIT: Changed title for clarity.

One reason which I find compelling is consistency. As you say, Java does cache boxed values in a certain range... which means it's all too easy to write code which works for a while:

// Passes in all my tests. Shame it fails if they're > 127...
if (value1 == value2) {
    // Do something
}

I've been bitten by this - admittedly in a test rather than production code, fortunately, but it's still nasty to have something which changes behaviour significantly outside a given range.
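A minimal Java sketch of that trap, assuming a default JVM configuration. The class and method names are made up for illustration. The Java Language Specification only guarantees that boxing reuses instances for int values from -128 to 127; outside that range the result of a reference comparison depends on the implementation, so the sketch prints it rather than promising an outcome.

```java
class BoxedCompare {
    // Reference comparison of two independently boxed values: the bug-prone form.
    static boolean sameReference(int x, int y) {
        Integer a = x; // autoboxing, equivalent to Integer.valueOf(x)
        Integer b = y;
        return a == b; // compares object identity, not numeric value
    }

    // Value comparison: behaves the same for every int.
    static boolean sameValue(int x, int y) {
        Integer a = x;
        Integer b = y;
        return a.equals(b);
    }

    public static void main(String[] args) {
        // Inside the guaranteed cache range, so both boxes are the same object.
        System.out.println("127 by reference: " + sameReference(127, 127));
        // Outside the guaranteed range: the result depends on the JVM and its settings.
        System.out.println("128 by reference: " + sameReference(128, 128));
        // equals() does not depend on the cache at all.
        System.out.println("128 by value:     " + sameValue(128, 128));
    }
}
```

Comparing with equals() (or unboxing first) sidesteps the range-dependent behaviour entirely.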

Don't forget that any conditional behaviour also incurs a cost on all boxing operations - so in cases where it wouldn't use the cache, you'd actually find that it was slower (because it would first have to check whether or not to use the cache).

If you really want to write your own caching box operation, of course, you can do so:

public static class Int32Extensions
{
    // Pre-boxed instances for the values -128..127, stored at index (value + 128).
    private static readonly object[] BoxedIntegers = CreateCache();

    private static object[] CreateCache()
    {
        object[] ret = new object[256];
        for (int i = -128; i < 128; i++)
        {
            ret[i + 128] = i;
        }
        return ret;
    }

    // Extension methods must be static.
    public static object Box(this int i)
    {
        return (i >= -128 && i < 128) ? BoxedIntegers[i + 128] : (object) i;
    }
}

Then use it like this:

object y = 100.Box();
object z = 100.Box();

if (y == z)
{
    // Cache is working
}

I can't claim to be able to read minds, but here are a couple of factors:

1) Caching the value types can make for unpredictability - comparing two boxed values that are equal could be true or false depending on cache hits and implementation. Ouch!

2) The lifetime of a boxed value type is most likely short - so how long do you hold the value in cache? Now you either have a lot of cached values that will no longer be used, or you need to make the GC implementation more complicated to track the lifetime of cached value types.

With these downsides, what is the potential win? A smaller memory footprint in an application that does a lot of long-lived boxing of equal value types. Since this win is something that is going to affect a small number of applications and can be worked around by changing code, I'm going to agree with the C# spec writers' decisions here.

Boxed value objects are not necessarily immutable. It is possible to change the value in a boxed value type, such as through an interface.

So if boxing a value type always returned the same instance based on the same original value, it would create references which may not be appropriate (for example, two different value type instances which happen to have the same value end up with the same reference even though they should not).

public interface IBoxed
{
    int X { get; set; }
    int Y { get; set; }
}

public struct BoxMe : IBoxed
{
    public int X { get; set; }

    public int Y { get; set; }
}

public static void Test()
{
    BoxMe original = new BoxMe()
                        {
                            X = 1,
                            Y = 2
                        };

    object boxed1 = (object) original;
    object boxed2 = (object) original;

    ((IBoxed) boxed1).X = 3;
    ((IBoxed) boxed1).Y = 4;

    Console.WriteLine("original.X = " + original.X);
    Console.WriteLine("original.Y = " + original.Y);
    Console.WriteLine("boxed1.X = " + ((IBoxed)boxed1).X);
    Console.WriteLine("boxed1.Y = " + ((IBoxed)boxed1).Y);
    Console.WriteLine("boxed2.X = " + ((IBoxed)boxed2).X);
    Console.WriteLine("boxed2.Y = " + ((IBoxed)boxed2).Y);
}

Produces this output:

original.X = 1
original.Y = 2
boxed1.X = 3
boxed1.Y = 4
boxed2.X = 1
boxed2.Y = 2

If boxing didn't create a new instance, then boxed1 and boxed2 would have the same values, which would be inappropriate if they were created from different original value type instances.

There's an easy explanation for this: un/boxing is fast. It needed to be, back in the .NET 1.x days. After the JIT compiler generates the machine code for it, there's but a handful of CPU instructions, all inline without method calls (not counting corner cases like nullable types and large structs).

The effort of looking up a cached value would greatly diminish the speed of this code.

I wouldn't think a run-time-filled cache would be a good idea, but I would think it might be reasonable, on 64-bit systems, to define ~8 billion of the roughly 18 quintillion possible object-reference values as being integer or float literals, and on any system to pre-box all primitive literals. Testing whether the upper 31 bits of a reference hold some value should probably be cheaper than a memory reference.
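The encoding suggested above can be sketched with plain 64-bit arithmetic. This is a hypothetical layout written in Java for illustration, not something the CLR or any JVM is known to do: the upper 31 bits carry an assumed tag pattern, which leaves 33 bits (about 8 billion values) for one int-or-float flag bit and a 32-bit payload. The tag value and all names are assumptions.

```java
class TaggedWord {
    // Hypothetical tag for the upper 31 bits; a real runtime would have to pick
    // a pattern that can never occur in a valid heap address.
    static final long TAG = 0x7FFFFFFFL;
    // Bit 32 distinguishes a float payload (set) from an int payload (clear).
    static final long FLOAT_BIT = 1L << 32;

    static long encodeInt(int value) {
        return (TAG << 33) | (value & 0xFFFFFFFFL);
    }

    static long encodeFloat(float value) {
        return (TAG << 33) | FLOAT_BIT | (Float.floatToRawIntBits(value) & 0xFFFFFFFFL);
    }

    // The test described above: compare the upper 31 bits against the tag.
    // This is pure register arithmetic, with no memory access.
    static boolean isImmediate(long word) {
        return (word >>> 33) == TAG;
    }

    static boolean isFloat(long word) {
        return (word & FLOAT_BIT) != 0;
    }

    static int decodeInt(long word) {
        return (int) word; // the low 32 bits are the payload
    }

    static float decodeFloat(long word) {
        return Float.intBitsToFloat((int) word);
    }
}
```

Under this scheme, two words encoding the same int are bit-for-bit identical, so a reference comparison of such "boxed" values would always agree with a value comparison.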

Adding to the answers already listed is the fact that in .NET, at least with the normal garbage collector, object references are internally stored as direct pointers. This means that when a garbage collection is performed the system has to update every single reference to every object that gets moved, but it also means that "main-line" operation can be very fast. If object references were sometimes direct pointers and sometimes something else, this would require extra code every time an object is dereferenced. Since object dereferencing is one of the most common operations during the execution of a .NET program, even a 5% slowdown here would be devastating unless it was matched by an awesome speedup.

It's possible, for example, that a "64-bit compact" model, in which each object reference was a 32-bit index into an object table, might offer better performance than the existing model in which each reference is a 64-bit direct pointer. Dereferencing operations would require an extra table lookup, which would be bad, but object references would be smaller, thus allowing more of them to be stored in the cache at once. In some circumstances, that could be a major performance win (maybe often enough to be worthwhile--maybe not). It's unclear, though, that allowing an object reference to sometimes be a direct memory pointer and sometimes be something else would really offer much advantage.
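The trade-off in the object-table model can be illustrated with a toy sketch in Java. It is only an analogy under stated assumptions: the class and its methods are hypothetical, a growable array stands in for the object table, and an int index stands in for the 32-bit reference.

```java
class HandleTable {
    private Object[] slots = new Object[16];
    private int next = 0;

    // Hands out a compact 32-bit handle instead of a direct reference.
    int allocate(Object target) {
        if (next == slots.length) {
            slots = java.util.Arrays.copyOf(slots, slots.length * 2);
        }
        slots[next] = target;
        return next++;
    }

    // Every dereference pays for one extra table lookup.
    Object dereference(int handle) {
        return slots[handle];
    }

    // When a collector moves an object, only the table slot changes;
    // the handles held elsewhere stay valid without being rewritten.
    void relocate(int handle, Object movedCopy) {
        slots[handle] = movedCopy;
    }
}
```

With direct pointers the cost sits in the collector, which has to fix up every reference to a moved object; with handles the cost moves to each dereference.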
