
Memory usage difference between generic and non-generic collections in .NET

I have been reading about collections in .NET. As is well known, generic collections have some advantages over non-generic ones: they are type-safe, and there is no casting and no boxing/unboxing. That's why generic collections have a performance advantage.

If we consider that non-generic collections store every member as object, then we might think that generics also have a memory advantage. However, I didn't find any information about the difference in memory usage.

Can anyone clarify this point?

"If we consider that non-generic collections store every member as object, then we might think that generics also have a memory advantage. However, I didn't find any information about the difference in memory usage. Can anyone clarify this point?"

Sure. Let's consider an ArrayList that contains ints vs a List<int>. Let's suppose there are 1000 ints in each list.

In both, the collection type is a thin wrapper around an array, hence the name ArrayList. In the case of ArrayList, there's an underlying object[] that contains at least 1000 boxed ints. In the case of List<int>, there's an underlying int[] that contains at least 1000 ints.

Why did I say "at least"? Because both use a double-when-full strategy. If you set the capacity of a collection when you create it, then it allocates enough space for that many things. If you don't, then the collection has to guess, and if it guesses wrong and you need more capacity, it doubles its capacity. So, best case, our collection arrays are exactly the right size. Worst case, they are possibly twice as big as they need to be; there could be room for 2000 objects or 2000 ints in the arrays.
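The effect of setting the capacity up front can be observed through the Capacity property that both List<T> and ArrayList expose. The following is only a minimal sketch; the class and method names are made up for illustration, and the exact capacity reached by a list that grows on its own is an implementation detail of the runtime, so the code only reports it rather than relying on a particular number.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class CapacityDemo
{
    // Capacity of a List<int> that was told its size when it was created.
    public static int PresizedCapacity(int count)
    {
        var list = new List<int>(count);
        for (int i = 0; i < count; i++) list.Add(i);
        return list.Capacity;
    }

    // Capacity of a List<int> that had to grow on its own.
    public static int GrownCapacity(int count)
    {
        var list = new List<int>();
        for (int i = 0; i < count; i++) list.Add(i);
        return list.Capacity;
    }

    // Same experiment with the non-generic ArrayList (each Add boxes the int).
    public static int GrownArrayListCapacity(int count)
    {
        var list = new ArrayList();
        for (int i = 0; i < count; i++) list.Add(i);
        return list.Capacity;
    }

    public static void Main()
    {
        Console.WriteLine(PresizedCapacity(1000));       // exactly the requested size
        Console.WriteLine(GrownCapacity(1000));          // at least 1000, usually more
        Console.WriteLine(GrownArrayListCapacity(1000)); // at least 1000, usually more
    }
}
```

If the final size is known in advance, passing it to the constructor avoids both the spare slots and the repeated copying during growth.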

But let's suppose for simplicity that we're lucky and there is room for about 1000 in each.

To start with, what's the memory burden of just the array? An object[1000] takes up 4000 bytes on a 32 bit system and 8000 bytes on a 64 bit system, just for the references, which are pointer sized. An int[1000] takes up 4000 bytes regardless. (There are also a few extra bytes taken up by array bookkeeping, but these costs are small compared to the marginal costs.)

So already we see that the non-generic solution possibly consumes twice as much memory just for the array. What about the contents of the array?

Well, the thing about value types is they are stored right there in their own variable. There is no additional space beyond those 4000 bytes used to store the 1000 integers; they get packed right into the array. So the additional cost is zero for the generic case.

For the object[] case, each member of the array is a reference, and that reference refers to an object; in this case, a boxed integer. What's the size of a boxed integer?

An unboxed value type doesn't need to store any information about its type, because its type is determined by the type of the storage it's in, and that's known to the runtime. A boxed value type needs to store the type of the thing in the box somewhere, and that takes space. It turns out that the bookkeeping overhead for an object in 32 bit .NET is 8 bytes, and 16 bytes on 64 bit systems. That's just the overhead; we of course need 4 bytes for the int. But wait, it gets worse: on 64 bit systems, the box must be aligned to an 8 byte boundary, so we need another 4 bytes of padding on 64 bit systems.

Add it all up: Our int[] takes about 4KB on both 64 and 32 bit systems. Our object[] containing 1000 ints takes about 16KB on 32 bit systems, and 32KB on 64 bit systems. So the memory efficiency of an int[] vs an object[] is either 4 or 8 times worse for the non-generic case.
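The arithmetic behind those totals can be written out as a small calculation. This is only a sketch using the figures quoted above (a pointer-sized reference per slot, 8 or 16 bytes of object overhead, a 4 byte payload); the names are invented for illustration, and rounding the box up to a multiple of the pointer size is a simplifying assumption that happens to reproduce the 4 bytes of padding on 64 bit systems. Array bookkeeping is ignored, as in the text.

```csharp
using System;

public static class MemoryEstimate
{
    // Payload of an int[count]: 4 bytes per element, whatever the pointer size.
    public static long IntArrayBytes(int count) => 4L * count;

    // Estimated bytes for an object[count] whose slots all refer to boxed ints.
    // pointerSize is 4 on a 32 bit system and 8 on a 64 bit system.
    public static long BoxedIntArrayBytes(int count, int pointerSize)
    {
        long reference = pointerSize;      // one slot in the object[]
        long overhead = 2L * pointerSize;  // 8 or 16 bytes of per-object bookkeeping
        long payload = 4;                  // the int itself
        long box = overhead + payload;
        // Assumption: round the box up to a multiple of the pointer size.
        long alignedBox = (box + pointerSize - 1) / pointerSize * pointerSize;
        return count * (reference + alignedBox);
    }

    public static void Main()
    {
        Console.WriteLine(IntArrayBytes(1000));         // 4000
        Console.WriteLine(BoxedIntArrayBytes(1000, 4)); // 16000
        Console.WriteLine(BoxedIntArrayBytes(1000, 8)); // 32000
    }
}
```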

But wait, it gets even worse. That's just size. What about access time?

To access an integer from an array of integers, the runtime must:

  • verify that the array is valid
  • verify that the index is valid
  • fetch the value from the variable at the given index

To access an integer from an array of boxed integers, the runtime must:

  • verify that the array is valid
  • verify that the index is valid
  • fetch the reference from the variable at the given index
  • verify that the reference is not null
  • verify that the reference is a boxed integer
  • extract the integer from the box

That's a lot more steps, so it takes a lot longer.
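The two extra verification steps are visible from ordinary C# code, because each one has its own failure mode. The sketch below uses illustrative names of my own: the cast in ReadBoxed is the unboxing operation, and it throws NullReferenceException for a null slot and InvalidCastException for a box that does not hold an int.

```csharp
using System;

public static class AccessDemo
{
    // Plain array read: bounds check, then the value is copied out of the slot.
    public static int ReadPlain(int[] values, int index) => values[index];

    // Boxed read: bounds check, fetch the reference, then unbox,
    // which checks for null and for the exact boxed type.
    public static int ReadBoxed(object[] values, int index) => (int)values[index];

    public static void Main()
    {
        int[] plain = { 10, 20, 30 };
        object[] boxed = { 10, null, 30L }; // slot 2 holds a boxed long, not an int

        Console.WriteLine(ReadPlain(plain, 0)); // 10
        Console.WriteLine(ReadBoxed(boxed, 0)); // 10

        try { ReadBoxed(boxed, 1); }
        catch (NullReferenceException) { Console.WriteLine("null slot"); }

        try { ReadBoxed(boxed, 2); }
        catch (InvalidCastException) { Console.WriteLine("not a boxed int"); }
    }
}
```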

BUT WAIT IT GETS WORSE.

Modern processors use caches on the chip itself to avoid going back to main memory. An array of 1000 plain integers is highly likely to end up in the cache, so that accesses to the first, second, third, etc, members of the array in quick succession are pulled from the same cache line; this is insanely fast. But boxed integers can be all over the heap, which increases cache misses, which slows down access even further.

Hopefully that sufficiently clarifies your understanding of the boxing penalty.

What about non-boxed types? Is there a significant difference between an ArrayList of strings and a List<string>?

Here the penalty is much, much smaller, since an object[] and a string[] have similar performance characteristics and memory layouts. The only additional penalties in this case are (1) not catching your bugs until runtime, (2) making the code harder to read and edit, and (3) the slight penalty of a run-time type check.
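A small sketch of penalties (1) and (3), with illustrative names that are not from the original answer: the ArrayList version needs a cast on every element, and a wrongly typed element is only discovered when that cast runs, whereas the List<string> version rejects it at compile time.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class StringListDemo
{
    // Non-generic: every element needs a cast, which is a run-time type check.
    public static int TotalLength(ArrayList items)
    {
        int total = 0;
        foreach (object item in items) total += ((string)item).Length;
        return total;
    }

    // Generic: the element type is known statically, so no cast is needed.
    public static int TotalLength(List<string> items)
    {
        int total = 0;
        foreach (string item in items) total += item.Length;
        return total;
    }

    public static void Main()
    {
        var typed = new List<string> { "a", "bc" };
        var untyped = new ArrayList { "a", "bc" };

        Console.WriteLine(TotalLength(typed));   // 3
        Console.WriteLine(TotalLength(untyped)); // 3

        untyped.Add(42);   // compiles fine; the bug surfaces only when the cast runs
        // typed.Add(42);  // would not compile

        try { TotalLength(untyped); }
        catch (InvalidCastException) { Console.WriteLine("bug found at run time"); }
    }
}
```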

"then we can think that generics have also memory advantage"

This assumption is false in general; it only applies to value types. So consider this:

new ArrayList { 1, 2, 3 };

This will implicitly convert every integer to object (known as boxing) in order to store it in your ArrayList. This causes the memory overhead here, because an object is surely bigger than a plain int.

For reference types, however, there's no difference, as there's no need for boxing.

The choice between one and the other shouldn't be driven by performance or memory concerns. Instead, you should ask yourself what you want to do with the results. In particular, if you know the type(s) stored in your collection at compile time, there's no reason not to put this information into the compile process by using the right generic type argument.

Anyway, you should always use generic collections instead of non-generic ones because of the type safety mentioned above.

EDIT: Your actual question, whether to use a non-generic collection or a generic one, is rather moot: always use the generic one. But not because of its memory usage. See this:

ArrayList a = new ArrayList { 1, 2, 3 };

vs.

List<object> a = new List<object> { 1, 2, 3 };

Both lists will consume the same amount of memory, although the second one is generic. That's because they both box your integers into object. So the answer to the question has nothing to do with memory.
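That a List<object> really does hold boxes can be checked with reference equality. In this sketch (the names are hypothetical) the same int variable is added twice; each Add boxes it into a separate heap object, so the two elements are equal in value but are not the same reference.

```csharp
using System;
using System.Collections.Generic;

public static class ListOfObjectDemo
{
    // Returns true only if both elements refer to the very same heap object.
    public static bool SameBox()
    {
        int n = 1;
        var list = new List<object> { n, n }; // n is boxed once per element
        return object.ReferenceEquals(list[0], list[1]);
    }

    // Returns true if both elements hold the same value.
    public static bool SameValue()
    {
        int n = 1;
        var list = new List<object> { n, n };
        return list[0].Equals(list[1]);
    }

    public static void Main()
    {
        Console.WriteLine(SameBox());   // False: two distinct boxes
        Console.WriteLine(SameValue()); // True: both boxes contain 1
    }
}
```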

On the other hand, for reference types there's no memory difference at all:

ArrayList a = new ArrayList { myInstance, anotherInstance };

vs.

List<MyClass> a = new List<MyClass> { myInstance, anotherInstance };

will produce the same memory outcome. However, the second one is far easier to maintain, as you can work with the instances directly without casting them.

Let's assume we have this statement:

int valueType = 1;

So now we have a value on the stack, as follows:

stack

valueType = 1

Now consider that we do this:

object boxingObject = valueType;

Now we have two more things stored in memory: the reference boxingObject on the stack and a boxed copy of the value 1 on the heap:

stack

boxingObject

heap

1

So boxing a value type uses extra memory, as the Microsoft docs state:

Boxing a value type allocates an object instance on the heap and copies the value into the new object.

See this link for full information.
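The "copies the value" part of that statement can be seen directly: once the box exists, changing the original variable does not affect it. A minimal sketch, reusing the variable names from the statements above (the class and method names are made up):

```csharp
using System;

public static class BoxingDemo
{
    // Boxes an int, changes the original variable, then reads the box back.
    public static int BoxThenMutate()
    {
        int valueType = 1;
        object boxingObject = valueType; // allocates a box holding a copy of 1
        valueType = 2;                   // changes only the stack variable
        return (int)boxingObject;        // the box still holds 1
    }

    public static void Main()
    {
        Console.WriteLine(BoxThenMutate()); // 1
    }
}
```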
