
Element size influencing C# collection performance?

Given the task to improve the performance of a piece of code, I have come across the following phenomenon. I have a large collection of reference types in a generic Queue, and I'm removing and processing the elements one by one, then adding them to another generic collection.
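
To make the scenario concrete, here is a minimal sketch of that pattern; the names and the processing step are placeholders, not the actual code:

    // Minimal sketch of the pattern described above (placeholder names, not the real code).
    var inputQueue = new Queue<object>();
    // ... inputQueue gets filled with a large number of reference-type instances ...
    var results = new List<object>();
    while (inputQueue.Count > 0)
    {
        var item = inputQueue.Dequeue();   // remove the next element
        // ... process item ...
        results.Add(item);                 // add it to the destination collection
    }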

It seems that the larger the elements are, the more time it takes to add an element to the collection.

Trying to narrow down the problem to the relevant part of the code, I've written a test (omitting the processing of elements, just doing the insert):

    class Small 
    {
        public Small()
        {
            this.s001 = "001";
            this.s002 = "002";
        }
        string s001;
        string s002;
    }


    class Large
    {
        public Large()
        {
            this.s001 = "001";
            this.s002 = "002";
            // ... s003 through s049 assigned the same way ...
            this.s050 = "050";
        }
        string s001;
        string s002;
        // ... fields s003 through s049 ...
        string s050;
    }

    static void Main(string[] args)
    {
        const int N = 1000000;
        var storage = new List<object>(N);
        for (int i = 0; i < N; ++i)
        {
            //storage.Add(new Small());
            storage.Add(new Large());
        }

        List<object> outCollection = new List<object>();
        Stopwatch sw = new Stopwatch();

        sw.Start();
        for (int i = N-1; i > 0; --i)
        {          
            outCollection.Add(storage[i]);
        }
        sw.Stop();

        Console.WriteLine(sw.ElapsedMilliseconds);
    }

On the test machine, using the Small class, it takes about 25-30 ms to run, while it takes 40-45 ms with Large. I know that outCollection has to grow from time to time to be able to store all the items, so there is some dynamic memory allocation. But giving the collection an initial size makes the difference even more obvious: 11-12 ms with Small and 35-38 ms with Large objects.
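
By "giving the collection an initial size" I mean constructing the destination list with its final capacity, so that no backing-array growth happens inside the measured loop; a one-line sketch:

    // Pre-allocate the destination list so no reallocation occurs while measuring.
    List<object> outCollection = new List<object>(N);
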

I am somewhat surprised, as these are reference types, so I was expecting the collections to work only with references to the Small/Large instances. I have read Eric Lippert's relevant article and know that references should not be treated as pointers. At the same time, AFAIK they are currently implemented as pointers, and their size, and therefore the collection's performance, should be independent of element size.

I've decided to put up a question here hoping that someone could explain or help me understand what's happening here. Aside from the performance improvement, I'm really curious what is happening behind the scenes.

Update: Profiling data using the diagnostic tools didn't help me much, although I have to admit I'm not an expert at using the profiler. I'll collect more data later today to find where the bottleneck is.

The pressure on the GC is quite high of course, especially with the Large instances. But once the instances are created and stored in the storage collection and the program enters the loop, no collection is triggered any more, and memory usage doesn't increase significantly (outCollection is already pre-allocated).
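
One way to verify that claim is to compare GC.CollectionCount before and after the timed copy; this is a sketch of such a check (not part of the original test code), wrapped around the existing loop:

    // Sketch (not in the original test): confirm the GC does not run during the timed loop.
    int gen0 = GC.CollectionCount(0);
    int gen1 = GC.CollectionCount(1);
    int gen2 = GC.CollectionCount(2);

    sw.Start();
    for (int i = N - 1; i > 0; --i)
    {
        outCollection.Add(storage[i]);
    }
    sw.Stop();

    // All three differences should be 0 if no collection was triggered in the loop.
    Console.WriteLine(GC.CollectionCount(0) - gen0);
    Console.WriteLine(GC.CollectionCount(1) - gen1);
    Console.WriteLine(GC.CollectionCount(2) - gen2);
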

Most of the CPU time is of course spent on memory allocation (JIT_New), around 62%, and the only other significant entry in the profiler's function list is System.Collections.Generic.List`1[System.__Canon].Add, with about 7%.
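
For context, the List`1.Add entry corresponds roughly to the logic sketched below: when the backing array is full, List<T> allocates a larger array and copies the existing references over, which is why pre-sizing the list removes those allocations from the measured loop. This is a simplified sketch, not the actual BCL source:

    // Simplified sketch of what List<T>.Add does internally (not the actual BCL code).
    class SimpleList<T>
    {
        private T[] _items = new T[4];
        private int _size;

        public void Add(T item)
        {
            if (_size == _items.Length)
            {
                // The backing array is full: allocate a larger one and copy the references.
                T[] larger = new T[_items.Length * 2];
                Array.Copy(_items, larger, _size);
                _items = larger;
            }
            _items[_size++] = item;
        }
    }
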

With 1 million items the preallocated outCollection size is 8 million bytes (the same as the size of storage); one can suspect 64-bit addresses being stored in the collections.
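
That figure matches a back-of-the-envelope calculation: on a 64-bit process each stored reference occupies IntPtr.Size = 8 bytes, so 1 million references need about 8 million bytes. As a tiny sketch:

    const int N = 1000000;
    // Each element slot in List<object> holds one reference: 8 bytes in a 64-bit process.
    long expectedBytes = (long)N * IntPtr.Size;
    Console.WriteLine(expectedBytes);   // 8000000 on x64
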

Probably I'm not using the tools properly, or I don't have the experience to interpret the results correctly, but the profiler didn't help me get closer to the cause. If the loop is not triggering collections and it only copies pointers between two pre-allocated collections, how could the item size cause any difference? The cache hit/miss ratio is supposed to be more or less the same in both cases, as the loop is iterating over a list of "addresses" in both cases.

Thanks for all the help so far; I will collect more data and post an update here if I find anything.

I suspect that at least one action in the above (maybe some type checks) will require a dereference. Then the fact that many Small instances probably sit close together on the heap, and thus share cache lines, could account for some amount of the difference (certainly many more of them could share a single cache line than Large instances).

Added to which, you are also accessing them in the reverse order in which they were allocated, which maximises such a benefit.
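
A rough, back-of-the-envelope estimate of that effect, assuming a 64-byte cache line and the typical x64 layout of a 16-byte object header plus 8 bytes per reference field (both implementation details, not guarantees):

    // Rough size estimates on x64 (implementation details, not guarantees):
    // ~16-byte object header plus 8 bytes per reference field.
    const int cacheLine = 64;
    int smallSize = 16 + 2 * 8;     // ~32 bytes  -> about two Small instances per cache line
    int largeSize = 16 + 50 * 8;    // ~416 bytes -> one Large spans about 7 cache lines
    Console.WriteLine((double)cacheLine / smallSize);   // ~2.0
    Console.WriteLine((double)largeSize / cacheLine);   // ~6.5

Under those assumptions, walking a run of Small instances touches far less memory per element than walking Large ones, even though only the references are being copied.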


 