
C# & .NET: stackalloc

I have a few questions about the functionality of the stackalloc operator.

  1. How does it actually allocate? I thought it does something like:

     void* stackalloc(int sizeInBytes)
     {
         void* p = StackPointer; // esp
         StackPointer += sizeInBytes;
         if (StackPointer exceeds stack size)
             throw new StackOverflowException(...);
         return p;
     }

    But I have done a few tests, and I'm not sure that's how it works. We can't know exactly what it does and how it does it, but I want to know the basics.

  2. I thought that stack allocation (well, I am actually sure about it) is faster than heap allocation. So why does this example:

      using System;
      using System.Diagnostics;

      class Program
      {
          static void Main(string[] args)
          {
              Stopwatch sw1 = new Stopwatch();
              sw1.Start();
              StackAllocation();
              Console.WriteLine(sw1.ElapsedTicks);

              Stopwatch sw2 = new Stopwatch();
              sw2.Start();
              HeapAllocation();
              Console.WriteLine(sw2.ElapsedTicks);
          }

          // Requires compiling with unsafe code enabled.
          static unsafe void StackAllocation()
          {
              for (int i = 0; i < 100; i++)
              {
                  int* p = stackalloc int[100];
              }
          }

          static void HeapAllocation()
          {
              for (int i = 0; i < 100; i++)
              {
                  int[] a = new int[100];
              }
          }
      }

give an average of about 280 ticks for stack allocation, and usually 0-1 ticks for heap allocation? (On my personal computer, an Intel Core i7.)

On the computer I am using now (Intel Core 2 Duo), the results make more sense than the previous ones (probably because Optimize code was not checked in VS): about 460 ticks for stack allocation, and about 380 ticks for heap allocation.

But this still doesn't make sense. Why is it so? I guess the CLR notices that we don't use the array, so maybe it doesn't even allocate it?

A case where stackalloc is faster:

 using System;
 using System.Threading;

 // Requires compiling with unsafe code enabled.
 class Program
 {
     private static volatile int _dummy; // just to avoid any optimisations
                                         // that have us measuring the wrong
                                         // thing. Especially since the difference
                                         // is more noticeable in a release build
                                         // (also more noticeable on a multi-core
                                         // machine than single- or dual-core).
     static void Main(string[] args)
     {
         System.Diagnostics.Stopwatch sw1 = new System.Diagnostics.Stopwatch();
         Thread[] threads = new Thread[20];
         sw1.Start();
         for (int t = 0; t != 20; ++t)
         {
             threads[t] = new Thread(DoSA);
             threads[t].Start();
         }
         for (int t = 0; t != 20; ++t)
             threads[t].Join();
         Console.WriteLine(sw1.ElapsedTicks);

         System.Diagnostics.Stopwatch sw2 = new System.Diagnostics.Stopwatch();
         threads = new Thread[20];
         sw2.Start();
         for (int t = 0; t != 20; ++t)
         {
             threads[t] = new Thread(DoHA);
             threads[t].Start();
         }
         for (int t = 0; t != 20; ++t)
             threads[t].Join();
         Console.WriteLine(sw2.ElapsedTicks);
         Console.Read();
     }

     private static void DoSA()
     {
         Random rnd = new Random(1);
         for (int i = 0; i != 100000; ++i)
             StackAllocation(rnd);
     }

     static unsafe void StackAllocation(Random rnd)
     {
         int size = rnd.Next(1024, 131072);
         int* p = stackalloc int[size];
         _dummy = *(p + rnd.Next(0, size));
     }

     private static void DoHA()
     {
         Random rnd = new Random(1);
         for (int i = 0; i != 100000; ++i)
             HeapAllocation(rnd);
     }

     static void HeapAllocation(Random rnd)
     {
         int size = rnd.Next(1024, 131072);
         int[] a = new int[size];
         _dummy = a[rnd.Next(0, size)];
     }
 }

Important differences between this code and the code in the question:

  1. We have several threads running. With stack allocation, each thread allocates on its own stack. With heap allocation, they allocate from a heap shared with the other threads.

  2. Larger sizes are allocated.

  3. Different sizes are allocated each time (though I seeded the random generator to make the tests more deterministic). This makes heap fragmentation more likely, so heap allocation is less efficient than it would be with identical allocations each time.

It's also worth noting that stackalloc is often used as an alternative to using fixed to pin an array on the heap. Pinning arrays is bad for heap performance (not just for that code, but also for other threads using the same heap), so the performance impact would be even greater if the claimed memory were in use for any reasonable length of time.
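To make the comparison concrete, here is a minimal sketch of the two patterns side by side: pinning a heap array with fixed versus allocating the buffer with stackalloc, in both cases handing a raw pointer to code that expects one. `NativeLikeFill` and `Sum` are hypothetical stand-ins for an unmanaged function (no real P/Invoke is involved), and the block assumes it is compiled with unsafe code enabled (AllowUnsafeBlocks).

```csharp
using System;

static class PinningVersusStackalloc
{
    // Hypothetical stand-in for an unmanaged function that writes
    // `count` ints through a raw pointer.
    static unsafe void NativeLikeFill(int* buffer, int count)
    {
        for (int i = 0; i < count; i++)
            buffer[i] = i * i;
    }

    // Hypothetical stand-in for an unmanaged function that reads them back.
    static unsafe int Sum(int* buffer, int count)
    {
        int total = 0;
        for (int i = 0; i < count; i++)
            total += buffer[i];
        return total;
    }

    // Heap array pinned with `fixed` so the GC cannot move it while the
    // pointer is in use; the pin is released when the fixed block exits.
    public static unsafe int WithFixed(int count)
    {
        int[] array = new int[count];
        fixed (int* p = array)
        {
            NativeLikeFill(p, count);
            return Sum(p, count);
        }
    }

    // Stack buffer: it never lives on the GC heap, so nothing is pinned,
    // and it is reclaimed automatically when this method returns.
    public static unsafe int WithStackalloc(int count)
    {
        int* p = stackalloc int[count];
        NativeLikeFill(p, count);
        return Sum(p, count);
    }
}
```

Calling either `PinningVersusStackalloc.WithFixed(10)` or `PinningVersusStackalloc.WithStackalloc(10)` returns 285 (the sum of the squares 0 through 81). The stackalloc version only makes sense while `count` is small enough to fit comfortably on the thread's stack.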

While my code demonstrates a case where stackalloc gives a performance benefit, the code in the question is probably closer to most cases where someone might eagerly "optimise" by using it. Hopefully the two pieces of code together show that while stackalloc can give a boost, it can also hurt performance a lot.

Generally, you shouldn't even consider stackalloc unless you are going to need pinned memory for interacting with unmanaged code anyway, and it should be considered an alternative to fixed rather than an alternative to general heap allocation. Even in that case it requires caution, forethought before you start, and profiling after you finish.

Using it in other cases could give a benefit, but it should be far down the list of performance improvements you would try.
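One common way to apply that caution is to use the stack only below a small size threshold and fall back to an ordinary heap array otherwise. The sketch below assumes C# 8 or later (for stackalloc inside a conditional expression) and `Span<T>`, which also means no unsafe context is needed. The 256-element threshold and the method name are arbitrary illustrations, not recommended values; pick a threshold by profiling.

```csharp
using System;

static class BoundedStackBuffer
{
    // Illustrative threshold only; choose one by profiling.
    const int MaxStackElements = 256;

    // Sums 0..count-1 using a scratch buffer that lives on the stack when
    // small and on the heap when large, so a large `count` cannot
    // overflow the stack.
    public static long SumOfIndices(int count)
    {
        Span<int> buffer = count <= MaxStackElements
            ? stackalloc int[count]
            : new int[count];

        for (int i = 0; i < buffer.Length; i++)
            buffer[i] = i;

        long total = 0;
        foreach (int value in buffer)
            total += value;
        return total;
    }
}
```

`SumOfIndices(10)` takes the stack path and `SumOfIndices(1000)` takes the heap path; the calling code is identical either way because both branches produce a `Span<int>`.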

Edit:

To answer part 1 of the question: stackalloc is conceptually much as you describe. It obtains a chunk of stack memory and then returns a pointer to that chunk. It doesn't check up front that the memory will fit; rather, if it attempts to obtain memory past the end of the stack (which .NET protects when the thread is created), the OS raises an exception to the runtime, which then turns it into a .NET managed exception. Much the same happens if you allocate just a single byte in a method with infinite recursion: unless the call gets optimised to avoid that stack allocation (sometimes possible), the single bytes will eventually add up to enough to trigger the stack overflow exception.
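The mechanism described above (move a pointer, fail when crossing into the protected region at the end of the stack, reclaim everything when the method returns) can be modelled in a few lines. This is a purely hypothetical simulation: `SimulatedStack`, its sizes and its explicit guard check are invented for illustration and say nothing about the JIT's real output, which relies on OS-protected guard pages rather than a check like this. It does use the downward growth of the x86/x64 stack, where allocating moves the stack pointer to lower addresses.

```csharp
using System;

// Hypothetical model of a thread stack: the pointer starts at the top
// and moves toward lower "addresses" as memory is allocated.
sealed class SimulatedStack
{
    private readonly int _guardLimit; // lowest usable address
    private int _pointer;             // current top of the stack

    public SimulatedStack(int size, int guardSize)
    {
        _pointer = size;
        _guardLimit = guardSize;
    }

    public int Pointer => _pointer;

    // Like stackalloc: move the pointer down and return the start of the
    // new block. Reaching the guard region stands in for the OS fault
    // (in real .NET a stack overflow cannot be caught and terminates the
    // process; the model throws a catchable exception instead).
    public int Alloc(int sizeInBytes)
    {
        int newPointer = _pointer - sizeInBytes;
        if (newPointer < _guardLimit)
            throw new InvalidOperationException("simulated stack overflow");
        _pointer = newPointer;
        return newPointer;
    }

    // Like a method returning: everything allocated since `savedPointer`
    // is reclaimed at once; there is no per-block free.
    public void PopFrame(int savedPointer) => _pointer = savedPointer;
}
```

With `new SimulatedStack(4096, 256)`, a first `Alloc(400)` returns 3696 and `PopFrame` restores the pointer to 4096; allocating 400 bytes repeatedly without popping succeeds nine times and throws on the tenth, which is the "single bytes eventually add up" effect in miniature.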

  1. I can't give an exact answer, but stackalloc is implemented using the IL opcode localloc. I looked at the machine code generated by a release build for stackalloc and it was more convoluted than I expected. I don't know whether localloc checks the stack size, as your if suggests, or whether the stack overflow is detected by the CPU when the hardware stack actually overflows.

    Comments on this answer point out that the documentation linked for localloc says it allocates space from "the local heap". The problem is that there is no good online reference for MSIL except the actual standard, available in PDF format. That link is to the System.Reflection.Emit.OpCodes class, which isn't about MSIL itself but is rather a library for generating MSIL.

    However, the standards document ECMA-335 (Common Language Infrastructure) has a more precise description:

    Part of each method state is a local memory pool. Memory can be explicitly allocated from the local memory pool using the localloc instruction. All memory in the local memory pool is reclaimed on method exit, and that is the only way local memory pool memory is reclaimed (there is no instruction provided to free local memory that was allocated during this method invocation). The local memory pool is used to allocate objects whose type or size is not known at compile time and which the programmer does not wish to allocate in the managed heap.

    So basically the "local memory pool" is what is otherwise known as "the stack", and C# uses the stackalloc operator to allocate from this pool.

  2. In a release build the optimizer is smart enough to remove the call to HeapAllocation completely, resulting in a much lower execution time. It doesn't seem smart enough to perform the same optimization when stackalloc is used. If you either turn off optimization or use the allocated buffer in some way, you will see that stackalloc is slightly faster.
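Following that advice, here is a sketch of the question's benchmark revised so that neither version can simply be optimised away: the buffer is written and read, and the result is accumulated into a field. Each allocation also happens in its own small method, so that, per the ECMA-335 rule quoted above, the stack memory is reclaimed on every return instead of piling up for the whole loop (the .NET analyzer rule CA2014, "Do not use stackalloc in loops", warns about the original pattern). It uses `Span<int>` (C# 7.2 or later) so no unsafe context is needed; the `NoInlining` attributes, index arithmetic and iteration counts are illustrative choices, and no particular timings are implied.

```csharp
using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;

static class AllocationBenchmark
{
    // Results are accumulated here so the JIT cannot treat the work as dead.
    static long _sink;
    public static long Sink => _sink;

    // NoInlining keeps each stackalloc in its own frame, freed on return.
    [MethodImpl(MethodImplOptions.NoInlining)]
    internal static int StackOnce(int i)
    {
        Span<int> p = stackalloc int[100]; // zero-initialized by default
        p[i % 100] = i;
        return p[(i * 7) % 100];
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    internal static int HeapOnce(int i)
    {
        int[] a = new int[100]; // becomes garbage when this method returns
        a[i % 100] = i;
        return a[(i * 7) % 100];
    }

    public static long TimeStack(int iterations)
    {
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            _sink += StackOnce(i);
        return sw.ElapsedTicks;
    }

    public static long TimeHeap(int iterations)
    {
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            _sink += HeapOnce(i);
        return sw.ElapsedTicks;
    }
}
```

Call `AllocationBenchmark.TimeStack(1_000_000)` and `AllocationBenchmark.TimeHeap(1_000_000)` from a release build. The relative numbers depend on the runtime, the JIT and the machine, so treat them as a starting point for profiling rather than a verdict.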

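Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.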

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM