C# struct assignment efficiency

In C#, I have an array of structs and I need to assign values to each. What is the most efficient way to do this? I could assign each field, indexing the array for each field:

array[i].x = 1;
array[i].y = 1;

I could construct a new struct on the stack and copy it to the array:

array[i] = new Vector2(1, 2);

Is there another way? I could call a method and pass the struct by ref, but I'd guess the method call overhead would not be worth it.
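For reference, the by-ref idea can be sketched as below. This is only an illustration: the `Vector2` here is a stand-in with two public float fields, and `Set` and `RefAssignDemo` are hypothetical names. The ref local needs C# 7.0 or later, which older Unity/Mono toolchains may not offer.

```csharp
using System;

// Stand-in for the question's Vector2 (assumption: two public float fields).
struct Vector2
{
    public float x, y;
    public Vector2(float x, float y) { this.x = x; this.y = y; }
}

static class RefAssignDemo
{
    // Hypothetical helper: writes both fields through a reference to the element.
    public static void Set(ref Vector2 v, float x, float y)
    {
        v.x = x;
        v.y = y;
    }

    public static Vector2[] Run()
    {
        var array = new Vector2[4];

        // Pass the array element by ref: the method writes directly into the array.
        Set(ref array[0], 1f, 2f);

        // Ref local (C# 7.0+): index the array once, then assign through the alias.
        ref Vector2 slot = ref array[1];
        slot.x = 3f;
        slot.y = 4f;

        return array;
    }
}
```

Either form writes in place without building a temporary struct. Whether that is measurably faster than the two forms above depends on the JIT and would need to be measured.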

In case the struct size matters, the structs in question have 2-4 fields of type float or byte.

In some cases I need to assign the same values to multiple array entries, e.g.:

Vector2 value = new Vector2(1, 2);
array[i] = value;
array[i + 1] = value;
array[i + 2] = value;
array[i + 3] = value;

Does this change which approach is more efficient?
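One more option for the repeated-value case is a bulk fill. The sketch below assumes a runtime that provides `Array.Fill` and `Span<T>` (.NET Core 2.0+ / .NET Standard 2.1), which may not be true of older Unity/Mono versions. `Vector2` is again a stand-in and `FillDemo` is a hypothetical name.

```csharp
using System;

// Stand-in for the question's Vector2 (assumption: two public float fields).
struct Vector2
{
    public float x, y;
    public Vector2(float x, float y) { this.x = x; this.y = y; }
}

static class FillDemo
{
    public static Vector2[] Run()
    {
        var array = new Vector2[10];
        var value = new Vector2(1f, 2f);

        // Fill elements 0..3 with the same value.
        Array.Fill(array, value, 0, 4);

        // Same idea through a span over elements 4..7.
        array.AsSpan(4, 4).Fill(value);

        return array;
    }
}
```

For only four entries, the explicit assignments above are likely just as good; a fill call is more interesting for longer runs.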

I understand this is quite low level, but I'm doing it millions of times and I'm curious.

Edit: I slapped together a benchmark:

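// Inner-loop iteration count (assumed: the snippet used loopCount without declaring it).
const int loopCount = 100000000;
this.array = new Vector2[100];
Vector2[] array = this.array; // local reference to the field's array
for (int i = 0; i < 1000; i++) {
    long startTime, endTime;
    startTime = DateTime.Now.Ticks;
    for (int x = 0; x < loopCount; x++) {
        array[0] = new Vector2(1, 2);
        array[1] = new Vector2(3, 4);
        array[2] = new Vector2(5, 6);
        array[3] = new Vector2(7, 8);
        array[4] = new Vector2(9, 0);
        array[5] = new Vector2(1, 2);
        array[6] = new Vector2(3, 4);
        array[7] = new Vector2(5, 6);
        array[8] = new Vector2(7, 8);
        array[9] = new Vector2(9, 0);
    }
    endTime = DateTime.Now.Ticks;
    // Note: one DateTime tick is 100 ns, so this quotient is in ticks per
    // inner-loop iteration (10 assignments), not in nanoseconds.
    double ns = ((double)(endTime - startTime)) / ((double)loopCount);
    Debug.Log(ns.ToString("F"));
}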

This reported ~0.77ns and another version which indexed and assigned the struct fields gave ~0.24ns, FWIW. It appears the array index is cheap compared to the struct stack allocation and copy. Might be interesting to see the performance on a mobile device.
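As a rough alternative to the timing above, here is a sketch built on `System.Diagnostics.Stopwatch`, which is intended for interval measurement. The names `AssignBenchmark`, `ConstructorNs` and `FieldsNs` are hypothetical, `Vector2` is a stand-in, and the result is still a naive micro-benchmark: the JIT may hoist or drop repeated stores, so treat the numbers with suspicion.

```csharp
using System;
using System.Diagnostics;

// Stand-in for the question's Vector2 (assumption: two public float fields).
struct Vector2
{
    public float x, y;
    public Vector2(float x, float y) { this.x = x; this.y = y; }
}

static class AssignBenchmark
{
    // Average nanoseconds per element assignment, constructor form.
    public static double ConstructorNs(Vector2[] array, int iterations)
    {
        var sw = Stopwatch.StartNew();
        for (int n = 0; n < iterations; n++)
        {
            array[0] = new Vector2(1f, 2f);
            array[1] = new Vector2(3f, 4f);
        }
        sw.Stop();
        // Two assignments per iteration; convert milliseconds to nanoseconds.
        return sw.Elapsed.TotalMilliseconds * 1e6 / (iterations * 2.0);
    }

    // Average nanoseconds per element assignment, field-by-field form.
    public static double FieldsNs(Vector2[] array, int iterations)
    {
        var sw = Stopwatch.StartNew();
        for (int n = 0; n < iterations; n++)
        {
            array[0].x = 1f; array[0].y = 2f;
            array[1].x = 3f; array[1].y = 4f;
        }
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds * 1e6 / (iterations * 2.0);
    }
}
```

A typical use would be to call each method once as a warm-up, so the JIT has compiled it, and then compare the results of a second call with a large `iterations` value in a release build run outside the debugger.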

Edit2: Dan Bryant's answer below is why I didn't write a benchmark to begin with: it's too easy to get wrong.

I was curious about the first case (field assignment vs. constructor call), so I made a release build and attached the debugger post-JIT to see the disassembly. The (x64) code looks like this:

            var array = new Vector2[10];
00000000  mov         ecx,191372h 
00000005  mov         edx,0Ah 
0000000a  call        FFF421C4 
0000000f  mov         edx,eax 

            array[i].x = 1;
00000011  cmp         dword ptr [edx+4],0 
00000015  jbe         0000003E 
00000017  lea         eax,[edx+8] 
0000001a  fld1 
0000001c  fstp        qword ptr [eax] 
            array[i].y = 1;
0000001e  fld1 
00000020  fstp        qword ptr [edx+10h] 

            array[i] = new Vector2(1, 1);
00000023  add         edx,8 
00000026  mov         eax,edx 
00000028  fld1 
0000002a  fld1 
0000002c  fxch        st(1) 
0000002e  fstp        qword ptr [eax] 
00000030  fstp        qword ptr [eax+8] 

One thing worth noting is that the 'constructor call' is inlined when using a release build outside the debugger, so, in principle, there should be no difference between setting fields and calling the constructor. That said, the jitter did some interesting things here.

For the 'constructor' version, it used two floating point stack slots and then stored both to the structure memory back to back (fld1, fld1, fstp, fstp). It also has an fxch (exchange), which is a bit silly since both slots contain the constant value 1, but that's not exactly a high priority optimization target for most applications, I'd assume.

For the 'individual fields' version, it only used one slot on the FPU stack, by splitting up the writes (fld1, fstp, fld1, fstp). I'm not an x64 guru, so I don't know which ordering is more efficient in terms of execution time. Any difference is probably quite minuscule, though, since the primary potential overhead (the constructor method call) is inlined away.
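If inlining of the constructor is a concern on a particular runtime, one option is to request it explicitly. The sketch below assumes a stand-in `Vector2` that you control (it cannot be applied to a library type such as Unity's), and `InlineDemo` is a hypothetical name. `MethodImplOptions.AggressiveInlining` is only a hint; the JIT is free to ignore it, and small constructors like this one are usually inlined anyway.

```csharp
using System;
using System.Runtime.CompilerServices;

// Stand-in for the answer's Vector2 (assumption: a struct whose source you own).
struct Vector2
{
    public float x, y;

    // Hint to the JIT that this constructor should be inlined where possible.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public Vector2(float x, float y) { this.x = x; this.y = y; }
}

static class InlineDemo
{
    public static Vector2[] Run()
    {
        var array = new Vector2[2];

        // Field-by-field form.
        array[0].x = 1f;
        array[0].y = 1f;

        // Constructor form; once inlined it should reduce to the same two stores.
        array[1] = new Vector2(1f, 1f);

        return array;
    }
}
```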
