Inconsistent RAM write speeds across different array sizes and write methods. Fastest way to fill an array?
I've encountered some inconsistencies in write speed across different array sizes and methods, so I wrote a separate program to investigate. It is a simple C# console application that creates a byte, an int and a float array of a specified size, fills each with a defined value using different methods, and measures the write speed. Between method calls it also clears the CPU cache, so that the results are unadulterated when the same method is run n times.
Here is the pastebin dump of program.cs, the only file in the program. Just copy and paste it into a new, empty console application in Visual Studio.
The byte/int arrays are meant to hold bitmap data and the float array a depth buffer, so the byte frame is width * height * 4 bytes (BGRA), while the int/float frames are just width * height elements.
These are the results I get on my computer:
500*500
ClearByteFrameAsSpanFill: MS:0,05 GB/s:18,79 MB/s:18794,50
ClearByteFrameArrayFill: MS:0,05 GB/s:18,54 MB/s:18541,56
ClearByteFrameAsByteAVX: MS:0,05 GB/s:19,57 MB/s:19575,94
ClearByteFrameAsByteAVXUnRolled: MS:0,05 GB/s:18,14 MB/s:18148,42
ClearByteFrameAsByteAVXThreaded(2): MS:0,08 GB/s:11,96 MB/s:11968,89
ClearByteFrameAsByteAVXThreaded(4): MS:0,08 GB/s:12,19 MB/s:12192,34
ClearByteFrameAsByteAVXThreaded(6): MS:0,08 GB/s:12,53 MB/s:12537,63
ClearByteFrameAsByteAVXThreaded(8): MS:0,08 GB/s:12,37 MB/s:12377,25
ClearByteFrameAsByteAVXThreaded(12): MS:0,08 GB/s:12,13 MB/s:12137,08
ClearByteFrameMarshalCopy: MS:0,09 GB/s:10,35 MB/s:10357,00
ClearByteFrameMarshalCopy4: MS:0,11 GB/s:9,22 MB/s:9220,86
ClearByteFrameMarshalThreaded(2): MS:0,14 GB/s:7,17 MB/s:7171,80
ClearByteFrameMarshalThreaded(4): MS:0,12 GB/s:7,89 MB/s:7892,86
ClearByteFrameMarshalThreaded(6): MS:0,12 GB/s:7,99 MB/s:7993,24
ClearByteFrameMarshalThreaded(8): MS:0,13 GB/s:7,28 MB/s:7287,14
ClearByteFrameMarshalThreaded(12): MS:0,13 GB/s:7,48 MB/s:7485,39
ClearByteFrameNaive: MS:0,30 GB/s:3,30 MB/s:3308,06
ClearByteFrameBytePointer: MS:0,25 GB/s:3,97 MB/s:3970,49
ClearByteFrameInt32Pointer: MS:0,07 GB/s:13,98 MB/s:13981,94
ClearByteFrameInt32PointerThreads(2): MS:0,07 GB/s:12,58 MB/s:12582,46
ClearByteFrameInt32PointerThreads(4): MS:0,08 GB/s:11,93 MB/s:11938,29
ClearByteFrameInt32PointerThreads(6): MS:0,08 GB/s:12,14 MB/s:12144,36
ClearByteFrameInt32PointerThreads(8): MS:0,08 GB/s:11,62 MB/s:11629,97
ClearByteFrameInt32PointerThreads(12): MS:0,09 GB/s:11,38 MB/s:11388,57
ClearByteFrameInt64Pointer: MS:0,05 GB/s:18,69 MB/s:18692,43
ClearByteFrameInt64PointerThreads(2): MS:0,07 GB/s:12,91 MB/s:12917,26
ClearByteFrameInt64PointerThreads(4): MS:0,08 GB/s:12,47 MB/s:12477,97
ClearByteFrameInt64PointerThreads(6): MS:0,10 GB/s:11,43 MB/s:11439,57
ClearByteFrameInt64PointerThreads(8): MS:0,08 GB/s:11,50 MB/s:11504,56
ClearByteFrameInt64PointerThreads(12): MS:0,09 GB/s:11,05 MB/s:11052,96
ClearInt32FrameAsSpanFill: MS:0,05 GB/s:19,25 MB/s:19256,44
ClearInt32FrameArrayFill: MS:0,05 GB/s:18,80 MB/s:18803,56
ClearFloatFrameAsSpanFill: MS:0,05 GB/s:19,24 MB/s:19245,53
ClearFloatFrameArrayFill: MS:0,05 GB/s:19,29 MB/s:19296,62
-- --
1500*1500
ClearByteFrameAsSpanFill: MS:0,47 GB/s:18,96 MB/s:18963,17
ClearByteFrameArrayFill: MS:0,48 GB/s:18,63 MB/s:18638,61
ClearByteFrameAsByteAVX: MS:0,46 GB/s:19,54 MB/s:19547,95
ClearByteFrameAsByteAVXUnRolled: MS:0,47 GB/s:18,98 MB/s:18987,69
ClearByteFrameAsByteAVXThreaded(2): MS:0,55 GB/s:16,71 MB/s:16715,60
ClearByteFrameAsByteAVXThreaded(4): MS:0,53 GB/s:17,09 MB/s:17097,25
ClearByteFrameAsByteAVXThreaded(6): MS:0,53 GB/s:17,10 MB/s:17106,20
ClearByteFrameAsByteAVXThreaded(8): MS:0,48 GB/s:19,06 MB/s:19066,60
ClearByteFrameAsByteAVXThreaded(12): MS:0,51 GB/s:17,75 MB/s:17759,48
ClearByteFrameMarshalCopy: MS:0,81 GB/s:11,05 MB/s:11059,42
ClearByteFrameMarshalCopy4: MS:0,80 GB/s:11,20 MB/s:11207,69
ClearByteFrameMarshalThreaded(2): MS:0,83 GB/s:11,14 MB/s:11142,98
ClearByteFrameMarshalThreaded(4): MS:0,79 GB/s:11,49 MB/s:11494,85
ClearByteFrameMarshalThreaded(6): MS:0,87 GB/s:10,37 MB/s:10372,56
ClearByteFrameMarshalThreaded(8): MS:0,85 GB/s:10,69 MB/s:10692,81
ClearByteFrameMarshalThreaded(12): MS:0,86 GB/s:10,49 MB/s:10497,28
ClearByteFrameNaive: MS:2,67 GB/s:3,35 MB/s:3359,32
ClearByteFrameBytePointer: MS:2,24 GB/s:4,01 MB/s:4015,63
ClearByteFrameInt32Pointer: MS:0,62 GB/s:14,34 MB/s:14340,04
ClearByteFrameInt32PointerThreads(2): MS:0,50 GB/s:18,05 MB/s:18052,87
ClearByteFrameInt32PointerThreads(4): MS:0,49 GB/s:18,30 MB/s:18306,56
ClearByteFrameInt32PointerThreads(6): MS:0,46 GB/s:19,37 MB/s:19378,58
ClearByteFrameInt32PointerThreads(8): MS:0,48 GB/s:18,48 MB/s:18483,02
ClearByteFrameInt32PointerThreads(12): MS:0,49 GB/s:18,26 MB/s:18266,97
ClearByteFrameInt64Pointer: MS:0,46 GB/s:19,26 MB/s:19264,51
ClearByteFrameInt64PointerThreads(2): MS:0,51 GB/s:17,59 MB/s:17599,42
ClearByteFrameInt64PointerThreads(4): MS:0,50 GB/s:18,07 MB/s:18075,17
ClearByteFrameInt64PointerThreads(6): MS:0,47 GB/s:19,19 MB/s:19196,42
ClearByteFrameInt64PointerThreads(8): MS:0,49 GB/s:18,27 MB/s:18273,37
ClearByteFrameInt64PointerThreads(12): MS:0,48 GB/s:18,49 MB/s:18499,04
ClearInt32FrameAsSpanFill: MS:0,47 GB/s:18,98 MB/s:18987,70
ClearInt32FrameArrayFill: MS:0,48 GB/s:18,70 MB/s:18702,73
ClearFloatFrameAsSpanFill: MS:0,48 GB/s:18,60 MB/s:18609,78
ClearFloatFrameArrayFill: MS:0,47 GB/s:19,07 MB/s:19077,63
-- --
4500*4500
ClearByteFrameAsSpanFill: MS:3,49 GB/s:23,25 MB/s:23251,93
ClearByteFrameArrayFill: MS:3,45 GB/s:23,47 MB/s:23473,07
ClearByteFrameAsByteAVX: MS:7,24 GB/s:11,20 MB/s:11200,27
ClearByteFrameAsByteAVXUnRolled: MS:7,32 GB/s:11,08 MB/s:11081,40
ClearByteFrameAsByteAVXThreaded(2): MS:6,93 GB/s:11,70 MB/s:11702,41
ClearByteFrameAsByteAVXThreaded(4): MS:6,44 GB/s:12,58 MB/s:12588,30
ClearByteFrameAsByteAVXThreaded(6): MS:6,48 GB/s:12,53 MB/s:12536,29
ClearByteFrameAsByteAVXThreaded(8): MS:6,49 GB/s:12,47 MB/s:12479,53
ClearByteFrameAsByteAVXThreaded(12): MS:6,59 GB/s:12,28 MB/s:12286,35
ClearByteFrameMarshalCopy: MS:7,19 GB/s:11,27 MB/s:11270,57
ClearByteFrameMarshalCopy4: MS:7,25 GB/s:11,17 MB/s:11179,60
ClearByteFrameMarshalThreaded(2): MS:7,15 GB/s:11,33 MB/s:11337,41
ClearByteFrameMarshalThreaded(4): MS:7,38 GB/s:10,97 MB/s:10970,25
ClearByteFrameMarshalThreaded(6): MS:7,58 GB/s:10,68 MB/s:10682,29
ClearByteFrameMarshalThreaded(8): MS:7,64 GB/s:10,60 MB/s:10601,81
ClearByteFrameMarshalThreaded(12): MS:7,78 GB/s:10,40 MB/s:10400,19
ClearByteFrameNaive: MS:24,18 GB/s:3,34 MB/s:3348,89
ClearByteFrameBytePointer: MS:20,35 GB/s:3,97 MB/s:3979,91
ClearByteFrameInt32Pointer: MS:8,10 GB/s:10,00 MB/s:10009,30
ClearByteFrameInt32PointerThreads(2): MS:7,00 GB/s:11,57 MB/s:11575,67
ClearByteFrameInt32PointerThreads(4): MS:6,29 GB/s:12,86 MB/s:12868,42
ClearByteFrameInt32PointerThreads(6): MS:6,50 GB/s:12,48 MB/s:12485,34
ClearByteFrameInt32PointerThreads(8): MS:6,49 GB/s:12,49 MB/s:12496,20
ClearByteFrameInt32PointerThreads(12): MS:6,64 GB/s:12,19 MB/s:12194,29
ClearByteFrameInt64Pointer: MS:7,16 GB/s:11,33 MB/s:11331,90
ClearByteFrameInt64PointerThreads(2): MS:6,80 GB/s:11,92 MB/s:11926,33
ClearByteFrameInt64PointerThreads(4): MS:6,40 GB/s:12,67 MB/s:12670,50
ClearByteFrameInt64PointerThreads(6): MS:6,51 GB/s:12,49 MB/s:12490,41
ClearByteFrameInt64PointerThreads(8): MS:6,48 GB/s:12,51 MB/s:12511,71
ClearByteFrameInt64PointerThreads(12): MS:6,61 GB/s:12,25 MB/s:12256,23
ClearInt32FrameAsSpanFill: MS:7,41 GB/s:10,94 MB/s:10948,01
ClearInt32FrameArrayFill: MS:7,28 GB/s:11,13 MB/s:11133,96
ClearFloatFrameAsSpanFill: MS:7,23 GB/s:11,20 MB/s:11202,94
ClearFloatFrameArrayFill: MS:7,19 GB/s:11,26 MB/s:11265,38
For reference, my computer has an AMD Ryzen 3600, and independent benchmarking software puts my maximum write speed at about 20 GB/s.
My question is as follows:
The only winners are ClearByteFrameAsSpanFill and ClearByteFrameArrayFill, but they are useless to me since I must reset the buffer with 4 different values per pixel. I thought I could use an int32 array instead of bytes, bit-shift my RGBA values into a single int and clear with that, but its throughput drops to roughly 11 GB/s at the large size. ClearByteFrameAsByteAVX works well at the small and medium sizes, but also drops to about 11 GB/s at the large one.
I need a single method that performs well across all sizes, and I am morbidly curious why it behaves like this. You would think something as basic as this would be well optimized in the .NET framework.
Any help would be appreciated. Please create a new console application and run the program to see whether you get the same problem.
You can use the AsSpan approach in combination with MemoryMarshal.Cast to use int or long as fill values:
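// Requires: using System; using System.Runtime.InteropServices;
// ByteFrame is the byte[] frame buffer from the question's program.
static double ClearByteFrameAsIntSpanFill(int clearValue)
{
    var asSpan = ByteFrame.AsSpan();
    // Reinterpret the bytes as ints without copying; trailing bytes
    // that do not form a whole int are not covered by the cast span.
    var cast = MemoryMarshal.Cast<byte, int>(asSpan);
    cast.Fill(clearValue);
    return (float)ByteFrame.Length / 1000000; // buffer size in millions of bytes
}

static double ClearByteFrameAsLongSpanFill(long clearValue)
{
    var asSpan = ByteFrame.AsSpan();
    // Same idea with 8-byte elements: each long covers two BGRA pixels.
    var cast = MemoryMarshal.Cast<byte, long>(asSpan);
    cast.Fill(clearValue);
    return (float)ByteFrame.Length / 1000000; // buffer size in millions of bytes
}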
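As a usage sketch for the question's requirement of four different values per pixel, the fill value can be built by shifting the B, G, R and A bytes into one 32-bit value. The names FrameClear, PackBgra and Clear below are hypothetical and not part of the question's program, and the byte order is an assumption based on the BGRA layout described in the question:

```csharp
using System;
using System.Buffers.Binary;
using System.Runtime.InteropServices;

public static class FrameClear
{
    // Packs one pixel so that its bytes land in memory in the order B, G, R, A.
    // Assumption: the frame stores pixels as BGRA, as described in the question.
    public static uint PackBgra(byte b, byte g, byte r, byte a)
    {
        uint value = (uint)b | ((uint)g << 8) | ((uint)r << 16) | ((uint)a << 24);
        // On a big-endian machine the bytes must be swapped to keep the same memory layout.
        return BitConverter.IsLittleEndian ? value : BinaryPrimitives.ReverseEndianness(value);
    }

    // Fills the whole byte frame with one BGRA pixel value.
    public static void Clear(byte[] frame, byte b, byte g, byte r, byte a)
    {
        // Reinterpret the byte buffer as 32-bit pixels; no copy is made.
        Span<uint> pixels = MemoryMarshal.Cast<byte, uint>(frame.AsSpan());
        pixels.Fill(PackBgra(b, g, r, a));
    }
}

public static class Program
{
    public static void Main()
    {
        var frame = new byte[4 * 4 * 4]; // 4 x 4 pixels, 4 bytes each
        FrameClear.Clear(frame, 0x10, 0x20, 0x30, 0xFF);
        Console.WriteLine(BitConverter.ToString(frame, 0, 8)); // prints "10-20-30-FF-10-20-30-FF"
    }
}
```

The same packing works for the long variant by repeating the 32-bit value in the upper half of a 64-bit value, so that each element clears two pixels.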
I ran a few tests using your code, and on my machine ClearByteFrameAsIntSpanFill and ClearByteFrameAsLongSpanFill seem to perform similarly to ClearByteFrameAsSpanFill / ClearByteFrameArrayFill.
Though in general I recommend using BenchmarkDotNet for performance testing and investigation.
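A minimal sketch of what such a benchmark could look like, assuming the BenchmarkDotNet NuGet package is installed and the project is run in Release configuration. The class name FrameFillBenchmarks and the fill constants are made up for illustration. Unlike the question's program, this sketch does not clear the CPU cache between runs, so the numbers are not directly comparable:

```csharp
using System;
using System.Runtime.InteropServices;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

public class FrameFillBenchmarks
{
    // Frame edge lengths taken from the three sizes tested in the question.
    [Params(500, 1500, 4500)]
    public int Side;

    private byte[] _frame = Array.Empty<byte>();

    // Allocates a BGRA byte frame once per parameter value.
    [GlobalSetup]
    public void Setup() => _frame = new byte[Side * Side * 4];

    [Benchmark(Baseline = true)]
    public void ByteSpanFill() => _frame.AsSpan().Fill(0x7F);

    [Benchmark]
    public void IntSpanFill() =>
        MemoryMarshal.Cast<byte, int>(_frame.AsSpan()).Fill(0x7F7F7F7F);

    [Benchmark]
    public void LongSpanFill() =>
        MemoryMarshal.Cast<byte, long>(_frame.AsSpan()).Fill(0x7F7F7F7F7F7F7F7F);
}

public static class Program
{
    // Runs every [Benchmark] method for each [Params] value and prints a summary table.
    public static void Main() => BenchmarkRunner.Run<FrameFillBenchmarks>();
}
```

BenchmarkDotNet handles warm-up, iteration counts and statistics itself, which removes much of the hand-written timing code from the comparison.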