Performance of bitwise & on longs vs ints on 64 bit
It seems that performing an & operation between two longs takes the same amount of time as the equivalent operations on four 32-bit ints.
For example,
long1 & long2
takes as long as
int1 & int2
int3 & int4
This is running on a 64-bit OS and targeting 64-bit .NET.
In theory, this should be twice as fast. Has anyone encountered this before?
EDIT
As a simplification, imagine I have two lots of 64 bits of data. I put each lot into a long and perform a bitwise & on the two longs. I also take the same two sets of data, put each 64 bits into two 32-bit int values, and perform two &s. I expect the long & operation to run faster than the int & operations.
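To make the comparison in the question concrete, here is a minimal sketch of the two approaches applied to the same pair of 64-bit values. The helper names (Hi, Lo, Join, AndAsLong, AndAsInts) are hypothetical, chosen for illustration, and the sketch assumes .NET 6 or later for top-level statements. It only shows that one 64-bit & and two 32-bit &s on the split halves produce identical bits, so both versions do the same logical work.

```csharp
using System;

// Hypothetical helpers: split a 64-bit value into its high and low
// 32-bit halves, and join two halves back into one long.
static int Hi(long x) => (int)(x >> 32);
static int Lo(long x) => (int)x;
static long Join(int hi, int lo) => ((long)hi << 32) | (uint)lo;

// One 64-bit AND on a pair of longs...
static long AndAsLong(long x, long y) => x & y;

// ...versus two 32-bit ANDs on the same bits held in four ints.
static long AndAsInts(long x, long y)
{
    int hi = Hi(x) & Hi(y);   // int1 & int2
    int lo = Lo(x) & Lo(y);   // int3 & int4
    return Join(hi, lo);
}

long data1 = 0x1234567812345678;
long data2 = unchecked((long)0xF0F0F0F0FFFF0000);

// Both lines print 1030507012340000.
Console.WriteLine(AndAsLong(data1, data2).ToString("X16"));
Console.WriteLine(AndAsInts(data1, data2).ToString("X16"));
```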
I couldn't reproduce the problem.
My test was as follows (int version shown):
// deliberately made hard to optimise without whole program optimisation
public static int[] data = new int[1000000]; // long[] when testing long

// I happened to have a winforms app open, feel free to make this a console app.
private void button1_Click(object sender, EventArgs e)
{
    long best = long.MaxValue;
    for (int j = 0; j < 1000; j++)
    {
        Stopwatch timer = Stopwatch.StartNew();
        int a1 = ~0, b1 = 0x55555555, c1 = 0x12345678; // varies: see below
        int a2 = ~0, b2 = 0x55555555, c2 = 0x12345678;
        int[] d = data; // long[] when testing long
        for (int i = 0; i < d.Length; i++)
        {
            int v = d[i]; // long when testing long, see below
            a1 &= v; a2 &= v;
            b1 &= v; b2 &= v;
            c1 &= v; c2 &= v;
        }
        // don't average times: we want the result with minimal context switching
        best = Math.Min(best, timer.ElapsedTicks);
        button1.Text = best.ToString() + ":" + (a1 + a2 + b1 + b2 + c1 + c2).ToString("X8");
    }
}
For testing longs, a1 and a2 etc. are merged, giving:
long a = ~0, b = 0x5555555555555555, c = 0x1234567812345678;
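The test above lives in a WinForms click handler. Below is a minimal console sketch of the long variant it describes (a1/a2 etc. merged into a, b and c), assuming .NET 6 or later for top-level statements. The repeat count is lowered to 100 from the answer's 1000 to keep a run short (an assumption, easily changed); everything else follows the answer, including the never-filled, all-zero data array.

```csharp
using System;
using System.Diagnostics;

const int Repeats = 100;            // the answer above used 1000
long[] data = new long[1000000];    // never filled, as in the original test

long best = long.MaxValue;
long check = 0;
for (int j = 0; j < Repeats; j++)
{
    Stopwatch timer = Stopwatch.StartNew();
    long a = ~0, b = 0x5555555555555555, c = 0x1234567812345678;
    long[] d = data;
    for (int i = 0; i < d.Length; i++)
    {
        long v = d[i];
        a &= v;
        b &= v;
        c &= v;
    }
    // keep the best (least disturbed) run rather than an average
    best = Math.Min(best, timer.ElapsedTicks);
    check = a + b + c;  // consume the results so the loop can't be discarded
}
Console.WriteLine(best + ":" + check.ToString("X16"));
```

Build the int variant the same way (int[] data, six int accumulators) and compare the two printed tick counts from Release builds run outside the debugger.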
Running the two programs on my laptop (i7 Q720) as a release build outside of VS (.NET 4.5), I got the following times (best of 1000 runs, in Stopwatch ticks):
int: 2238, long: 1924
Considering there's a huge amount of loop overhead, and that the long version is working with twice as much data (8 MB vs 4 MB), it still comes out clearly ahead. So I have no reason to believe that C# is not making full use of the processor's 64-bit bit operations.
But we really shouldn't be benchmarking this in the first place. If there's a concern, simply check the JITed code (Debug -> Windows -> Disassembly). Ensure the compiler is using the instructions you expect it to use, and move on.
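One way to make the relevant instructions easy to find in the Disassembly window (an assumption, not something the answer prescribes) is to isolate each & in a tiny method marked NoInlining, so the JIT cannot fold it into its caller. AndLong and AndInt are hypothetical names; the sketch assumes C# 9 or later, which allows attributes on local functions. Keep in mind that Visual Studio's "Suppress JIT optimization on module load" debugging option, enabled by default, can make code viewed under the debugger less optimized than a normal release run.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical helpers: NoInlining keeps each AND in its own method,
// so its JIT output appears as a separate, easy-to-find function.
[MethodImpl(MethodImplOptions.NoInlining)]
static long AndLong(long x, long y) => x & y;

[MethodImpl(MethodImplOptions.NoInlining)]
static int AndInt(int x, int y) => x & y;

long r1 = AndLong(0x5555555555555555, 0x1234567812345678);
int r2 = AndInt(0x55555555, 0x12345678);

// Prints "1014545010145450 10145450".
Console.WriteLine($"{r1:X16} {r2:X8}");
```

Set a breakpoint on one of the calls, open Debug -> Windows -> Disassembly, and step into the helper to check whether AndLong compiles to a single 64-bit and.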
Attempting to measure the performance of individual instructions on your processor (which could well be specific to your processor model) in anything other than assembler is a very bad idea, and from within a JIT-compiled language like C# it is beyond futile. There's no need to anyway: it's all in Intel's optimisation handbook should you need to know.
To this end, here's the disassembly of a &= v in the long version of the program on x64 (release build, but run inside the debugger; I'm unsure whether this affects the assembly, but it certainly affects the performance):
00000111 mov rcx,qword ptr [rsp+60h] ; a &= v
00000116 mov rax,qword ptr [rsp+38h]
0000011b and rax,rcx
0000011e mov qword ptr [rsp+38h],rax
As you can see, there's a single 64-bit and instruction as expected, along with three 64-bit moves. So far so good, and exactly half the number of instructions of the int version:
00000122 mov ecx,dword ptr [rsp+5Ch] ; a1 &= v
00000126 mov eax,dword ptr [rsp+38h]
0000012a and eax,ecx
0000012c mov dword ptr [rsp+38h],eax
00000130 mov ecx,dword ptr [rsp+5Ch] ; a2 &= v
00000134 mov eax,dword ptr [rsp+44h]
00000138 and eax,ecx
0000013a mov dword ptr [rsp+44h],eax
I can only conclude that the problem you're seeing is specific to something about your test suite, build options or processor... or, quite possibly, that the & isn't the bottleneck you believe it to be. HTH.
I can't reproduce your timings. The following code generates two arrays: one of 1,000,000 longs and one of 2,000,000 ints. It then loops through the arrays, applying the & operator to pairs of adjacent values. It keeps a running sum and outputs it, just to make sure that the compiler doesn't remove the loop entirely because it isn't doing anything.
Over dozens of successive runs, the long loop is at least twice as fast as the int loop. This is running on a Core 2 Quad with Windows 8 Developer Preview and Visual Studio 11 Developer Preview. The program is compiled with "Any CPU" and runs in 64-bit mode. All testing was done using Ctrl+F5 so that the debugger isn't involved.
using System;
using System.Diagnostics;

int numLongs = 1000000;
int numInts = 2*numLongs;
var longs = new long[numLongs];
var ints = new int[numInts];
Random rnd = new Random();
// generate values: longs[i] holds ints[2*i] (high half) and ints[2*i+1] (low half)
for (int i = 0; i < numLongs; ++i)
{
    int i1 = rnd.Next();
    int i2 = rnd.Next();
    ints[2 * i] = i1;
    ints[2 * i + 1] = i2;
    long l = i1;
    l = (l << 32) | (uint)i2;
    longs[i] = l;
}
// time operations.
int isum = 0;
Stopwatch sw = Stopwatch.StartNew();
for (int i = 0; i < numInts; i += 2)
{
    isum += ints[i] & ints[i + 1];
}
sw.Stop();
Console.WriteLine("Ints: {0} ms. isum = {1}", sw.ElapsedMilliseconds, isum);
long lsum = 0;
int halfLongs = numLongs / 2;
sw.Restart();
// note: with this bound the loop reads only longs[0 .. halfLongs - 1], half the array
for (int i = 0; i < halfLongs; i += 2)
{
    lsum += longs[i] & longs[i + 1];
}
sw.Stop();
Console.WriteLine("Longs: {0} ms. lsum = {1}", sw.ElapsedMilliseconds, lsum);
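A variant of the test above in which both loops read the entire 8 MB of data and compute the same logical result: each pair of adjacent longs ANDed together, either directly or as high and low 32-bit halves. It is a sketch under stated assumptions (.NET 6 or later for top-level statements, a fixed Random seed for repeatability), not the answer author's code.

```csharp
using System;
using System.Diagnostics;

const int NumLongs = 1000000;
var longs = new long[NumLongs];
var ints = new int[2 * NumLongs];
var rnd = new Random(12345);  // fixed seed, an assumption, so runs are repeatable

// Same layout as above: longs[k] holds ints[2k] (high half) and ints[2k + 1] (low half).
for (int k = 0; k < NumLongs; k++)
{
    int hi = rnd.Next(), lo = rnd.Next();
    ints[2 * k] = hi;
    ints[2 * k + 1] = lo;
    longs[k] = ((long)hi << 32) | (uint)lo;
}

// Long version: one 64-bit AND per pair of adjacent longs, over the whole array.
long lacc = 0;
var sw = Stopwatch.StartNew();
for (int k = 0; k < NumLongs; k += 2)
{
    lacc ^= longs[k] & longs[k + 1];
}
sw.Stop();
Console.WriteLine($"Longs: {sw.ElapsedMilliseconds} ms");

// Int version: the same ANDs done as two 32-bit ANDs per pair of longs.
int hiAcc = 0, loAcc = 0;
sw.Restart();
for (int i = 0; i < ints.Length; i += 4)
{
    hiAcc ^= ints[i] & ints[i + 2];      // high halves of longs[k] and longs[k + 1]
    loAcc ^= ints[i + 1] & ints[i + 3];  // low halves of longs[k] and longs[k + 1]
}
sw.Stop();
Console.WriteLine($"Ints:  {sw.ElapsedMilliseconds} ms");

// XOR is bitwise (no carries), so the two accumulators must recombine exactly.
long fromInts = ((long)hiAcc << 32) | (uint)loAcc;
Console.WriteLine(lacc == fromInts ? "results match" : "results differ");
```

XOR accumulation is used instead of a running sum so the int loop needs only 32-bit operations while its result can still be checked against the long loop. Timings from a single run of a small program like this are rough: on recent .NET versions tiered compilation may run the loops partly unoptimized at first, so a harness such as BenchmarkDotNet gives more trustworthy numbers.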