C++ vs C# bitwise operations on 64-bit ints - performance

I have a 2D field of bits stored in an array of 5 unsigned longs. I am going for the best performance. I am working in C#, but I tried to set a benchmark by implementing my class in C++.

The problem here is that the C# implementation takes about 10 seconds to finish, whereas the C++ one takes about 1 second, making it 10 times faster. The C++ is an x64 build in VS2015. The C# is x64 in VS2015 on .NET 4.6. Both are Release builds, of course.

EDIT: After optimizing the C# code a little, it still takes 7 to 8 seconds versus 1.3 seconds for C++.

Note: C++ built for x86 takes about 6 seconds to finish. I am running the code on a 64-bit machine.

Question: What makes the C++ THAT much faster? And is there a way to optimize the C# code to be at least similarly fast? (Maybe some unsafe magic?)

What puzzles me is that we are talking just about iterating through arrays and bitwise operations. Shouldn't it be JITed to pretty much the same thing as C++?

Example code: There are two simple functions in the implementation, Left() and Right(), which shift the whole field by 1 bit to the left and to the right respectively, carrying the appropriate bit between the longs.

C++

#include <iostream>
#include <chrono>
#include <cstdlib> // rand()
using namespace std;
using namespace std::chrono;

class BitField
{
private:
    unsigned long long LEFTMOST_BIT = 0x8000000000000000;
    unsigned long long RIGHTMOST_BIT = 1;

public:
    unsigned long long Cells_l[5];
    BitField()
    {
        for (size_t i = 0; i < 5; i++)
        {
            Cells_l[i] = rand(); // Random initialization
        }
    }
    void Left()
    {
        unsigned long long carry = 0;
        unsigned long long nextCarry = 0;
        for (int i = 0; i < 5; i++)
        {
            nextCarry = (Cells_l[i] & LEFTMOST_BIT) >> 63;
            Cells_l[i] = Cells_l[i] << 1 | carry;
            carry = nextCarry;
        }
    }
    void Right()
    {
        unsigned long long carry = 0;
        unsigned long long nextCarry = 0;
        for (int i = 4; i >= 0; i--)
        {
            nextCarry = (Cells_l[i] & RIGHTMOST_BIT) << 63;
            Cells_l[i] = Cells_l[i] >> 1 | carry;
            carry = nextCarry;
        }
    }
};

int main()
{
    BitField bf;

    high_resolution_clock::time_point t1 = high_resolution_clock::now();
    for (int i = 0; i < 100000000; i++)
    {
        bf.Left();
        bf.Left();
        bf.Left();
        bf.Right();
        bf.Right();
        bf.Left();
        bf.Right();
        bf.Right();
    }
    high_resolution_clock::time_point t2 = high_resolution_clock::now();

    auto duration = duration_cast<milliseconds>(t2 - t1).count();

    cout << "Time: " << duration << endl << endl;
    // Print to avoid compiler optimizations
    for (size_t i = 0; i < 5; i++)
    {
        cout << bf.Cells_l[i] << endl;
    }

    return 0;
}

C#

using System;
using System.Diagnostics;

namespace TestCS
{
    class BitField
    {
        const ulong LEFTMOST_BIT = 0x8000000000000000;
        const ulong RIGHTMOST_BIT = 1;

        static Random rnd = new Random();

        ulong[] Cells;

        public BitField()
        {
            Cells = new ulong[5];
            for (int i = 0; i < 5; i++)
            {
                Cells[i] = (ulong)rnd.Next(); // Random initialization
            }
        }

        public void Left()
        {
            ulong carry = 0;
            ulong nextCarry = 0;
            for (int i = 0; i < 5; i++)
            {
                nextCarry = (Cells[i] & LEFTMOST_BIT) >> 63;
                Cells[i] = Cells[i] << 1 | carry;
                carry = nextCarry;
            }
        }
        public void Right()
        {
            ulong carry = 0;
            ulong nextCarry = 0;
            for (int i = 4; i >= 0; i--)
            {
                nextCarry = (Cells[i] & RIGHTMOST_BIT) << 63;
                Cells[i] = Cells[i] >> 1 | carry;
                carry = nextCarry;
            }
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            BitField bf = new BitField();
            Stopwatch sw = new Stopwatch();

            // Call to remove the compilation time from measurements
            bf.Left();
            bf.Right();

            sw.Start();
            for (int i = 0; i < 100000000; i++)
            {
                bf.Left();
                bf.Left();
                bf.Left();
                bf.Right();
                bf.Right();
                bf.Left();
                bf.Right();
                bf.Right();
            }
            sw.Stop();

            Console.WriteLine($"Done in: {sw.Elapsed.TotalMilliseconds.ToString()}ms");
        }
    }
}

EDIT: Fixed "nextCarry" typos in example code.

Part of the difference may be due to differences in the code between the two versions: you don't assign to nextCarry in the C++ Left nor in the C# Right, but those could be typos in the example.

You'd want to look at the disassembly of both to see the difference, but primarily it is due to the C++ compiler having more time to spend optimizing the code. In this case it unrolls the loops, inlines all the function calls (including the constructor), and keeps all of Cells_l in registers. So there's one big loop using registers and no accesses to memory.

I haven't looked at the C# compiled output, but I doubt it does anything close to that.
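To make the transformation concrete, here is a minimal C++ sketch of what "unrolled and held in registers" means for a single left shift. It is written by hand for illustration and is not the compiler's actual output; the names Field, leftLoop, leftUnrolled and checkFormsAgree are hypothetical and do not appear in the question.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Field = std::array<std::uint64_t, 5>;

// Loop form, as in the question: the top bit of each word
// becomes the bottom bit of the next word.
inline void leftLoop(Field& c)
{
    std::uint64_t carry = 0;
    for (std::size_t i = 0; i < c.size(); ++i)
    {
        const std::uint64_t nextCarry = c[i] >> 63;
        c[i] = (c[i] << 1) | carry;
        carry = nextCarry;
    }
}

// Hand-unrolled form: five locals and no loop, so an optimizer
// is free to keep every word in a register.
inline void leftUnrolled(Field& c)
{
    std::uint64_t u0 = c[0], u1 = c[1], u2 = c[2], u3 = c[3], u4 = c[4];
    // Work from the last word down so each carry is read before
    // the word it comes from is overwritten.
    u4 = (u4 << 1) | (u3 >> 63);
    u3 = (u3 << 1) | (u2 >> 63);
    u2 = (u2 << 1) | (u1 >> 63);
    u1 = (u1 << 1) | (u0 >> 63);
    u0 = u0 << 1;
    c = {u0, u1, u2, u3, u4};
}

// Checks that both forms produce the same field for one sample input.
inline void checkFormsAgree()
{
    Field a = {0x8000000000000001ULL, 0, 0xFFFFFFFFFFFFFFFFULL, 1,
               0x8000000000000000ULL};
    Field b = a;
    leftLoop(a);
    leftUnrolled(b);
    assert(a == b);
}
```

Whether a given compiler actually produces something like leftUnrolled from leftLoop depends on its optimization settings, so the disassembly remains the authoritative check.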

Also, as mentioned in a comment, replace all the Cells.Length calls in your C# code with 5 (just like you have in the C++ code).

I have got enough information from the comments and a deleted answer from @AntoninLejsek that I can answer this myself.

TL;DR: The C++ compiler does a much better job optimizing, and C# managed array access costs a lot when done in a loop. However, unsafe code and fixed access alone are not enough to match C++.

It seems we need to optimize the C# code manually to get performance comparable to C++.

  1. Unroll loops.
  2. Use unsafe code for fixed array access.
  3. Don't access the array repeatedly; store each item in a local variable instead.

The following C# code runs as fast as the C++ code (about 100 ms faster, in fact). Compiled with VS 2015 for .NET 4.6, Release x64.

unsafe struct BitField
{
    static Random rnd = new Random();
    public fixed ulong Cells[5];
    public BitField(int nothing)
    {
        fixed (ulong* p = Cells)
        {
            for (int i = 0; i < 5; i++)
            {
                p[i] = (ulong)rnd.Next(); // Just some random number
            }
        }
    }
    public void StuffUnrolledNonManaged()
    {
        ulong u0;
        ulong u1;
        ulong u2;
        ulong u3;
        ulong u4;
        fixed (ulong *p = Cells)
        {
            u0 = p[0];
            u1 = p[1];
            u2 = p[2];
            u3 = p[3];
            u4 = p[4];
        }
        ulong carry = 0;
        ulong nextCarry = 0;

        for (int i = 0; i < 100000000; i++)
        {

            //left
            carry = 0;
            nextCarry = u0 >> 63;
            u0 = u0 << 1 | carry;
            carry = nextCarry;
            nextCarry = u1 >> 63;
            u1 = u1 << 1 | carry;
            carry = nextCarry;
            nextCarry = u2 >> 63;
            u2 = u2 << 1 | carry;
            carry = nextCarry;
            nextCarry = u3 >> 63;
            u3 = u3 << 1 | carry;
            carry = nextCarry;
            u4 = u4 << 1 | carry;

            //left
            carry = 0;
            nextCarry = u0 >> 63;
            u0 = u0 << 1 | carry;
            carry = nextCarry;
            nextCarry = u1 >> 63;
            u1 = u1 << 1 | carry;
            carry = nextCarry;
            nextCarry = u2 >> 63;
            u2 = u2 << 1 | carry;
            carry = nextCarry;
            nextCarry = u3 >> 63;
            u3 = u3 << 1 | carry;
            carry = nextCarry;
            u4 = u4 << 1 | carry;

            //left
            carry = 0;
            nextCarry = u0 >> 63;
            u0 = u0 << 1 | carry;
            carry = nextCarry;
            nextCarry = u1 >> 63;
            u1 = u1 << 1 | carry;
            carry = nextCarry;
            nextCarry = u2 >> 63;
            u2 = u2 << 1 | carry;
            carry = nextCarry;
            nextCarry = u3 >> 63;
            u3 = u3 << 1 | carry;
            carry = nextCarry;
            u4 = u4 << 1 | carry;

            //right
            carry = 0;
            nextCarry = u4 << 63;
            u4 = u4 >> 1 | carry;
            carry = nextCarry;
            nextCarry = u3 << 63;
            u3 = u3 >> 1 | carry;
            carry = nextCarry;
            nextCarry = u2 << 63;
            u2 = u2 >> 1 | carry;
            carry = nextCarry;
            nextCarry = u1 << 63;
            u1 = u1 >> 1 | carry;
            carry = nextCarry;
            u0 = u0 >> 1 | carry;

            //right
            carry = 0;
            nextCarry = u4 << 63;
            u4 = u4 >> 1 | carry;
            carry = nextCarry;
            nextCarry = u3 << 63;
            u3 = u3 >> 1 | carry;
            carry = nextCarry;
            nextCarry = u2 << 63;
            u2 = u2 >> 1 | carry;
            carry = nextCarry;
            nextCarry = u1 << 63;
            u1 = u1 >> 1 | carry;
            carry = nextCarry;
            u0 = u0 >> 1 | carry;

            //left
            carry = 0;
            nextCarry = u0 >> 63;
            u0 = u0 << 1 | carry;
            carry = nextCarry;
            nextCarry = u1 >> 63;
            u1 = u1 << 1 | carry;
            carry = nextCarry;
            nextCarry = u2 >> 63;
            u2 = u2 << 1 | carry;
            carry = nextCarry;
            nextCarry = u3 >> 63;
            u3 = u3 << 1 | carry;
            carry = nextCarry;
            u4 = u4 << 1 | carry;

            //right
            carry = 0;
            nextCarry = u4 << 63;
            u4 = u4 >> 1 | carry;
            carry = nextCarry;
            nextCarry = u3 << 63;
            u3 = u3 >> 1 | carry;
            carry = nextCarry;
            nextCarry = u2 << 63;
            u2 = u2 >> 1 | carry;
            carry = nextCarry;
            nextCarry = u1 << 63;
            u1 = u1 >> 1 | carry;
            carry = nextCarry;
            u0 = u0 >> 1 | carry;

            //right
            carry = 0;
            nextCarry = u4 << 63;
            u4 = u4 >> 1 | carry;
            carry = nextCarry;
            nextCarry = u3 << 63;
            u3 = u3 >> 1 | carry;
            carry = nextCarry;
            nextCarry = u2 << 63;
            u2 = u2 >> 1 | carry;
            carry = nextCarry;
            nextCarry = u1 << 63;
            u1 = u1 >> 1 | carry;
            carry = nextCarry;
            u0 = u0 >> 1 | carry;

        }

        fixed (ulong* p = Cells)
        {
            p[0] = u0;
            p[1] = u1;
            p[2] = u2;
            p[3] = u3;
            p[4] = u4;
        }
    }
}

Testing code

static void Main(string[] args)
{
    BitField bf = new BitField(0);
    Stopwatch sw = new Stopwatch();

    // Call to remove the compilation time from measurements
    bf.StuffUnrolledNonManaged();

    sw.Start();
    bf.StuffUnrolledNonManaged();
    sw.Stop();

    Console.WriteLine($"Non managed access unrolled in: {sw.Elapsed.TotalMilliseconds.ToString()}ms");
}

This code finishes in about 1.1 seconds.

Note: Fixed array access alone is not enough to match the C++ performance. If we don't use the local variables (that is, every instance of u0 is replaced by p[0], and so on), the time is about 3.6 seconds.

If we use only fixed access with the code from the question (calling the Left() and Right() functions in a loop), the time is about 5.8 seconds.
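One detail of the unrolled code: it computes the carries as u0 >> 63 and u4 << 63, without the LEFTMOST_BIT and RIGHTMOST_BIT masks used in the question. For unsigned 64-bit values the masks are redundant, because shifting by 63 already discards every bit except the one being carried. The C++ sketch below checks that equivalence; the helper names are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t LEFTMOST_BIT = 0x8000000000000000ULL;
constexpr std::uint64_t RIGHTMOST_BIT = 1;

// Carry out of the top bit, written as in the question (mask, then shift).
constexpr std::uint64_t carryLeftMasked(std::uint64_t x)
{
    return (x & LEFTMOST_BIT) >> 63;
}

// The same carry without the mask, as in the unrolled code.
constexpr std::uint64_t carryLeftPlain(std::uint64_t x)
{
    return x >> 63;
}

// Carry out of the bottom bit, written as in the question.
constexpr std::uint64_t carryRightMasked(std::uint64_t x)
{
    return (x & RIGHTMOST_BIT) << 63;
}

// The same carry without the mask.
constexpr std::uint64_t carryRightPlain(std::uint64_t x)
{
    return x << 63;
}

// Asserts that the masked and unmasked forms agree for one value.
inline void checkMasksRedundant(std::uint64_t x)
{
    assert(carryLeftMasked(x) == carryLeftPlain(x));
    assert(carryRightMasked(x) == carryRightPlain(x));
}
```

This only holds for unsigned types: with a signed 64-bit integer, a right shift would typically copy the sign bit and the mask would matter.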
