Fastest (portable) way to split an array in C#

Question

I'm writing a fully managed Mercurial library (to be used in a fully managed Mercurial Server for Windows, coming soon), and one of the most severe performance problems I'm coming across is, strangely enough, splitting an array into parts.

The idea is as follows: there's a byte array ranging in size from several hundred bytes up to a megabyte, and all I need to do with it is split it into parts delimited by, in my specific case, \n characters.

Now what dotTrace shows me is that my "optimized" version of Split (the code is correct, here's the naive version I began with) takes 11 seconds for 2,300 calls (there's an obvious performance hit introduced by dotTrace itself, but everything's to scale).

Here are the numbers:

unsafe version: 11,297 ms for 2,312 calls
managed ("naive") version: 20,001 ms for 2,312 calls

So here goes: what is the fastest (preferably portable, meaning supporting both x86 and x64) way to split an array in C#?

Answer 1

I believe the problem is that you are doing a lot of complex operations inside the loop. This code removes everything from the loop except a single addition and a comparison. It does so by temporarily overwriting the last byte with the delimiter, so the scan needs no separate end-of-array check; the original byte is restored afterwards. The other, more complex work happens only when a split is detected or at the end of the array.

Also, it is hard to tell what kind of data you run your tests with, so this is only guesswork.

public static unsafe Segment[] Split2(byte[] _src, byte value)
{
    var _ln = _src.Length;

    if (_ln == 0) return new Segment[] { };

    fixed (byte* src = _src)
    {
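        var segments = new LinkedList<Segment>();

        byte* last = src;            // start of the current segment
        byte* end = src + _ln - 1;   // last byte of the array
        byte lastValue = *end;       // saved so it can be restored later
        *end = value;                // sentinel: the scan always stops at end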

        var cur = src;
        while (true)
        {
            if (*cur == value)
            {
                int begin = (int) (last - src);
                int length = (int) (cur - last + 1);
                segments.AddLast(new Segment(_src, begin, length));

                last = cur + 1;

                if (cur == end)
                {
                    if (lastValue != value)
                    {
                        *end = lastValue;
                    }
                    break;
                }
            }
            cur++;
        }

        return segments.ToArray();
    }
}

Edit: Fixed the code so that it returns correct results.
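The sentinel idea used above can also be sketched without pointers. This is only an illustration under assumptions: SentinelSplit is a hypothetical name, ArraySegment<byte> stands in for the asker's Segment type, and the managed version keeps the runtime's bounds checks, so it is not expected to match the unsafe timings. As in the code above, each segment includes its trailing delimiter.

```csharp
using System;
using System.Collections.Generic;

public static class SentinelSplit
{
    // Hypothetical managed variant of the sentinel scan; not the answerer's code.
    public static ArraySegment<byte>[] Split(byte[] src, byte sep)
    {
        if (src.Length == 0) return new ArraySegment<byte>[0];

        var segments = new List<ArraySegment<byte>>();
        int end = src.Length - 1;
        byte saved = src[end];
        src[end] = sep;                  // sentinel: guarantees the inner scan stops
        try
        {
            int last = 0, cur = 0;
            while (true)
            {
                while (src[cur] != sep) cur++;   // no explicit end-of-array test
                segments.Add(new ArraySegment<byte>(src, last, cur - last + 1));
                if (cur == end) break;
                last = ++cur;
            }
        }
        finally
        {
            src[end] = saved;            // always restore the caller's data
        }
        return segments.ToArray();
    }
}
```

Because the input array is modified while the scan runs, neither this sketch nor the pointer version should be called on a buffer that another thread is reading at the same time.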

Answer 2

For Split, handling ulong on a 32-bit machine is really slow, so definitely reduce to uint. If you really want ulong, implement two versions, one for 32-bit and one for 64-bit.

You should also measure whether handling a byte at a time is faster.
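The asker's word-at-a-time Split is not reproduced in this thread, so the following is only a sketch of the idea under assumptions. WordScan, FindAll, FindAll32, FindAll64 and AddMatches are hypothetical names, and BitConverter stands in for the pointer reads an unsafe version would use. It shows a 32-bit and a 64-bit scan selected by IntPtr.Size, with a plain byte loop that doubles as the baseline to measure against.

```csharp
using System;
using System.Collections.Generic;

public static class WordScan
{
    // Picks the word width from the bitness of the running process.
    public static List<int> FindAll(byte[] data, byte sep)
    {
        return IntPtr.Size == 8 ? FindAll64(data, sep) : FindAll32(data, sep);
    }

    // Plain byte-at-a-time scan of [from, to); also the baseline for timing.
    public static void AddMatches(byte[] data, byte sep, int from, int to, List<int> hits)
    {
        for (int i = from; i < to; i++)
            if (data[i] == sep) hits.Add(i);
    }

    public static List<int> FindAll32(byte[] data, byte sep)
    {
        var hits = new List<int>();
        int i = 0;
        unchecked
        {
            uint pattern = sep * 0x01010101u;   // sep repeated in all 4 bytes
            for (; i + 4 <= data.Length; i += 4)
            {
                // A byte of x is zero exactly where data holds sep.
                uint x = BitConverter.ToUInt32(data, i) ^ pattern;
                // Nonzero only if at least one byte of x is zero.
                if (((x - 0x01010101u) & ~x & 0x80808080u) != 0)
                    AddMatches(data, sep, i, i + 4, hits);
            }
        }
        AddMatches(data, sep, i, data.Length, hits);   // leftover tail bytes
        return hits;
    }

    public static List<int> FindAll64(byte[] data, byte sep)
    {
        var hits = new List<int>();
        int i = 0;
        unchecked
        {
            ulong pattern = sep * 0x0101010101010101UL;   // sep in all 8 bytes
            for (; i + 8 <= data.Length; i += 8)
            {
                ulong x = BitConverter.ToUInt64(data, i) ^ pattern;
                if (((x - 0x0101010101010101UL) & ~x & 0x8080808080808080UL) != 0)
                    AddMatches(data, sep, i, i + 8, hits);
            }
        }
        AddMatches(data, sep, i, data.Length, hits);   // leftover tail bytes
        return hits;
    }
}
```

Whether either word-sized variant beats the byte loop depends on the data and the runtime, which is why measuring all three on real input is the safer course.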

You also need to profile the cost of memory allocation. If it is large enough, try to reuse memory across multiple calls.

Other:

ToString: it's faster to use "(" + Offset.ToString() + ", " + Length.ToString() + ")";

GetHashCode: try fixed (byte* b = &buffer[offset])
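The two suggestions above might look roughly like this. The asker's Segment type is not shown in this thread, so the struct below is a hypothetical stand-in, and the hash formula is an arbitrary choice made only to have something to compute inside the fixed block. Compiling it requires unsafe code to be enabled.

```csharp
using System;

// Hypothetical stand-in for the asker's Segment type.
public struct Segment
{
    public readonly byte[] Buffer;
    public readonly int Offset;
    public readonly int Length;

    public Segment(byte[] buffer, int offset, int length)
    {
        Buffer = buffer;
        Offset = offset;
        Length = length;
    }

    // Plain concatenation, as suggested, instead of a format string.
    public override string ToString()
    {
        return "(" + Offset.ToString() + ", " + Length.ToString() + ")";
    }

    // Pins the array once and walks the bytes through a pointer.
    public override unsafe int GetHashCode()
    {
        if (Length == 0) return 0;   // avoids indexing an empty range
        unchecked
        {
            int hash = 17;
            fixed (byte* b = &Buffer[Offset])
            {
                for (int i = 0; i < Length; i++)
                    hash = hash * 31 + b[i];
            }
            return hash;
        }
    }
}
```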

The ArraySplitter class below should be really fast if used multiple times. Key point: no new memory is allocated once the internal array has grown to the right size, and data copying is minimal.

class ArraySplitter
{
    private byte[] m_data;
    private int    m_count;
    private int[]  m_stops;

    private void AddRange(int start, int stop)
    {
        // Skip empty range
        if (start > stop)
        {
            return;
        }

        // Grow array if needed
        if ((m_stops == null) || (m_stops.Length < (m_count + 2)))
        {
            int[] old = m_stops;

            m_stops = new int[m_count * 3 / 2 + 4];

            if (old != null)
            {
                old.CopyTo(m_stops, 0);
            }
        }

        m_stops[m_count++] = start;
        m_stops[m_count++] = stop;
    }

    public int Split(byte[] data, byte sep)
    {
        m_data  = data;
        m_count = 0;      // reuse m_stops

        int last = 0;

        for (int i = 0; i < data.Length; i ++)
        {
            if (data[i] == sep)
            {
                AddRange(last, i - 1);
                last = i + 1;
            }
        }

        AddRange(last, data.Length - 1);

        return m_count / 2;
    }

    public ArraySegment<byte> this[int index]
    {
        get
        {
            index *= 2;
            int start = m_stops[index];

            return new ArraySegment<byte>(m_data, start, m_stops[index + 1] - start + 1);
        }
    }
}

Test program:

    static void Main(string[] args)
    {
        int count = 1000 * 1000;

        byte[] data = new byte[count];

        for (int i = 0; i < count; i++)
        {
            data[i] = (byte) i;
        }

        Stopwatch watch = new Stopwatch();

        for (int r = 0; r < 10; r++)
        {
            watch.Reset();
            watch.Start();

            int len = 0;

            foreach (var seg in data.MySplit(13))
            {
                len += seg.Count;
            }

            watch.Stop();

            Console.WriteLine("MySplit      : {0} {1,8:N3} ms", len, watch.Elapsed.TotalMilliseconds);

            watch.Reset();
            watch.Start();

            ArraySplitter splitter = new ArraySplitter();

            int parts = splitter.Split(data, 13);

            len = 0;

            for (int i = 0; i < parts; i++)
            {
                len += splitter[i].Count;
            }

            watch.Stop();
            Console.WriteLine("ArraySplitter: {0} {1,8:N3} ms", len, watch.Elapsed.TotalMilliseconds);
        }
    }

Result:

MySplit      : 996093    9.514 ms
ArraySplitter: 996093    4.754 ms
MySplit      : 996093    7.760 ms
ArraySplitter: 996093    2.710 ms
MySplit      : 996093    8.391 ms
ArraySplitter: 996093    3.510 ms
MySplit      : 996093    9.677 ms
ArraySplitter: 996093    3.468 ms
MySplit      : 996093    9.685 ms
ArraySplitter: 996093    3.370 ms
MySplit      : 996093    9.700 ms
ArraySplitter: 996093    3.425 ms
MySplit      : 996093    9.669 ms
ArraySplitter: 996093    3.519 ms
MySplit      : 996093    9.844 ms
ArraySplitter: 996093    3.416 ms
MySplit      : 996093    9.721 ms
ArraySplitter: 996093    3.685 ms
MySplit      : 996093    9.703 ms
ArraySplitter: 996093    3.470 ms

Answer 3

Anton,

I don't know if you are still interested in optimizing this, since this thread is fairly old, but I saw that the code in your online repository is still pretty much the same, so I thought I would give it a shot. I looked over your HgSharp code on bitbucket.org while evaluating your HgLab application. I rewrote the function using native constructs, which simplified it greatly. In my tests it ran in less than half the time of your original routine. I tested it by loading a source file of several megabytes and comparing the timings against the same operation using your original routine.

In addition to rewriting the basic logic, I decided to use the native ArraySegment<> built into the framework instead of your custom implementation. The only significant difference is that ArraySegment<> exposes a Count property instead of a Length property. The code below does not require the unsafe keyword because I am not using pointers; however, there does seem to be a slight performance improvement if it is changed to use them.

    public static ArraySegment<byte>[] SplitEx(this byte[] source, byte value) {
        var _previousIndex = -1;
        var _segments = new List<ArraySegment<byte>>();
        var _length = source.Length;

        if (_length > 0) {
            int _index;

            for (_index = 0; _index < _length; _index++) {
                var _value = source[_index];
                if (_value == value) {
                    _segments.Add(new ArraySegment<byte>(source, _previousIndex + 1, _index - _previousIndex));
                    _previousIndex = _index;
                }
            }

            if (--_index != _previousIndex) {
                _segments.Add(new ArraySegment<byte>(source, _previousIndex + 1, _index - _previousIndex));
            }
        }

        return _segments.ToArray();
    }
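Since ArraySegment<> exposes Array, Offset and Count rather than the bytes themselves, consuming the result takes one more step. The helper below is a small sketch of that step, assuming ASCII content; SegmentText and ToLine are hypothetical names that are not part of HgSharp. It drops the trailing delimiter, which the segments produced above include.

```csharp
using System;
using System.Text;

public static class SegmentText
{
    // Decodes a segment as ASCII, dropping one trailing delimiter if present.
    public static string ToLine(ArraySegment<byte> seg, byte delimiter)
    {
        int count = seg.Count;
        if (count > 0 && seg.Array[seg.Offset + count - 1] == delimiter)
            count--;
        return Encoding.ASCII.GetString(seg.Array, seg.Offset, count);
    }
}
```

No bytes are copied until a string is actually requested, which keeps the split itself cheap.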

3 answers

Answer 1
4 2012-08-26 20:39:43

Answer 2
3 2012-08-26 19:58:23

Answer 3
2 2012-12-18 01:01:04
