Fastest way to chop array in two pieces

I have an array, say:

var arr1 = new [] { 1, 2, 3, 4, 5, 6 };

Now, when my array size exceeds 5, I want to resize the current array to 3 and create a new array that contains the upper 3 values, so that after this action:

arr1 = new [] { 1, 2, 3 };
newArr = new [] { 4, 5, 6 };

What's the fastest way to do this? I guess I'll have to look into unmanaged code, but I have no clue where to start.


Some more info:

  • The arrays have to be able to grow without large performance hits
  • The arrays will only contain Int32 values
  • The purpose of the array is to group the numbers in my source array without having to sort the whole list

In short: I want to split the following input array:

int[] arr = new int[] { 1, 3, 4, 29, 31, 33, 35, 36, 37 };

into

arr1 =  1, 3, 4
arr2 =  29, 31, 33, 35, 36, 37

but because the ideal speed is reached with an array size of 3, arr2 should in turn be split into 2 evenly sized arrays.

Note

I know that an array's implementation in memory is quite naive (well, at least it is in C, where you can manipulate the count of items in the array so the array resizes). I also know that there is a memory move function somewhere in the Win32 API. So I guess this would be the fastest:

  1. Change arr1 so it only contains 3 items
  2. Create a new array arr2 with size 3
  3. Memmove the bytes that aren't in arr1 anymore into arr2
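
In managed C#, the three steps above might look roughly like the sketch below. It is only an approximation under a few assumptions: a .NET array cannot be shrunk in place, so Array.Resize allocates a new, shorter array and copies into it, and Buffer.BlockCopy stands in for the Win32 memory move call. The tail is copied out first so nothing is lost when the array is replaced. The names ChopSketch and Chop are made up for illustration.

```csharp
using System;

public static class ChopSketch
{
    // Hypothetical helper: keeps the first `keep` items in `source`
    // and returns the remaining items as a new array.
    public static int[] Chop(ref int[] source, int keep)
    {
        if (keep < 0 || keep > source.Length)
        {
            throw new ArgumentOutOfRangeException("keep");
        }

        int tailLength = source.Length - keep;
        int[] tail = new int[tailLength];

        // Copy the tail out first; offsets and counts are in bytes here.
        Buffer.BlockCopy(source, keep * sizeof(int), tail, 0, tailLength * sizeof(int));

        // Array.Resize allocates a new array of length `keep` and copies
        // the leading items into it; it does not shrink in place.
        Array.Resize(ref source, keep);
        return tail;
    }
}
```

Calling ChopSketch.Chop(ref arr1, 3) on the six-item array from the question would leave arr1 with the first three values and return the other three.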

I'm not sure there's anything better than creating the empty arrays and then using Array.Copy. I'd at least hope that's optimized internally :)

int[] firstChunk = new int[3];
int[] secondChunk = new int[3];
Array.Copy(arr1, 0, firstChunk, 0, 3);
Array.Copy(arr1, 3, secondChunk, 0, 3);

To be honest, for very small arrays the overhead of the method call may be greater than just explicitly assigning the elements - but I assume that in reality you'll be using slightly bigger ones :)

You might also consider not actually splitting the array, but instead using ArraySegment to have separate "chunks" of the array. Or perhaps use List<T> to start with... it's hard to know without a bit more context.
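
As a rough illustration of the ArraySegment idea, the sketch below creates two views over one array without copying any elements. SegmentSketch, SplitInHalf and ItemAt are hypothetical names, and items are read through the segment's Array and Offset properties so the code does not depend on the segment indexer found in newer runtimes.

```csharp
using System;

public static class SegmentSketch
{
    // Hypothetical helper: returns two views over `source` that share
    // its storage; no elements are copied.
    public static ArraySegment<int>[] SplitInHalf(int[] source)
    {
        int half = source.Length / 2;
        return new[]
        {
            new ArraySegment<int>(source, 0, half),
            new ArraySegment<int>(source, half, source.Length - half)
        };
    }

    // Reads item `i` of a segment through the backing array.
    public static int ItemAt(ArraySegment<int> segment, int i)
    {
        return segment.Array[segment.Offset + i];
    }
}
```

Because both segments point at the same storage, a write to the original array is visible through them, so this only fits if that sharing is acceptable.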

If speed is really critical, then unmanaged code using pointers may well be the fastest approach - but I would definitely check whether you really need to go there before venturing into unsafe code.

Are you looking for something like this?

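using System;

static class Program
{
    // Reads three consecutive ints starting at ptr.
    static unsafe void DoIt(int* ptr)
    {
        Console.WriteLine(ptr[0]);
        Console.WriteLine(ptr[1]);
        Console.WriteLine(ptr[2]);
    }

    // Must be compiled with unsafe code enabled (/unsafe).
    static unsafe void Main()
    {
        var bytes = new byte[1024];
        new Random().NextBytes(bytes);

        fixed (byte* p = bytes)
        {
            // Stop while three full ints still fit, so DoIt never reads
            // past the end of the buffer.
            for (int i = 0; i + 3 * sizeof(int) <= bytes.Length; i += sizeof(int))
            {
                DoIt((int*)(p + i));
            }
        }

        Console.ReadKey();
    }
}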

This avoids creating new arrays entirely (arrays cannot be resized, not even with unsafe code!) and just passes a pointer into the array to a method which reads three integers starting at that position.
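
A similar "no new arrays" effect can be had without unsafe code by swapping the pointer for a span. This is a different technique from the one above, and the sketch assumes a runtime where ReadOnlySpan<T> is available (.NET Core 2.1 or later, or the System.Memory package). SpanSketch, SumWindow and SumAllWindows are hypothetical names; SumWindow plays the role of DoIt.

```csharp
using System;

public static class SpanSketch
{
    // Hypothetical stand-in for DoIt: sums one window without copying
    // it out of the backing array.
    public static int SumWindow(ReadOnlySpan<int> window)
    {
        int sum = 0;
        foreach (int value in window)
        {
            sum += value;
        }
        return sum;
    }

    // Walks `source` in windows of `size` items; a shorter last window
    // is passed as-is.
    public static int[] SumAllWindows(int[] source, int size)
    {
        int windowCount = (source.Length + size - 1) / size;
        int[] sums = new int[windowCount];
        for (int w = 0; w < windowCount; w++)
        {
            int start = w * size;
            int length = Math.Min(size, source.Length - start);
            sums[w] = SumWindow(new ReadOnlySpan<int>(source, start, length));
        }
        return sums;
    }
}
```

Unlike the pointer version, a span carries its own length, so reading past the end of a window throws instead of touching unrelated memory.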

If your array will always contain 6 items, how about:

var newarr1 = new []{oldarr[0], oldarr[1],oldarr[2]};
var newarr2 = new []{oldarr[3], oldarr[4],oldarr[5]};

Reading from memory is fast.

Since arrays are not dynamically resized in C#, your first array must have a minimum length of 5 or a maximum length of 6, depending on your implementation. Then you're going to have to dynamically create new statically sized arrays of 3 each time you need to split. Only after each split will your array items be in their natural order, unless you make each new array a length of 5 or 6 as well and only add to the most recent one. This approach means that each new array will have 2-3 unused element slots as well.

Unless you know the number of items going into your array BEFORE compiling the application, you're also going to need some form of holder for your dynamically created arrays, meaning an array of arrays (a jagged array). Since your jagged array is also statically sized, you'll need to be able to recreate and resize it as each new dynamically created array is instantiated.

I'd say copying the items into the new array is the least of your worries here. You're also looking at some pretty big performance hits as the array sizes grow.


UPDATE: So, if this were absolutely required of me...

using System;

public class MyArrayClass
{
    private int[][] _master = new int[10][];
    private int[] _current = new int[3];
    private int _currentCount, _masterCount;

    public void Add(int number)
    {
        _current[_currentCount] = number;
        _currentCount += 1;
        if (_currentCount == _current.Length)
        {
            // Store the full chunk itself. The slot in _master is still
            // null at this point, so Array.Copy into it would throw.
            _master[_masterCount] = _current;
            _currentCount = 0;
            _current = new int[3];
            _masterCount += 1;
            if (_masterCount == _master.Length)
            {
                // Grow the outer array by another 10 slots.
                int[][] newMaster = new int[_master.Length + 10][];
                Array.Copy(_master, 0, newMaster, 0, _master.Length);
                _master = newMaster;
            }
        }
    }

    public int[][] GetMyArray()
    {
        return _master;
    }

    public int[] GetMinorArray(int index)
    {
        return _master[index];
    }

    public int GetItem(int MasterIndex, int MinorIndex)
    {
        return _master[MasterIndex][MinorIndex];
    }
}

Note: This probably isn't perfect code, it's a horrible way to implement things, and I would NEVER do this in production code.
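
For comparison, here is a minimal sketch of the same grouping with List<T> as the outer holder, so the hand-rolled re-allocation of the jagged array goes away. It assumes a fixed chunk size of 3 as in MyArrayClass; ChunkedInts and its members are hypothetical names, not part of the original answer.

```csharp
using System.Collections.Generic;

// Hypothetical alternative to MyArrayClass: List<T> handles growth of
// the outer container, so no manual re-allocation is needed.
public class ChunkedInts
{
    private const int ChunkSize = 3;
    private readonly List<int[]> _chunks = new List<int[]>();
    private int[] _current = new int[ChunkSize];
    private int _currentCount;

    public void Add(int number)
    {
        _current[_currentCount++] = number;
        if (_currentCount == ChunkSize)
        {
            _chunks.Add(_current);          // store the full chunk itself
            _current = new int[ChunkSize];  // start a fresh one
            _currentCount = 0;
        }
    }

    // Number of completed chunks; a partially filled chunk is not counted.
    public int ChunkCount
    {
        get { return _chunks.Count; }
    }

    public int GetItem(int chunkIndex, int itemIndex)
    {
        return _chunks[chunkIndex][itemIndex];
    }
}
```

As in MyArrayClass, items sitting in a partially filled chunk are not visible through GetItem until the chunk fills up.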

The obligatory LINQ solution:

if(arr1.Length > 5)
{
   var newArr = arr1.Skip(arr1.Length / 2).ToArray();
   arr1 = arr1.Take(arr1.Length / 2).ToArray();
}

LINQ is faster than you might think; this will basically be limited by the Framework's ability to spin through an IEnumerable (which on an array is pretty darn fast). This should execute in roughly linear time, and can accept any initial size of arr1.
