
Confused about complexity of insertion sort, and benchmark

Using a C# benchmark I get the following mean times for my insertion sort. The code seems to be linear; what's going on? Is my code wrong, is this what should be expected, or am I misunderstanding big-O notation?

n = 1: 32.84 ns
n = 10: 362.14 ns
n = 100: 3,527.50 ns
n = 1000: 30,895.43 ns
n = 10000: 309,971.64 ns

public static class RandomUtils
{
    public static long[] generateArray(int count)
    {
        Random random = new Random();
        long[] values = new long[count];

        for (long i = 0; i < count; ++i)
            values[i] = random.Next();

        return values;
    }
}
public static class sort_insertion
{
    public static void insertsort(long[] data, long n)
    {
        long i, j;
        for (i = 1; i < n; i++)
        {
            long item = data[i];
            long ins = 0;
            for (j = i - 1; j >= 0 && ins != 1; )
            {
                if (item < data[j])
                {
                    data[j + 1] = data[j];
                    j--;
                    data[j + 1] = item;
                }
                else ins = 1;
            }
        }
    }
}

public class BE
{
    public long A { get; set; }
    public long[] arra;

    public BE()
    {
        arra = RandomUtils.generateArray(100000); // 1,10,100,...
    }

    [Benchmark]
    public void Benchmarka() => sort_insertion.insertsort(arra, arra.Length);
}


Doesn't [Benchmark] call that function repeatedly?

But you only initialized the array once, in the constructor, so all the later calls are timing the already-sorted case, which is O(N) for Insertion Sort. (One of its good properties is being fast for sorted or almost-sorted input: item < data[j] is always false, so no copying happens and each outer-loop iteration does the minimum amount of work.)
BenchmarkDotNet averages over many runs of the function, so the vast majority of the measured time comes from the linear case.
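To see the effect without a benchmark harness, here is a minimal sketch that counts comparisons instead of measuring time. The class and method names are made up for illustration, and the loop is a simplified equivalent of the insertion sort in the question, not the exact code.

```csharp
using System;

public static class InsertionSortCounter
{
    // Sorts data in place and returns how many times an element comparison ran.
    public static long SortCountingComparisons(long[] data)
    {
        long comparisons = 0;
        for (int i = 1; i < data.Length; i++)
        {
            long item = data[i];
            int j = i - 1;
            while (j >= 0)
            {
                comparisons++;
                if (item >= data[j]) break;  // item is already in place: stop scanning
                data[j + 1] = data[j];       // shift the larger element one slot right
                j--;
            }
            data[j + 1] = item;
        }
        return comparisons;
    }

    public static void Main()
    {
        var random = new Random(12345);
        long[] data = new long[1000];
        for (int i = 0; i < data.Length; i++) data[i] = random.Next();

        long first = SortCountingComparisons(data);   // random input: quadratic growth
        long second = SortCountingComparisons(data);  // sorted input: exactly n - 1 comparisons
        Console.WriteLine($"first call: {first}, second call: {second}");
    }
}
```

The second call corresponds to what every benchmark invocation after the first one measures when the array is only generated once.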

You could change your InsertionSort to read from one array and insert into another array, so you can sort the same input repeatedly without destroying it, and thus without including any time to memcpy it into an array that you sort in place. You still do data[j+1] = data[j] to make room, then eventually do data[j] = item; it's just a matter of where you get the item to insert: from a separate array instead of from just beyond the sorted region of this array.
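A rough sketch of that two-array idea might look like the following. The names SortInto, source and dest are hypothetical, and the index convention (writing to dest[j + 1]) is just one way to arrange the loop.

```csharp
using System;

public static class InsertionSortCopy
{
    // Reads items from `source` (left untouched) and inserts each one into
    // the sorted prefix of `dest`. Both arrays must have the same length.
    public static void SortInto(long[] source, long[] dest)
    {
        if (source.Length != dest.Length)
            throw new ArgumentException("source and dest must have the same length");

        for (int i = 0; i < source.Length; i++)
        {
            long item = source[i];
            int j = i - 1;                     // dest[0..i-1] is already sorted
            while (j >= 0 && item < dest[j])
            {
                dest[j + 1] = dest[j];         // make room for the new item
                j--;
            }
            dest[j + 1] = item;
        }
    }

    public static void Main()
    {
        long[] source = { 42, 7, 19, 3, 25 };
        long[] dest = new long[source.Length];
        SortInto(source, dest);
        Console.WriteLine(string.Join(", ", dest));    // sorted copy
        Console.WriteLine(string.Join(", ", source));  // original order preserved
    }
}
```

Because source is never modified, a benchmark method can call SortInto on the same unsorted input every time, with dest allocated once up front.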

Sorting the same pattern repeatedly will let your CPU's branch predictor "learn" the pattern for small enough n, making it blazing fast. For example, my i7-6700k Skylake could learn the pattern of branching in a hand-written asm Bubble Sort for something like 12 or 14 integer elements (branching for each swap), with perf stat showing a branch-miss percentage below 1%, IIRC. (In that case I copied fresh data to sort with a few 32-byte AVX vmovdqa copy instructions, which is easy in hand-written asm if you're playing around that way for fun and curiosity.)

But whether branch prediction is good or bad, it still has to do O(N^2) work, and beyond a dozen or so elements the CPU won't be able to correctly predict all the inner-loop exits anyway. You might still be able to observe a fall-off in the constant factor.

To get a consistent amount of work without randomness of the input adding noise, you could sort an array of decreasing elements, i.e. sorted in reverse order. (That might make the branch-prediction patterns simpler.)
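As a sketch of that reverse-order input, the helper below builds a strictly decreasing array and counts how many element shifts insertion sort performs on it. GenerateDescending and SortCountingShifts are illustrative names, not part of the question's code.

```csharp
using System;

public static class ReverseInput
{
    // Returns count, count - 1, ..., 1: the worst case for insertion sort.
    public static long[] GenerateDescending(int count)
    {
        long[] values = new long[count];
        for (int i = 0; i < count; i++) values[i] = count - i;
        return values;
    }

    // Sorts in place and returns the number of elements shifted right.
    public static long SortCountingShifts(long[] data)
    {
        long shifts = 0;
        for (int i = 1; i < data.Length; i++)
        {
            long item = data[i];
            int j = i - 1;
            while (j >= 0 && item < data[j])
            {
                data[j + 1] = data[j];
                shifts++;
                j--;
            }
            data[j + 1] = item;
        }
        return shifts;
    }

    public static void Main()
    {
        foreach (int n in new[] { 10, 100, 1000 })
        {
            long shifts = SortCountingShifts(GenerateDescending(n));
            Console.WriteLine($"n = {n}: {shifts} shifts");  // n(n-1)/2: 45, 4950, 499500
        }
    }
}
```

Every element has to travel all the way to the front, so the work is the same deterministic n(n-1)/2 shifts on each run, as long as the input is freshly reversed before each timed call.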

Note that sizes of 10 and 100 are very far from infinity, so constant factors and practical considerations play a major role.


You shouldn't be getting cache-size effects: C# long is 64-bit, so even n=1000 only takes about 8 KB of RAM, much smaller than L1d cache size. Branch prediction might do better in big arrays, since you'd tend to have longer runs of branching the same way to stay in the loop. But it seems unlikely that this alone would come so close to balancing out the quadratic increase in the amount of work.
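The footprint arithmetic can be checked with a few lines. This sketch assumes a 32 KiB L1d cache (the size on Skylake; other CPUs may differ) and ignores the small array object header.

```csharp
using System;

public static class WorkingSet
{
    // Bytes occupied by the elements of a long[] with n entries.
    public static long ElementBytes(long n) => n * sizeof(long);

    public static void Main()
    {
        const long assumedL1dBytes = 32 * 1024;  // assumption: 32 KiB L1d cache
        foreach (long n in new long[] { 1, 10, 100, 1000, 10000 })
        {
            long bytes = ElementBytes(n);
            Console.WriteLine($"n = {n}: {bytes} bytes, fits in assumed L1d: {bytes <= assumedL1dBytes}");
        }
    }
}
```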

The 1-element case is presumably just timing overhead; 32 nanoseconds is about 130 clock cycles on a 4 GHz CPU. It doesn't take that long to evaluate the i < n branch condition for i=1 and skip the rest of the function.


You only need to do data[j + 1] = item; once, the last time; it can go in the else block (which should just be able to break out of the loop).

But that should just be increasing the constant factor of your O(N^2) Insertion Sort. I think this implementation looks correct, although using the ins flag instead of a break seems clunky to me. Or make item < data[j] part of the inner loop condition, so you just assign to data[j] after the loop. (That potentially copies an item to itself, but that's fine.)
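One possible shape for that simplified loop is sketched below, with the comparison folded into the loop condition and a single store after the loop. The class name is made up, and j here indexes the slot being filled, so the comparison looks at data[j - 1].

```csharp
using System;

public static class InsertionSortSimplified
{
    public static void Sort(long[] data)
    {
        for (int i = 1; i < data.Length; i++)
        {
            long item = data[i];
            int j = i;
            // Shift larger elements right until item's slot is found.
            while (j > 0 && item < data[j - 1])
            {
                data[j] = data[j - 1];
                j--;
            }
            data[j] = item;  // stores item onto itself when nothing moved
        }
    }

    public static void Main()
    {
        long[] data = { 5, 2, 9, 1, 5, 6 };
        Sort(data);
        Console.WriteLine(string.Join(", ", data));  // 1, 2, 5, 5, 6, 9
    }
}
```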
