
Deleting from array, mirrored (strange) behavior

The title may seem a little odd, because I have no idea how to describe this in one sentence.

For our Algorithms course we have to micro-optimize a few things, one of which is finding out how deleting from an array works. The assignment is to delete an item from an array and re-align the contents so that there are no gaps. I think it is quite similar to how std::vector::erase works in C++.

Because I like the idea of understanding everything at a low level, I went a little further and tried to benchmark my solutions. This produced some weird results.

First, here is the code that I used:

class Test {

    Stopwatch sw;
    Obj[] objs;

    public Test() {
        this.sw = new Stopwatch();
        this.objs = new Obj[1000000];

        // Fill objs
        for (int i = 0; i < objs.Length; i++) {
            objs[i] = new Obj(i);
        }
    }

    public void test() {

        // Time deletion
        sw.Restart();
        deleteValue(400000, objs);
        sw.Stop();

        // Show timings
        Console.WriteLine(sw.Elapsed);
    }

    // Delete function
    // value is the to-search-for item in the list of objects
    private static void deleteValue(int value, Obj[] list) {

        for (int i = 0; i < list.Length; i++) {

            if (list[i].Value == value) {
                for (int j = i; j < list.Length - 1; j++) {
                    list[j] = list[j + 1];

                    //if (list[j + 1] == null) {
                    //    break;
                    //}
                }
                list[list.Length - 1] = null;
                break;
            }
        }
    }
}

I would just create this class and call the test() method. I did this in a loop 25 times.

My findings:

  • The first round takes a lot longer than the other 24. I think this is because of caching, but I am not sure.
  • When I search for a value near the start of the list, it has to move more items in memory than when I search for a value near the end, yet it still seems to take less time.
  • Benchmark times differ quite a bit between runs.
  • When I enable the commented-out if, performance goes up (10-20%), even if the value I search for is almost at the end of the list (which means the if is evaluated many times without actually being useful).

I have no idea why these things happen. Is there someone who can explain (some of) them? And if someone who is a pro at this sees it: where can I find more info on doing this in the most efficient way?

Edit after testing:

I did some testing and found some interesting results. I run the test on an array of a million items, filled with a million objects. I run that 25 times and report the cumulative time in milliseconds. I do that 10 times and take the average as the final value.

When I run the test with my function described above I get a score of: 362.1

When I run it with the answer of dbc I get a score of: 846.4

So mine was faster, but then I started to experiment with a half-empty array and things started to get weird. To get rid of the inevitable NullReferenceExceptions I added an extra check to the if (thinking it would cost a bit more performance), like so:

if (fromItem != null && fromItem.Value != value)
    list[to++] = fromItem;

This seemed not only to work, but to improve performance dramatically! Now I get a score of: 247.9

The weird thing is, the scores seem too low to be true, but they sometimes spike. This is the set I took the average from: 94, 26, 966, 36, 632, 95, 47, 35, 109, 439

So the extra evaluation seems to improve performance, despite doing an extra check. How is this possible?
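To put the spread in that sample set into numbers, here is a small sketch that computes both the mean and the median of the ten timings quoted above. TimingSummary and its method names are illustrative and not part of the original code. The mean comes out at 247.9, matching the reported score, while the median is 94.5, which suggests that the three spikes (966, 632, 439) account for most of the average.

```csharp
using System;
using System.Linq;

public static class TimingSummary
{
    // Arithmetic mean of the samples.
    public static double Mean(double[] samples) => samples.Average();

    // Median: middle value of the sorted samples
    // (average of the two middle values when the count is even).
    public static double Median(double[] samples)
    {
        double[] sorted = samples.OrderBy(x => x).ToArray();
        int mid = sorted.Length / 2;
        return sorted.Length % 2 == 1
            ? sorted[mid]
            : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    public static void Main()
    {
        // The ten cumulative timings (ms) listed in the question.
        double[] runs = { 94, 26, 966, 36, 632, 95, 47, 35, 109, 439 };
        Console.WriteLine($"mean   = {Mean(runs)}");
        Console.WriteLine($"median = {Median(runs)}");
    }
}
```

With samples this skewed, the median (or the minimum) is usually a steadier figure to compare between variants than the mean.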

You are using Stopwatch to time your method. This measures the total wall-clock time elapsed during your method call, which could include the time required for .NET to initially JIT your method, interruptions for garbage collection, or slowdowns caused by system load from other processes. Noise from these sources will likely dominate noise due to cache misses.

This answer gives some suggestions as to how you can minimize some of the noise from garbage collection or other processes. To eliminate JIT noise, you should call your method once without timing it, or show the time taken by the first call in a separate column in your results table, since it will be so different. You might also consider using a proper profiler, which will report exactly how much time your code used exclusive of "noise" from other threads or processes.
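The warm-up advice can be sketched roughly as follows. This is only an illustration under assumptions: Bench, Measure, and the setup/action parameters are hypothetical names, and the GC.Collect() calls before each timed run are one common way to make a collection inside the timed region less likely, not a guarantee.

```csharp
using System;
using System.Diagnostics;

public static class Bench
{
    // Calls `action` once untimed so the JIT cost is paid up front,
    // then times `runs` further calls. `setup` (a hypothetical hook)
    // rebuilds the input before every call and is never timed.
    public static double[] Measure(Action setup, Action action, int runs)
    {
        setup();
        action(); // warm-up call, excluded from the results

        var timings = new double[runs];
        var sw = new Stopwatch();
        for (int r = 0; r < runs; r++)
        {
            setup();

            // Settle pending collections before starting the clock.
            GC.Collect();
            GC.WaitForPendingFinalizers();
            GC.Collect();

            sw.Restart();
            action();
            sw.Stop();
            timings[r] = sw.Elapsed.TotalMilliseconds;
        }
        return timings;
    }

    public static void Main()
    {
        int[] data = null;
        double[] t = Measure(
            setup: () => data = new int[1000000],
            action: () => Array.Copy(data, 1, data, 0, data.Length - 1),
            runs: 25);

        Array.Sort(t);
        Console.WriteLine($"min = {t[0]} ms, median = {t[t.Length / 2]} ms, max = {t[t.Length - 1]} ms");
    }
}
```

Reporting the minimum and median alongside the maximum makes occasional spikes visible instead of folding them into a single average.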

Finally, I'll note that your algorithm to remove matching items from an array and shift everything else down uses a nested loop, which is not necessary and will access items in the array after the matching index twice. The standard algorithm looks like this:

    public static void RemoveFromArray(this Obj[] array, int value)
    {
        int to = 0;
        for (int from = 0; from < array.Length; from++)
        {
            var fromItem = array[from];
            if (fromItem.Value != value)
                array[to++] = fromItem;
        }
        for (; to < array.Length; to++)
        {
            array[to] = default(Obj);
        }
    }
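If the array can contain null slots (for example a half-empty array), the loop above would throw on fromItem.Value. A minimal sketch of a null-tolerant variant follows, assuming a stand-in Obj class with an integer Value property, since the original Obj is not shown. The return value and the names ArrayCompaction and RemoveValue are additions for illustration. Note that this variant also compacts away any null entries that sit between live items.

```csharp
using System;

// Hypothetical stand-in for the question's Obj class, which is not shown.
public class Obj
{
    public int Value { get; }
    public Obj(int value) { Value = value; }
}

public static class ArrayCompaction
{
    // Single pass: keeps every non-null item whose Value differs from `value`,
    // packs the kept items at the front, and nulls out the remaining tail.
    // Returns the number of items kept.
    public static int RemoveValue(Obj[] array, int value)
    {
        int to = 0;
        for (int from = 0; from < array.Length; from++)
        {
            Obj item = array[from];
            if (item != null && item.Value != value)
                array[to++] = item;
        }

        int kept = to;
        for (; to < array.Length; to++)
            array[to] = null;
        return kept;
    }

    public static void Main()
    {
        var items = new Obj[] { new Obj(1), null, new Obj(2), new Obj(1), new Obj(3) };
        int kept = RemoveValue(items, 1);
        Console.WriteLine($"kept {kept} of {items.Length} slots");
    }
}
```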

However, instead of using the standard algorithm, you might experiment with Array.Copy() in your version to shift the tail down, since (I believe) internally it does the copy in optimized native code. (System.Array has no public RemoveAt() method; its explicit IList.RemoveAt implementation throws NotSupportedException because arrays have a fixed size.)
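One built-in call that can do the shift in a single step is Array.Copy, which is documented to handle overlapping source and destination ranges within the same array. Below is a minimal sketch of the asker's delete function with the inner loop swapped for that call. Obj is a hypothetical stand-in for the class that is not shown, and ArrayShift and RemoveFirst are illustrative names. Whether this is actually faster would need to be measured.

```csharp
using System;

// Hypothetical stand-in for the question's Obj class, which is not shown.
public class Obj
{
    public int Value { get; }
    public Obj(int value) { Value = value; }
}

public static class ArrayShift
{
    // Removes the first item whose Value equals `value` and closes the gap.
    // Returns true if an item was removed.
    public static bool RemoveFirst(Obj[] list, int value)
    {
        for (int i = 0; i < list.Length; i++)
        {
            if (list[i] != null && list[i].Value == value)
            {
                // Shift list[i + 1 .. end] down by one slot in a single call.
                Array.Copy(list, i + 1, list, i, list.Length - i - 1);
                list[list.Length - 1] = null; // clear the now-duplicated last slot
                return true;
            }
        }
        return false;
    }

    public static void Main()
    {
        var list = new[] { new Obj(10), new Obj(20), new Obj(30) };
        bool removed = RemoveFirst(list, 20);
        Console.WriteLine($"removed = {removed}, last slot is null = {list[2] == null}");
    }
}
```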
