
Performance optimization of for-loop / switch-statement

Please help me identify which of the following is the more optimized code.

for(int i=0;i<count;i++)
{
    switch(way)
    {
        case 1:
            doWork1(i);
            break;
        case 2:
            doWork2(i);
            break;
        case 3:
            doWork3(i);
            break;
    }
}

OR

switch(way)
{
    case 1:
        for(int i=0;i<count;i++)
        {
            doWork1(i);
        }
        break;
    case 2:
        for(int i=0;i<count;i++)
        {
            doWork2(i);
        }
        break;
    case 3:
        for(int i=0;i<count;i++)
        {
            doWork3(i);
        }
        break;
}

In the first case, there is the overhead of checking the switch condition on every iteration. In the second case, that overhead is not there, so I feel the second case is much better. If anyone has another approach, please suggest it.

A switch on low, contiguous values is insanely fast; this type of jump has highly optimised handling. Frankly, what you ask will make no difference whatsoever in the vast majority of cases: anything in doWork2(i); is going to swamp this. Heck, the virtual call itself might swamp it.

If it really, really, really matters (and I struggle to think of a real scenario here), then: measure it. In any scenario where it is noticeable, the only way to measure it will be with your actual, exact code; you can't generalise pico-optimisations.

So:

  1. it doesn't matter
  2. measure
  3. it doesn't matter

You could do something like:

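// Pick the method once, before the loop.
Action<int> doWork;
switch (way)
{
    case 1:
        doWork = doWork1;
        break;
    case 2:
        doWork = doWork2;
        break;
    case 3:
        doWork = doWork3;
        break;
    default:
        // Needed so that doWork is definitely assigned on every path.
        throw new ArgumentOutOfRangeException(nameof(way));
}
for (int i = 0; i < count; i++)
{
    doWork(i);
}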

(Written in here, code might not quite compile, just to give you the idea...)
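For reference, here is a self-contained sketch of the same idea that should compile as-is. The bodies of doWork1 to doWork3, the Total field, and the Select and Run helper names are placeholders made up for illustration; none of them come from the question.

```csharp
using System;

public static class DelegateDispatch
{
    // Placeholder state so the work methods have an observable effect.
    public static long Total;

    // Placeholder work methods; the real ones would come from the asker's code.
    static void doWork1(int i) { Total += i; }
    static void doWork2(int i) { Total += 2L * i; }
    static void doWork3(int i) { Total += 3L * i; }

    // Resolve the delegate once, outside the loop.
    public static Action<int> Select(int way)
    {
        switch (way)
        {
            case 1: return doWork1;
            case 2: return doWork2;
            case 3: return doWork3;
            default: throw new ArgumentOutOfRangeException(nameof(way));
        }
    }

    // Runs the selected method for i = 0 .. count - 1 and returns the total.
    public static long Run(int way, int count)
    {
        Total = 0;
        Action<int> doWork = Select(way);
        for (int i = 0; i < count; i++)
        {
            doWork(i);
        }
        return Total;
    }

    public static void Main()
    {
        Console.WriteLine(Run(2, 5)); // 2 * (0+1+2+3+4) = 20
    }
}
```

Note that invoking a delegate has a cost of its own, so whether this beats a switch inside the loop is something to measure rather than assume.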

I'd ask myself these questions before optimizing:

  1. First of all, how big is count? Is it 1, 2, 10, 10000000000?
  2. How powerful is the machine that will be running the code?
  3. Am I supposed to write less code?
  4. Is someone going to read this code after I write it? If so, how experienced are they?
  5. What am I short of? Time? Speed? Something else?
  6. What is way? Where do I get it from? What are the probabilities of way being 1 or 2 or 3?

It is obvious that the first code snippet will evaluate the switch on every iteration until i reaches count, but how big is count? If it is not a very big number, it won't matter. If it is very big and you get a very slow running time, then the first form is a poor choice. However, as I said, if you want readability and can guarantee that count is small, why not use the first one? It is much more readable than the second one, and there is less code, which is something I like.

The second snippet looks ugly, but it should be preferred if count is a huge number.

Actually, it can be somewhat faster, despite some of the comments here.

Let's actually test it:

using System;
using System.Diagnostics;

namespace Demo
{
    class Program
    {
        static void Main(string[] args)
        {
            int count = 1000000000;

            Stopwatch sw = Stopwatch.StartNew();

            for (int way = 1; way <= 3; ++way)
                test1(count, way);

            var elapsed1 = sw.Elapsed;
            Console.WriteLine("test1() took " + elapsed1);

            sw.Restart();

            for (int way = 1; way <= 3; ++way)
                test2(count, way);

            var elapsed2 = sw.Elapsed;
            Console.WriteLine("test2() took " + elapsed2);

            Console.WriteLine("test2() was {0:f1} times as fast.", (double)elapsed1.Ticks / elapsed2.Ticks);
        }

        static void test1(int count, int way)
        {
            for (int i = 0; i < count; ++i)
            {
                switch (way)
                {
                    case 1: doWork1(); break;
                    case 2: doWork2(); break;
                    case 3: doWork3(); break;
                }
            }
        }

        static void test2(int count, int way)
        {
            switch (way)
            {
                case 1:
                    for (int i = 0; i < count; ++i)
                        doWork1();
                    break;

                case 2:
                    for (int i = 0; i < count; ++i)
                        doWork2();
                    break;

                case 3:
                    for (int i = 0; i < count; ++i)
                        doWork3();
                    break;
            }
        }

        static void doWork1()
        {
        }

        static void doWork2()
        {
        }

        static void doWork3()
        {
        }
    }
}

Now this is quite unrealistic, since the doWork() methods don't do anything. However, it will give us a baseline timing.

The results I get for a RELEASE build on my Windows 7 x64 system are:

test1() took 00:00:03.8041522
test2() took 00:00:01.7916698
test2() was 2.1 times as fast.

So moving the loop into the switch statement makes it MORE THAN TWICE AS FAST.

Now let's make it a little bit more realistic by adding some code into doWork():

using System;
using System.Diagnostics;

namespace Demo
{
    class Program
    {
        static void Main(string[] args)
        {
            int count = 1000000000;

            Stopwatch sw = Stopwatch.StartNew();

            for (int way = 1; way <= 3; ++way)
                test1(count, way);

            var elapsed1 = sw.Elapsed;
            Console.WriteLine("test1() took " + elapsed1);

            sw.Restart();

            for (int way = 1; way <= 3; ++way)
                test2(count, way);

            var elapsed2 = sw.Elapsed;
            Console.WriteLine("test2() took " + elapsed2);

            Console.WriteLine("test2() was {0:f1} times as fast.", (double)elapsed1.Ticks / elapsed2.Ticks);
        }

        static int test1(int count, int way)
        {
            int total1 = 0, total2 = 0, total3 = 0;

            for (int i = 0; i < count; ++i)
            {
                switch (way)
                {
                    case 1: doWork1(i, ref total1); break;
                    case 2: doWork2(i, ref total2); break;
                    case 3: doWork3(i, ref total3); break;
                }
            }

            return total1 + total2 + total3;
        }

        static int test2(int count, int way)
        {
            int total1 = 0, total2 = 0, total3 = 0;

            switch (way)
            {
                case 1:
                    for (int i = 0; i < count; ++i)
                        doWork1(i, ref total1);
                    break;

                case 2:
                    for (int i = 0; i < count; ++i)
                        doWork2(i, ref total2);
                    break;

                case 3:
                    for (int i = 0; i < count; ++i)
                        doWork3(i, ref total3);
                    break;
            }

            return total1 + total2 + total3;
        }

        static void doWork1(int n, ref int total)
        {
            total += n;
        }

        static void doWork2(int n, ref int total)
        {
            total += n;
        }

        static void doWork3(int n, ref int total)
        {
            total += n;
        }
    }
}

Now I get these results:

test1() took 00:00:03.9153776
test2() took 00:00:05.3220507
test2() was 0.7 times as fast.

Now it's SLOWER to put the loop into the switch! This counterintuitive result is typical of these kinds of things, and demonstrates why you should ALWAYS perform timing tests when you are trying to optimise code. (And optimising code like this is usually something you shouldn't even do unless you have good reasons to suspect that there is a bottleneck. You'd be better off spending your time cleaning up your code. ;))

I did some other tests, and for slightly simpler doWork() methods, the test2() method was quicker. It really depends on what the JIT compiler can do with the optimisations.

NOTE: I think the reason for the difference in speed in my second test is that the JIT compiler can optimise away the 'ref' accesses when inlining the calls to doWork() when they are not directly in a loop, as in test1(), whereas for test2() it cannot (for some reason).
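When repeating measurements like the ones above, it can help to run each candidate at least once before timing it, so that JIT compilation is not counted, and to take several samples. Below is a minimal sketch of that idea. The Measure helper is hypothetical and is not the harness that produced the numbers above; a single warm-up run is also an assumption and may not be enough on runtimes that recompile hot methods later.

```csharp
using System;
using System.Diagnostics;

public static class Timing
{
    // Hypothetical helper: runs 'action' once untimed as a warm-up,
    // then returns the fastest of 'samples' timed runs.
    public static TimeSpan Measure(Action action, int samples)
    {
        if (samples < 1) throw new ArgumentOutOfRangeException(nameof(samples));

        action(); // warm-up run, not timed

        TimeSpan best = TimeSpan.MaxValue;
        for (int s = 0; s < samples; s++)
        {
            Stopwatch sw = Stopwatch.StartNew();
            action();
            sw.Stop();
            if (sw.Elapsed < best) best = sw.Elapsed;
        }
        return best;
    }

    public static void Main()
    {
        long sink = 0; // accumulates a result so the loop has a visible effect
        TimeSpan t = Measure(() =>
        {
            for (int i = 0; i < 10000000; i++) sink += i;
        }, 5);
        Console.WriteLine("best of 5: " + t + " (sink=" + sink + ")");
    }
}
```

Taking the minimum rather than the mean is a design choice here: it tends to filter out interference from other processes, at the price of hiding variance.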

You should measure it to see whether it's worth optimizing or not (I'm very sure that it's not). Personally, I prefer the first for readability and conciseness (less code, less prone to errors, more "DRY").

Here's another approach which is even more concise:

for(int i = 0; i < count; i++)
{
    doAllWays(way, i); // let the method decide what to do next
}

All "ways" seem to be related, otherwise they wouldn't appear in the same switch. Hence it makes sense to bundle them first in one method that does the switch.
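A sketch of what such a bundling method might look like. Only the names doAllWays and doWork1 to doWork3 come from the posts above; the method bodies and the Calls counter are placeholders added so the example has an observable effect.

```csharp
using System;

public static class Ways
{
    // Placeholder: counts calls per way (index 0 is unused).
    public static readonly int[] Calls = new int[4];

    static void doWork1(int i) { Calls[1]++; }
    static void doWork2(int i) { Calls[2]++; }
    static void doWork3(int i) { Calls[3]++; }

    // The switch lives in one place; callers just loop.
    public static void doAllWays(int way, int i)
    {
        switch (way)
        {
            case 1: doWork1(i); break;
            case 2: doWork2(i); break;
            case 3: doWork3(i); break;
            default: throw new ArgumentOutOfRangeException(nameof(way));
        }
    }

    public static void Main()
    {
        for (int i = 0; i < 5; i++)
        {
            doAllWays(2, i); // let the method decide what to do next
        }
        Console.WriteLine(Calls[2]); // 5
    }
}
```

This keeps the switch inside the loop, so it is a readability choice rather than a speed-up.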

The second method is more efficient: you have to complete the full for loop regardless, but in the first method you're needlessly repeating the case statement count times.

Assuming you have a performance issue here (as switch is really, really fast in most cases):

If you are bothered about your switch statement, I suggest applying refactoring here.

The switch can easily be replaced by a Strategy Pattern (since the switched value does not change inside the for loops, it is not necessary to switch at all).
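A minimal sketch of what that refactoring could look like. The interface and class names (IWorkStrategy, SumStrategy, CountStrategy, StrategyDemo) and what each strategy does are hypothetical; the answer only names the pattern.

```csharp
using System;

// Hypothetical strategy interface: one implementation per "way".
public interface IWorkStrategy
{
    void DoWork(int i);
}

// Placeholder strategy: adds up the loop indices.
public sealed class SumStrategy : IWorkStrategy
{
    public long Total { get; private set; }
    public void DoWork(int i) { Total += i; }
}

// Placeholder strategy: counts how many times it was called.
public sealed class CountStrategy : IWorkStrategy
{
    public long Total { get; private set; }
    public void DoWork(int i) { Total++; }
}

public static class StrategyDemo
{
    // The loop depends only on the interface; no switch is evaluated per iteration.
    public static void Run(IWorkStrategy strategy, int count)
    {
        for (int i = 0; i < count; i++)
        {
            strategy.DoWork(i);
        }
    }

    public static void Main()
    {
        var sum = new SumStrategy();
        Run(sum, 5);
        Console.WriteLine(sum.Total); // 0+1+2+3+4 = 10
    }
}
```

The choice of strategy still has to happen once, wherever way is read. Each DoWork call is an interface call with its own cost, so this is mainly a structural improvement rather than a guaranteed performance win.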

The real optimization targets are those for loops, but without context it is hard to tell what can be done about them.

Here is some more information on refactoring switches (e.g. to the Strategy pattern): CodeProject article on refactoring switch

