Performance optimization of for-loop / switch-statement

Question

please help me to identify which of these following is more optimized code?

for(int i=0;i<count;i++)
{
    switch(way)
    {
        case 1:
            doWork1(i);
            break;
        case 2:
            doWork2(i);
            break;
        case 3:
            doWork3(i);
            break;
    }
}

OR

switch(way)
{
    case 1:
        for(int i=0;i<count;i++)
        {
            doWork1(i);
        }
        break;
    case 2:
        for(int i=0;i<count;i++)
        {
            doWork2(i);
        }
        break;
    case 3:
        for(int i=0;i<count;i++)
        {
            doWork3(i);
        }
        break;
}

In the first case, there happens to be an overhead of always checking the switch case condition in every iteration. In second case, the overhead is not there. I feel the second case is much better. If anyone has any other workaround, please help me out in suggesting it.

Answer 1

A switch on low, contiguous values is insanely fast - this type of jump has highly optimised handling. Frankly, what you ask will make no difference whatsoever in the vast majority of cases - anything in doWork2(i); is going to swamp this; heck, the virtual-call itself might swamp it.

If it really, really, really matters (and I struggle to think of a real scenario here), then: measure it. In any scenario where it is noticeable, then only way to measure it will be with your actual, exact code - you can't generalise pico-optimisations.

So:

it doesn't matter
measure
it doesn't matter

Answer 2

You could do something like:

Func(void, int> doWork;
switch(way) 
{ 
    case 1: 
        doWork = doWork1; 
        break; 
    case 2: 
        doWork = doWork2; 
        break; 
    case 3: 
        doWork = doWork3; 
        break; 
} 
for (int i=0;i<count;i++)  
{
     doWork(i);
}

(Written in here, code might not quite compile, just to give you the idea...)

Answer 3

I'd ask questions to myself for optimization

First of all, how big is count? Is it 1,2,10, 10000000000 ?
How powerful will the machine be that will be running the code?
Am I supposed to write less code ?
Is someone gonna read this code after I write it ? If so how professional is he ?
What do I lack of ? Time? Speed ? Something else ?
What is way ? Where do I get it from ? What are the probabilities of way being 1 or 2 or 3?

It is obvious that the first code snippet will go for the switch part until i reaches count but how big is count? If it is not a very big number it won't matter? If it is too big and you get very slow running time then it is useless. However, as I said if you want readibility and can guarantee that count is small why not use the first one? It is much more readible than the second one and there is less code which is something I like.

Second snippet, looks uggly but it should be preferred if count is a huge number.

Answer 4

Actually, it can be somewhat faster despite some of the comments here.

Let's actually test it:

using System;
using System.Diagnostics;

namespace Demo
{
    class Program
    {
        static void Main(string[] args)
        {
            int count = 1000000000;

            Stopwatch sw = Stopwatch.StartNew();

            for (int way = 1; way <= 3; ++way)
                test1(count, way);

            var elapsed1 = sw.Elapsed;
            Console.WriteLine("test1() took " + elapsed1);

            sw.Restart();

            for (int way = 1; way <= 3; ++way)
                test2(count, way);

            var elapsed2 = sw.Elapsed;
            Console.WriteLine("test2() took " + elapsed2);

            Console.WriteLine("test2() was {0:f1} times as fast.", + ((double)elapsed1.Ticks)/elapsed2.Ticks);
        }

        static void test1(int count, int way)
        {
            for (int i = 0; i < count; ++i)
            {
                switch (way)
                {
                    case 1: doWork1(); break;
                    case 2: doWork2(); break;
                    case 3: doWork3(); break;
                }
            }
        }

        static void test2(int count, int way)
        {
            switch (way)
            {
                case 1:
                    for (int i = 0; i < count; ++i)
                        doWork1();
                    break;

                case 2:
                    for (int i = 0; i < count; ++i)
                        doWork2();
                    break;

                case 3:
                    for (int i = 0; i < count; ++i)
                        doWork3();
                    break;
            }
        }

        static void doWork1()
        {
        }

        static void doWork2()
        {
        }

        static void doWork3()
        {
        }
    }
}

Now this is quite unrealistic, since the doWork() methods don't do anything. However, it will give us a baseline timing.

The results I get for a RELEASE build on my Windows 7 x64 system are:

test1() took 00:00:03.8041522
test2() took 00:00:01.7916698
test2() was 2.1 times as fast.

So moving the loop into the switch statement makes it MORE THAN TWICE AS FAST.

Now let's make it a little bit more realistic by adding some code into doWork():

using System;
using System.Diagnostics;

namespace Demo
{
    class Program
    {
        static void Main(string[] args)
        {
            int count = 1000000000;

            Stopwatch sw = Stopwatch.StartNew();

            for (int way = 1; way <= 3; ++way)
                test1(count, way);

            var elapsed1 = sw.Elapsed;
            Console.WriteLine("test1() took " + elapsed1);

            sw.Restart();

            for (int way = 1; way <= 3; ++way)
                test2(count, way);

            var elapsed2 = sw.Elapsed;
            Console.WriteLine("test2() took " + elapsed2);

            Console.WriteLine("test2() was {0:f1} times as fast.", + ((double)elapsed1.Ticks)/elapsed2.Ticks);
        }

        static int test1(int count, int way)
        {
            int total1 = 0, total2 = 0, total3 = 0;

            for (int i = 0; i < count; ++i)
            {
                switch (way)
                {
                    case 1: doWork1(i, ref total1); break;
                    case 2: doWork2(i, ref total2); break;
                    case 3: doWork3(i, ref total3); break;
                }
            }

            return total1 + total2 + total3;
        }

        static int test2(int count, int way)
        {
            int total1 = 0, total2 = 0, total3 = 0;

            switch (way)
            {
                case 1:
                    for (int i = 0; i < count; ++i)
                        doWork1(i, ref total1);
                    break;

                case 2:
                    for (int i = 0; i < count; ++i)
                        doWork2(i, ref total2);
                    break;

                case 3:
                    for (int i = 0; i < count; ++i)
                        doWork3(i, ref total3);
                    break;
            }

            return total1 + total2 + total3;
        }

        static void doWork1(int n, ref int total)
        {
            total += n;
        }

        static void doWork2(int n, ref int total)
        {
            total += n;
        }

        static void doWork3(int n, ref int total)
        {
            total += n;
        }
    }
}

Now I get these results:

test1() took 00:00:03.9153776
test2() took 00:00:05.3220507
test2() was 0.7 times as fast.

Now it's SLOWER to put the loop into the switch! This counterintuitive result is typical of these kinds of things, and demonstrates why you should ALWAY perform timing tests when you are trying to optimise code. (And optimising code like this is usually something you shouldn't even do unless you have good reasons to suspect that there is a bottleneck. You'd be better off spending your time cleaning up your code. ;))

I did some other tests, and for slightly simpler doWork() methods, the test2() method was quicker. It really greatly depends on what the JIT compiler can do with the optimisations.

NOTE: I think that the reason for the differences in speed for my second test code is because the JIT compiler can optimise out the 'ref' calls when inlining the calls to doWork() when they are not in a loop as in test1(); whereas for test2() it cannot (for some reason).

Answer 5

You should measure it to see whether it's worth to optimize or not(I'm very sure that it's not ). Personally i prefer the first for readability and conciseness(less code, less prone to errors, more " dry ").

Here's another approach which is even more concise:

for(int i = 0; i < count; i++)
{
    doAllWays(way, i); // let the method decide what to do next
}

All "ways" seem to be releated, otherwise they wouldn't appear in the same switch . Hence it makes sense to bundle them in one method first which does the switch .

Answer 6

The second method is more efficient; you have to complete the full for loop regardless. But in the first method, you're needlessly repeating the case statement count times.

Answer 7

assuming you have a performance issue here (as switch is really, really fast in most cases):

If you are bothered about your switch statement, i suggest applying refactoring here.

The switch can easily be replaced by a Strategy Pattern (since the switched value is not changed in the for loops, it is not necessary to switch at all).

The real optimization target are those for loops, but without context it is hard to tell what can be done about that.

Here is some more information on refactoring switches (eg to Strategy pattern) CodeProject Article on refactoring switch

Performance optimization of for-loop / switch-statement

Question

7 answers

solution1
5 ACCPTED 2012-06-26 07:41:55

solution2
2 2012-06-26 07:31:30

solution3
2 2012-06-26 07:49:24

solution4
2 2012-06-26 08:35:01

solution5
1 2012-06-26 07:47:33

solution6
0 2012-06-26 07:34:42

solution7
0 2012-06-26 07:53:01

Performance optimization of for-loop / switch-statement

Question

7 answers

solution1 5 ACCPTED 2012-06-26 07:41:55

solution2 2 2012-06-26 07:31:30

solution3 2 2012-06-26 07:49:24

solution4 2 2012-06-26 08:35:01

solution5 1 2012-06-26 07:47:33

solution6 0 2012-06-26 07:34:42

solution7 0 2012-06-26 07:53:01

solution1
5 ACCPTED 2012-06-26 07:41:55

solution2
2 2012-06-26 07:31:30

solution3
2 2012-06-26 07:49:24

solution4
2 2012-06-26 08:35:01

solution5
1 2012-06-26 07:47:33

solution6
0 2012-06-26 07:34:42

solution7
0 2012-06-26 07:53:01