
Enhanced for loop performance worse than traditional indexed lookup?

I just came across this seemingly innocuous comment, benchmarking ArrayList vs a raw String array. It's from a couple of years ago, but the OP writes:

I did notice that using for (String s : stringsList) was about 50% slower than using an old-style for-loop to access the list. Go figure...

Nobody commented on it in the original post, and the test seemed a little dubious (too short to be accurate), but I nearly fell out of my chair when I read it. I've never benchmarked an enhanced loop against a "traditional" one, but I'm currently working on a project that does hundreds of millions of iterations over ArrayList instances using enhanced loops, so this is a concern to me.

I'm going to do some benchmarking and post my findings here, but this is obviously a big concern to me. I could find precious little info online about relative performance, except for a couple of offhand mentions that enhanced loops for ArrayLists do run a lot slower under Android.

Has anybody experienced this? Does such a performance gap still exist? I'll post my findings here, but I was very surprised to read it. I suspect that if this performance gap did exist, it has been fixed in more modern VMs, but I guess now I'll have to do some testing to confirm.

Update: I made some changes to my code, but was already suspecting what others here have already pointed out: sure, the enhanced for loop is slower, but outside of very trivial tight loops, the cost should be a minuscule fraction of the cost of the logic inside the loop. In my case, even though I'm iterating over very large lists of strings using enhanced loops, my logic inside the loop is complex enough that I couldn't even measure a difference after switching to index-based loops.

TL;DR: enhanced loops are indeed slower than a traditional index-based loop over an ArrayList; but for most applications the difference should be negligible.
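
To make the switch concrete, here is a minimal sketch of the two loop styles over a List<String> (the class name and doSomethingExpensive are made up for illustration; the latter stands in for the real per-element logic). The only change is the loop header, and once the body does real work, the iterator overhead disappears into the noise:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LoopSwitchSketch {
    public static void main(String[] args) {
        List<String> strings = new ArrayList<String>(Arrays.asList("alpha", "beta", "gamma"));

        // Enhanced for loop: iteration goes through the List's Iterator.
        for (String s : strings) {
            doSomethingExpensive(s);
        }

        // Index-based loop: same body, but each element is fetched with get(i).
        for (int i = 0, n = strings.size(); i < n; i++) {
            doSomethingExpensive(strings.get(i));
        }
    }

    // Placeholder for the non-trivial per-element logic mentioned above.
    private static void doSomethingExpensive(String s) {
        System.out.println(s.toUpperCase());
    }
}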

The problem you have is that using an Iterator will be slower than using a direct lookup. On my machine the difference is about 0.13 ns per iteration. Using an array instead saves about 0.15 ns per iteration. This should be trivial in 99% of situations.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LoopBenchmark {
    public static void main(String... args) {
        int testLength = 100 * 1000 * 1000;
        String[] stringArray = new String[testLength];
        Arrays.fill(stringArray, "a");
        List<String> stringList = new ArrayList<String>(Arrays.asList(stringArray));
        {
            // Enhanced for loop over the raw array.
            long start = System.nanoTime();
            long total = 0;
            for (String str : stringArray) {
                total += str.length();
            }
            System.out.printf("The for each Array loop time was %.2f ns total=%d%n", (double) (System.nanoTime() - start) / testLength, total);
        }
        {
            // Traditional indexed loop over the List, using get(i).
            long start = System.nanoTime();
            long total = 0;
            for (int i = 0, stringListSize = stringList.size(); i < stringListSize; i++) {
                String str = stringList.get(i);
                total += str.length();
            }
            System.out.printf("The for/get List loop time was %.2f ns total=%d%n", (double) (System.nanoTime() - start) / testLength, total);
        }
        {
            // Enhanced for loop over the List, via its Iterator.
            long start = System.nanoTime();
            long total = 0;
            for (String str : stringList) {
                total += str.length();
            }
            System.out.printf("The for each List loop time was %.2f ns total=%d%n", (double) (System.nanoTime() - start) / testLength, total);
        }
    }
}

When run with one billion entries, this prints (using Java 6 update 26):

The for each Array loop time was 0.76 ns total=1000000000
The for/get List loop time was 0.91 ns total=1000000000
The for each List loop time was 1.04 ns total=1000000000

When run with one billion entries, this prints (using OpenJDK 7):

The for each Array loop time was 0.76 ns total=1000000000
The for/get List loop time was 0.91 ns total=1000000000
The for each List loop time was 1.04 ns total=1000000000

i.e. exactly the same. ;)

Every claim that X is slower than Y on a JVM which does not address all the issues presented in this article and its second part spreads fear and lies about the performance of a typical JVM. This applies to the comment referred to by the original question as well as to GravityBringer's answer. I am sorry to be so rude, but unless you use appropriate micro-benchmarking technology, your benchmarks produce really badly skewed random numbers.

Tell me if you're interested in more explanation, although it is all in the articles I referred to.
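
For what it's worth, the usual way to get that methodology without hand-rolling it is a benchmark harness. The sketch below uses JMH (the OpenJDK micro-benchmark harness, added as a separate dependency); it is my own illustration of the kind of tooling meant, not something taken from the linked articles, and the class and method names are made up. JMH handles warm-up iterations, forked JVMs, and keeping returned values alive so the JIT cannot discard the loops:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class ListLoopBench {

    private List<String> strings;

    @Setup
    public void setUp() {
        strings = new ArrayList<String>();
        for (int i = 0; i < 1000000; i++) {
            strings.add("a");
        }
    }

    @Benchmark
    public long enhancedFor() {
        long total = 0;
        for (String s : strings) {
            total += s.length();
        }
        return total; // returning the value keeps the JIT from eliminating the loop
    }

    @Benchmark
    public long indexedGet() {
        long total = 0;
        for (int i = 0, n = strings.size(); i < n; i++) {
            total += strings.get(i).length();
        }
        return total;
    }
}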

GravityBringer's numbers don't seem right, because I know ArrayList.get() is as fast as raw array access after VM optimization.

I ran GravityBringer's test twice on my machine, in -server mode:

50574847
43872295
30494292
30787885
(2nd round)
33865894
32939945
33362063
33165376

The bottleneck in such tests is actually memory read/write. Judging from the numbers, the two arrays fit entirely in my L2 cache. If we decrease the size to fit in the L1 cache, or increase it beyond the L2 cache, we'll see a 10x difference in throughput.

The iterator of ArrayList uses a single int counter. Even if the VM doesn't put it in a register (say, because the loop body is too complex), it will at least be in the L1 cache, so reading and writing it is basically free.
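
For context, this is roughly the shape of that iterator; a simplified sketch rather than the actual OpenJDK source, with the concurrent-modification checking omitted. The only mutable per-loop state is the single int cursor:

import java.util.Iterator;
import java.util.NoSuchElementException;

class SimpleListIterator<E> implements Iterator<E> {
    private final E[] elements; // backing array, as in ArrayList
    private final int size;
    private int cursor;         // the "single int counter" mentioned above

    SimpleListIterator(E[] elements, int size) {
        this.elements = elements;
        this.size = size;
    }

    @Override
    public boolean hasNext() {
        return cursor < size;
    }

    @Override
    public E next() {
        if (cursor >= size) {
            throw new NoSuchElementException();
        }
        return elements[cursor++];
    }
}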

The ultimate answer of course is to test your particular program in your particular environment.

Though it's not helpful to play agnostic whenever a benchmark question is raised.

The situation has gotten worse for ArrayLists. On my computer running Java 6 update 26, there is a fourfold difference. Interestingly (and perhaps quite logically), there is no difference for raw arrays. I ran the following test:

import java.util.ArrayList;

public class ArrayListLoopTest {

    public static void main(String[] args) {
        int testSize = 5000000;

        ArrayList<Double> list = new ArrayList<Double>();
        Double[] arr = new Double[testSize];

        //set up the data - make sure data doesn't have patterns
        //or anything compiler could somehow optimize
        for (int i=0;i<testSize; i++)
        {
            double someNumber = Math.random();
            list.add(someNumber);
            arr[i] = someNumber;
        }

        //ArrayList foreach
        long time = System.nanoTime();
        double total1 = 0;
        for (Double k: list)
        {
            total1 += k;
        }
        System.out.println (System.nanoTime()-time);

        //ArrayList get() method
        time = System.nanoTime();
        double total2 = 0;
        for (int i=0;i<testSize;i++)
        {
            total2 += list.get(i);
        }
        System.out.println (System.nanoTime()-time);

        //array foreach
        time = System.nanoTime();
        double total3 = 0;
        for (Double k: arr)
        {
            total3 += k;
        }
        System.out.println (System.nanoTime()-time);

        //array indexing
        time = System.nanoTime();
        double total4 = 0;
        for (int i=0;i<testSize;i++)
        {
            total4 += arr[i];
        }
        System.out.println (System.nanoTime()-time);

        //would be strange if different values were produced,
        //but no, all these are the same, of course
        System.out.println (total1);
        System.out.println (total2);
        System.out.println (total3);
        System.out.println (total4);
    }
}

The arithmetic in the loops is to prevent the JIT compiler from possibly optimizing away some of the code. The effect of the arithmetic on performance is small, as the runtime is dominated by the ArrayList accesses.
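
To spell out why consuming the results matters (a self-contained sketch with a made-up class name, not part of the test above): if the accumulated total were never used, the JIT could legally treat the timed loop as dead code and the measurement would be meaningless. Printing the totals, as the test above does, keeps the work observable:

public class DeadCodeDemo {
    public static void main(String[] args) {
        int testSize = 5000000;
        double[] arr = new double[testSize];
        for (int i = 0; i < testSize; i++) {
            arr[i] = Math.random();
        }

        long start = System.nanoTime();
        double total = 0;
        for (int i = 0; i < testSize; i++) {
            total += arr[i];
        }
        long elapsed = System.nanoTime() - start;

        // If 'total' were discarded here, the JIT would be free to drop the loop
        // above entirely; printing it forces the summation to actually happen.
        System.out.println(elapsed + " ns, total=" + total);
    }
}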

The runtimes are (in nanoseconds):

ArrayList foreach: 248,351,782

ArrayList get(): 60,657,907

array foreach: 27,381,576

array direct indexing: 27,468,091
