
What's the performance cost of iterating over a list multiple times?

I have a List that I need to iterate through, performing work on each element along the way. My problem is that, because of concurrency, it's difficult for my program to do all of the work on the list in a single loop. My solution is to iterate over the list as many times as necessary, doing part of the work in each pass. This example should illustrate what I mean:

List<String> list = new ArrayList<>();
list.add("A");
list.add("B");
list.add("C");
list.add("D");

// Method A
for (String item : list) {
    doWorkThing1(item);
    doWorkThing2(item);
}

// Method B
for (String item : list) {
    doWorkThing1(item);
}
for (String item : list) {
    doWorkThing2(item);
}

Method B is what I am curious about. Is there a notable cost to iterating over the list multiple times? Since I assume most of the performance cost will go to the "work" methods, is it fair to say the difference between Method A and Method B would be negligible?

First of all, you have to define what you mean when you talk about performance.

If you are talking about the time complexity of your algorithm, we can say that an algorithm that iterates over a list of size n has a time complexity of O(n). So if you make c passes over your list (where c is a constant), the time complexity remains the same. In other words, constant factors don't matter, so if you iterate 3 times over a list of size n you'll still have a time complexity of

3 · O(n) = O(n).

Now, it's really important to define what your methods do. If they perform an operation that takes constant time (e.g. printing the value of the item), the complexity remains the same -> O(n).

You can find more information about time complexity estimation here.

There is another way to measure the performance of an algorithm: space complexity. I didn't discuss it because there is only a single simple data structure in your code, but you can find some useful information about this topic here.

The performance difference would probably be measurable, but in your example it would be less than a microsecond. My gut feeling is that it would be in the region of 100 nanoseconds ... once the code has been JIT-compiled.

It is possible, but unlikely, that a performance difference of that size is significant. For the difference to be significant, you need one or more of the following to be true:

  • The methods are called many, many times.
  • The application has hard real-time requirements; e.g. a call to one of the methods has to complete in a very short time window.

And even if one of those criteria is met, if the time taken to do the work is microseconds, milliseconds or longer, then the time to do the work will dominate the time "wasted" on a second iteration.
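
To put rough, illustrative numbers on that (these figures are assumptions, not measurements): if the per-element loop overhead is on the order of 10 ns and each work call takes on the order of 1 µs, then for a 1,000-element list the second pass adds roughly 10 µs of loop overhead on top of roughly 2 ms of work that has to happen either way, i.e. well under 1% of the total.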


Here's my advice.

  1. Before you start thinking about optimizing, have a clear understanding of the performance requirements. If there are no performance requirements stated or implied, then don't waste your time on optimizing. (There is no moral imperative to make your code as fast as possible.)

  2. It is more important to get the correct (enough) answer than to get the answer fast. So don't spend time optimizing until your code is written and tested.

  3. Build yourself a benchmark (using realistic input data, etc.) that you can use to judge whether the code is running fast enough, and make before-and-after comparisons of your candidate optimizations. (Beware of the standard traps when benchmarking Java code; see the sketch after this list.)

  4. Use profiling to decide which parts of your code are worth optimizing. Look for the hotspots; i.e. the methods / sections where most of the time is spent. (Optimizing code that is not a hotspot is unlikely to make a significant difference to overall application performance.)

  5. When you reach your goal, or when you run out of hotspots ... stop .
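
Here is a minimal sketch of what such a benchmark might look like with JMH. The class name, the stand-in work methods and the list contents are placeholders of mine, not taken from the question; the point is the @State/@Setup handling and the Blackhole, which avoids one of the standard traps (the JIT optimizing away work whose result is never used).

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.infra.Blackhole;

@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class IterationBenchmark {

    private List<String> list;

    @Setup
    public void setUp() {
        // Build the input once, outside the measured code.
        list = new ArrayList<>();
        list.add("A");
        list.add("B");
        list.add("C");
        list.add("D");
    }

    @Benchmark
    public void onePass(Blackhole bh) {
        for (String item : list) {
            bh.consume(doWork1(item));
            bh.consume(doWork2(item));
        }
    }

    @Benchmark
    public void twoPasses(Blackhole bh) {
        for (String item : list) {
            bh.consume(doWork1(item));
        }
        for (String item : list) {
            bh.consume(doWork2(item));
        }
    }

    // Stand-ins for the real work; consuming their results via the Blackhole
    // stops the JIT from treating them as dead code.
    private int doWork1(String item) {
        return item.length();
    }

    private int doWork2(String item) {
        return item.hashCode();
    }
}

You would normally run this through the JMH Maven archetype or a JMH Gradle plugin rather than as a plain main method.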

For this particular use case, you can run a micro-benchmark to compare different versions of code that implement the same logic, and choose the best one to use.

// These methods need to be annotated with @Benchmark
// (org.openjdk.jmh.annotations.Benchmark) for JMH to pick them up.
@Benchmark
public void methodA() {
    List<String> list = new ArrayList<>();
    list.add("A");
    list.add("B");
    list.add("C");
    list.add("D");

    for (String item : list) {
        doWorkThing1(item);
        doWorkThing2(item);
    }
}

@Benchmark
public void methodB() {
    List<String> list = new ArrayList<>();
    list.add("A");
    list.add("B");
    list.add("C");
    list.add("D");

    for (String item : list) {
        doWorkThing1(item);
    }
    for (String item : list) {
        doWorkThing2(item);
    }
}

// Note: j is never used after these loops, so the JIT may optimize the work
// away entirely; returning the value and consuming it (e.g. with a Blackhole)
// would make the measurement more trustworthy.
private void doWorkThing2(String item) {
    int j = 0;
    for (int i = 0; i < 10000; i++) {
        j = i + j;
    }
}

private void doWorkThing1(String item) {
    int j = 0;
    for (int i = 0; i < 10000; i++) {
        j = i + j;
    }
}

When I run the code using the JMH (Java Microbenchmark Harness) tool, the following results are produced:

Benchmark               Mode  Cnt         Score         Error  Units
IterationCost.methodA  thrpt   20  56183172.456 ± 1388825.737  ops/s
IterationCost.methodB  thrpt   20  49693471.457 ±  777747.554  ops/s
IterationCost.methodA   avgt   20        ≈ 10⁻⁸                 s/op
IterationCost.methodB   avgt   20        ≈ 10⁻⁸                 s/op

Be aware of other factors that can impact the results.

There are too many variables to tell you which to use for best performance, since that can vary depending on what's going on. There is no one way that will work best in all situations.

Sometimes splitting up the loop will give better performance (perhaps the processor's instruction cache won't need refilling several times in each iteration). Sometimes combining everything into one loop will give better performance (perhaps the processor's data cache won't need refilling as often as it would with two separate passes over the list).

Profile with real-world data one way. Profile with real-world data the other way. Use the better-performing way.

In general, the only obvious overhead is the extra counter incrementing and element fetching due to the extra for loops. If that is not significant for one for loop, it is not significant here either, although it is obviously N times the loop overhead of a single for loop.

There might also be caching effects and the like, but you shouldn't have much of a problem.

A for loop is effectively (in pseudocode):

loop:
   x = list[k++];
   work1(x);
   work2(x);
   if (k < n) goto loop;

so two loops are:

loop1:
   x = list[k++];
   work1(x);
   if (k < n) goto loop1;

k = 0;
loop2:
   x = list[k++];
   work2(x);
   if (k < n) goto loop2;

Ultimately, to know EXACTLY what happens you have to profile... but even then you won't know exactly. If you are trying to milk the last 0.00001% of CPU cycles then you can worry about it; otherwise, don't.

If your work functions are not "heavy" (i.e. they do very little), then there are other ways to optimize things, such as unrolling the loop.
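
As a rough illustration of what loop unrolling looks like (a hand-written sketch over a hypothetical int[] array, not code from this thread; note that the JIT compiler often unrolls simple loops like this on its own):

static int sumPlain(int[] data) {
    int sum = 0;
    for (int i = 0; i < data.length; i++) {
        sum += data[i];
    }
    return sum;
}

// Unrolled by 4: fewer loop-condition checks per element, at the cost of
// more code plus a tail loop for the leftover 0-3 elements.
static int sumUnrolled(int[] data) {
    int sum = 0;
    int i = 0;
    for (; i + 3 < data.length; i += 4) {
        sum += data[i] + data[i + 1] + data[i + 2] + data[i + 3];
    }
    for (; i < data.length; i++) {
        sum += data[i];
    }
    return sum;
}

Whether hand-unrolling actually helps is exactly the kind of thing a benchmark should decide; measure before committing to it.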
