What can I do to speed up this code?

Question

I'm trying to learn Java, Scala, & Clojure.

I'm working through the Project Euler problems in the three languages. Listed below is the code for problem #5 ( http://projecteuler.net/problem=5 ) as well as the run time (in seconds) so far on the first five problems. It is striking to me that the Java and Clojure versions are so much slower than the Scala one for problem #5. They are running on the same machine, same jvm, and the results are consistent over several trials. What can I do to speed the two up (especially the Clojure version)? Why is the Scala version so much faster?

Running Times (in seconds)

|---------|--------|--------|----------|
| problem | Java   | Scala  | Clojure  |
|=========|========|========|==========|
|    1    |  .0010 |  .1570 |   .0116  |
|    2    |  .0120 |  .0030 |   .0003  |
|    3    |  .0530 |  .0200 |   .1511  |
|    4    |  .2120 |  .2600 |   .8387  |
|    5    | 3.9680 |  .3020 | 33.8574  |

Java Version of Problem #5

public class Problem005 {

  private static ArrayList<Integer> divisors;

  private static void initializeDivisors(int ceiling) {
    divisors = new ArrayList<Integer>();
    for (Integer i = 1; i <= ceiling; i++)
      divisors.add(i);
  }

  private static boolean isDivisibleByAll(int n) {
    for (int divisor : divisors)
      if (n % divisor != 0)
        return false;
    return true;
  }

  public static int findSmallestMultiple (int ceiling) {
    initializeDivisors(ceiling);
    int number = 1;
    while (!isDivisibleByAll(number))
      number++;
    return number;
  }

}

Scala Version of Problem #5

object Problem005 {
  private def isDivisibleByAll(n: Int, top: Int): Boolean = 
    (1 to top).forall(n % _ == 0)

  def findSmallestMultiple(ceiling: Int): Int = {
    def iter(n: Int): Int = if (isDivisibleByAll(n, ceiling)) n else iter(n+1)
    iter(1)
  }

}

Clojure Verson of Problem #5

(defn smallest-multiple-of-1-to-n
  [n]
  (loop [divisors (range 2 (inc n))
        i n]
    (if (every? #(= 0 (mod i %)) divisors)
      i
      (recur divisors (inc i)))))

EDIT

It was suggested that I compile the various answers into my own answer. However, I want to give credit where credit is due (I really didn't answer this question myself).

As to the first question, all three versions could be sped up by using a better algorithm. Specifically, create a list of the greatest common factors of the numbers 1-20 (2^4, 3^2, 5^1, 7^1, 11^1, 13^1, 17^1, 19^1) and multiply them out.

The far more interesting aspect is to understand the differences between the three languages using essentially the same algorithm. There are instances where a brute force algorithm such as this one can be helpful. So, why the performance difference?

For Java, one suggestion was to change the ArrayList to a primitive array of ints. This does decrease the running time, cutting about 0.5 - 1 second off (I just ran it this morning and it cut the running time from 4.386 seconds to 3.577 seconds. That cuts down a bit, but no one was able to come up with a way to bring it to under a half second (similar to the Scala version). This is surprising considering that all three compile down to java byte-code. There was a suggestion by @didierc to use an immutable iterator; I tested this suggestion, and it increased the running time to just over 5 seconds.

For Clojure, @mikera and @Webb give several suggestions to speed things up. They suggest to use loop/recur for fast iteration with two loop variables, unchecked-math for slightly faster maths operations (since we know there is no danger of overflow here), use primitive longs rather than boxed numbers, and avoid higher order functions like every?

Running the code of @mikera, I end up with a running time of 2.453 seconds, not quite as good as the scala code, but much better than my original version and better than the Java version:

(set! *unchecked-math* true)

(defn euler5 
  []
  (loop [n 1 
         d 2]
    (if (== 0 (unchecked-remainder-int n d))
      (if (>= d 20) n (recur n (inc d)))
      (recur (inc n) 2))))

(defn is-divisible-by-all?
  [number divisors]
  (= 0 (reduce + (map #(mod 2 %) divisors))))

For Scala, @didierc states that the range object 1 to 20 isn't actually a list of objects but rather one object. Very cool. Thus, the performance difference in Scala is that the we iterate over a single object instead of the list/array of integers 1-20.

In fact, if I change the helper function in the scala method from a range object to a list (see below), then the running time of the scala version increases from 0.302 seconds to 226.59 seconds.

private def isDivisibleByAll2(n: Int, top: Int): Boolean = {
    def divisors: List[Int] = List(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20)
    divisors.forall(n % _ == 0)
  }

Thus, it appears that @didierc has correctly identified the advantage scala has in this instance. It would be interesting to know how this type of object might be implemented in java and clojure.

@didierc suggestion to improve the code by creating an ImmutableRange class, as follows:

import java.util.Iterator;
import java.lang.Iterable;

public class ImmutableRange implements Iterable<Integer> {

  class ImmutableRangeIterator implements Iterator<Integer> {
    private int counter, end, step;

    public ImmutableRangeIterator(int start_, int end_, int step_) {
      end = end_;
      step = step_;
      counter = start_;
    }

    public boolean hasNext(){
      if (step>0) return counter  <= end;
      else return counter >= end;
    }

    public Integer next(){
      int r = counter;
      counter+=step;
      return r;
    }

    public void remove(){
      throw new UnsupportedOperationException();
    }

  }

  private int start, end, step;

  public ImmutableRange(int start_, int end_, int step_){
    // fix-me: properly check for parameters consistency
    start = start_;
    end = end_;
    step = step_;
  }

  public Iterator<Integer> iterator(){
    return new ImmutableRangeIterator(start,end,step);
  }
}

did not improve the running time. The java version ran at 5.097 seconds on my machine. Thus, at the end, we have a satisfactory answer as to why the Scala version performs better, we understand how to improve the performance of the Clojure version, but what is missing would be to understand how to implement a Scala's immutable range object in Java.

FINAL THOUGHTS

As several have commented, the the most effective way to improve the running time of this code is to use a better algorithm. For example, the following java code computes the answer in less than 1 millisecond using the Sieve of Eratosthenes and Trial Division :

/**
 * Smallest Multiple
 *
 * 2520 is the smallest number that can be divided by each of the numbers 
 * from 1 to 10 without any remainder. What is the smallest positive number
 * that is evenly divisible by all of the numbers from 1 to 20?
 *
 * User: Alexandros Bantis
 * Date: 1/29/13
 * Time: 7:06 PM
 */
public class Problem005 {

  final private static int CROSSED_OUT = 0;
  final private static int NOT_CROSSED_OUT = 1;

  private static int intPow(int base, int exponent) {
    int value = 1;
    for (int i = 0; i < exponent; i++)
      value *= base;
    return value;
  }

  /**
   * primesTo computes all primes numbers up to n using trial by 
   * division algorithm
   *
   * @param n designates primes should be in the range 2 ... n
   * @return int[] a sieve of all prime factors 
   *              (0=CROSSED_OUT, 1=NOT_CROSSED_OUT)
   */
  private static int[] primesTo(int n) {
    int ceiling = (int) Math.sqrt(n * 1.0) + 1;
    int[] sieve = new int[n+1];

    // set default values
    for (int i = 2; i <= n; i++)
      sieve[i] = NOT_CROSSED_OUT;

    // cross out sieve values
    for (int i = 2; i <= ceiling; i++)
      for (int j = 2; i*j <= n; j++)
        sieve[i*j] = CROSSED_OUT;
    return sieve;
  }


  /**
   * getPrimeExp computes a prime factorization of n
   *
   * @param n the number subject to prime factorization
   * @return int[] an array of exponents for prime factors of n
   *               thus 8 => (0^0, 1^0, 2^3, 3^0, 4^0, 5^0, 6^0, 7^0, 8^0)
   */
  public static int[] getPrimeExp(int n) {
    int[] factor = primesTo(n);
    int[] primePowAll = new int[n+1];

    // set prime_factor_exponent for all factor/exponent pairs
    for (int i = 2; i <= n; i++) {
      if (factor[i] != CROSSED_OUT) {
        while (true) {
          if (n % i == 0) {
          n /= i;
          primePowAll[i] += 1;
          } else {
            break;
          }
        }
      }
    }

    return primePowAll;
  }

  /**
   * findSmallestMultiple computes the smallest number evenly divisible 
   * by all numbers 1 to n
   *
   * @param n the top of the range
   * @return int evenly divisible by all numbers 1 to n
   */
  public static int findSmallestMultiple(int n) {
    int[] gcfAll = new int[n+1];

    // populate greatest common factor arrays
    int[] gcfThis = null;
    for (int i = 2; i <= n; i++) {
      gcfThis = getPrimeExp(i);
      for (int j = 2; j <= i; j++) {
        if (gcfThis[j] > 0 && gcfThis[j] > gcfAll[j]) {
          gcfAll[j] = gcfThis[j];
        }
      }
    }

    // multiply out gcf arrays
    int value = 1;
    for (int i = 2; i <= n; i++) {
      if (gcfAll[i] > 0)
        value *= intPow(i, gcfAll[i]);
    }
    return value;
  }
}

Answer 1

Scala is faster because the other solutions create explicit collections for no reason. In Scala, 1 to top creates an object that represents the numbers from 1 to top but doesn't explicitly list them anywhere. In Java, you do explicitly create the list--and it's a lot faster to create one object than an array of 20 (actually 21 objects, since ArrayList is also an object) every iteration.

(Note that none of the versions are actually anywhere near optimal. See "least common multiple", which is what Eastsun is doing without mentioning it.)

Answer 2

Here's a much faster version in Clojure:

(set! *unchecked-math* true)

(defn euler5 []
  (loop [n 1 
         d 2)]
    (if (== 0 (unchecked-remainder-int n d))
      (if (>= d 20) n (recur n (inc d)))
      (recur (inc n) 2))))

(time (euler5))
=> "Elapsed time: 2438.761237 msecs"

ie it is around the same speed as your Java version.

The key tricks are:

use loop/recur for fast iteration with two loop variables
use unchecked-math for slightly faster maths operations (since we know there is no danger of overflow here)
use primitive longs rather than boxed numbers
avoid higher order functions like every? - they have a higher overhead than the low level operations

Obviously, if you really care about speed you would pick a better algorithm :-)

Answer 3

The first thing I noticed that will probably have some impact on speed in the Java version is that you're creating an ArrayList<Integer> instead of an int[] .

Java has a feature since version 5 that will automatically convert between an Integer and int - you're iterating over this list, treating them as int type in your comparisons and math calculations, which forces Java to spend a lot of cycles converting between the two types. Replacing your ArrayList<Integer> with an int[] will probably have some performance impact.

My first instinct when looking at your timings is to verify all are giving correct results. I assume you've properly tested all three to make sure the faster Scala version is indeed giving you correct results.

It doesn't seem related to the choice of algorithm for solving it since the strategy looks the same in all three (I'm not familiar with Clojure or Scala, so I might be missing on some subtle difference). Perhaps Scala is able to internally optimize this particular loop/algorithm, yielding much faster results?

Answer 4

On my painfully slow computer, the Clojure code takes nearly 10 minutes, so I am running about 20x slower on old faithful here.

user=> (time (smallest-multiple-of-1-to-n 20))
"Elapsed time: 561420.259 msecs"
232792560

You might be able to make this same algorithm more comparable with the others by avoiding laziness, using type hints / primitives / unchecked operations, etc. The Clojure code is boxing primitives for the anonymous function and creating/realizing a lazy sequence for range each iteration of the loop . This overhead is usually negligible, but here it is being looped hundreds of millions of times. The following non-idiomatic code gives a 3x speed-up.

(defn smallest-multiple-of-1-to-n [n]
  (loop [c (int n)] 
    (if 
      (loop [x (int 2)]
        (cond (pos? (unchecked-remainder-int c x)) false
              (>= x n) true
              :else (recur (inc x))))
      c (recur (inc c)))))

user=> (time (smallest-multiple-of-1-to-n 20))
"Elapsed time: 171921.80347 msecs"
232792560

You could continue to tinker with this and probably get even closer, but it is better to think through the algorithm instead and do better than iterating from 20 to ~200 million.

(defn gcd [a b]
  (if (zero? b) a (recur b (mod a b))))

(defn lcm 
  ([a b] (* b (quot a (gcd a b))))
  ([a b & r] (reduce lcm (lcm a b) r)))

user=> (time (apply lcm (range 2 21)))
"Elapsed time: 0.268749 msecs"
232792560

So even on my ancient machine, this is over 1000x faster than any implementation of your algorithm on your quick machine. I noticed that a gcd/lcm fold solution was posted for Scala as well. So, it would interesting to compare speeds of these similar algorithms.

Answer 5

Follow your algorithm, The clojure is about 10 times slower than java version.

A bit faster for the clojure version: 46555ms => 23846ms

(defn smallest-multiple-of-1-to-n
 [n]
  (let [divisors (range 2 (inc n))]
   (loop [i n]
     (if (loop [d 2]
        (cond (> d n) true
              (not= 0 (mod i d)) false
              :else (recur (inc d))))
     i
     (recur (inc i))))))

A bit faster for the Java version: 3248ms => 2757ms

private static int[] divisors;

private static void initializeDivisors(int ceiling) {
    divisors = new int[ceiling];

    for (Integer i = 1; i <= ceiling; i++)
        divisors[i - 1] = i;
}

Answer 6

First of all, if a number is divisible by, for example, 4, it is also divisible by 2 (one of 4's factors).

So, from 1-20, you only need to check some of the numbers, not all of them.

Secondly, if you can prime factorize the numbers, this is simply asking you for the lowest common multiplier (that's another way to approach this problem). In fact, you could probably do it with pen and paper since its only 1-20.

The algorithm that you're working with is fairly naive - it doesn't use the information that the problem is providing you with to its full extent.

Answer 7

Here is a more efficient solution in scala:

def smallestMultipe(n: Int): Int = {
  @scala.annotation.tailrec
  def gcd(x: Int, y: Int): Int = if(x == 0) y else gcd(y%x, x)
  (1 to n).foldLeft(1){ (x,y) => x/gcd(x,y)*y }
}

And I doublt why your scala version of Problem 1 is so un-efficient. Here are two possible solution of Problem 1 in Scala:

A short one:

(1 until 1000) filter (n => n%3 == 0 || n%5 == 0) sum

A more efficient one:

(1 until 1000).foldLeft(0){ (r,n) => if(n%3==0||n%5==0) r+n else r }

Answer 8

The problem is not about boxing, laziness, list, vectors, etc. The problem is about the algorithm. Of course, the solution is "brute force", but it's about the proportion of "brute" in "force".

First, in Euler Problem 5, we are not asked to check divisibility by 1 to n : just one to twenty. That said: Second, the solution must be a multiple of 38. Third, the prime numbers must be checked first and all divisors must be checked in descending order, to fail as soon as possible. Fourth, some divisors also ensure other divisors, ie if a number is divisible by 18, it is also divisible by 9, 6 and 3. Finally, all numbers are divisible by 1.

This solution in Clojure runs in a negligible time of 410 ms on a MacBook Pro i7:

;Euler 5 helper
(defn divisible-by-all [n]
  (let [divisors [19 17 13 11 20 18 16 15 14 12]
        maxidx (dec (count divisors))]
  (loop [idx 0]
     (let [result (zero? (mod n (nth divisors idx)))]
       (cond
          (and (= idx maxidx) (true? result)) true
          (false? result) false
          :else (recur (inc idx)))))))

;Euler 5 solution
(defn min-divisible-by-one-to-twenty []
  (loop[ x 38 ] ;this one can be set MUCH MUCH higher...
    (let [result (divisible-by-all x)]
       (if (true? result) x (recur (+ x 38))))))

user=>(time (min-divisible-by-one-to-twenty))
"Elapsed time: 410.06 msecs"

Answer 9

I believe this is the fastest pure Java code you could write for that problem and naive algorithm. It is faster than Scala.

public class Euler5 {
  public static void main(String[] args) {
    int test = 2520;
    int i;
    again: while (true) {
      test++;
      for (i = 20; i >1; i--) {
        if (test % i != 0)
          continue again;
      }
      break;
    }
    System.out.println(test);
  }
}

A couple of little details:

We can start testing at 2520 since the question mentioned it as a value :)
It seemed to me like we'd fail faster at the top of the range than at the bottom - I mean, how many things are divisible by 19 vs say, 3?
I used a label for the continue statement. This is basically a cheap, synthetic way to both reset the for loop and increment our test case.

What can I do to speed up this code?

Question

Running Times (in seconds)

Java Version of Problem #5

Scala Version of Problem #5

Clojure Verson of Problem #5

EDIT

FINAL THOUGHTS

9 answers

solution1
5 2013-02-03 19:04:50

solution2
5 ACCPTED 2013-02-04 04:11:34

solution3
4 2013-02-03 02:08:38

solution4
4 2013-02-03 05:23:09

solution5
2 2013-02-03 04:11:25

solution6
1 2013-02-03 00:47:14

solution7
1 2013-02-03 02:08:24

solution8
1 2013-02-04 02:44:06

solution9
1 2013-02-05 02:40:38

What can I do to speed up this code?

Question

Running Times (in seconds)

Java Version of Problem #5

Scala Version of Problem #5

Clojure Verson of Problem #5

EDIT

FINAL THOUGHTS

9 answers

solution1 5 2013-02-03 19:04:50

solution2 5 ACCPTED 2013-02-04 04:11:34

solution3 4 2013-02-03 02:08:38

solution4 4 2013-02-03 05:23:09

solution5 2 2013-02-03 04:11:25

solution6 1 2013-02-03 00:47:14

solution7 1 2013-02-03 02:08:24

solution8 1 2013-02-04 02:44:06

solution9 1 2013-02-05 02:40:38

solution1
5 2013-02-03 19:04:50

solution2
5 ACCPTED 2013-02-04 04:11:34

solution3
4 2013-02-03 02:08:38

solution4
4 2013-02-03 05:23:09

solution5
2 2013-02-03 04:11:25

solution6
1 2013-02-03 00:47:14

solution7
1 2013-02-03 02:08:24

solution8
1 2013-02-04 02:44:06

solution9
1 2013-02-05 02:40:38