Average brightness with Clojure very slow

Question

Being new to Clojure, I would like to compute the average brightness of (lots of) JPEG images. To do so, I load each image into memory using ImageIO/read from Java, extract the byte buffer behind it and compute the average.

(defn brightness
  "Computes the average brightness of an image."
  [^File file]
  (-> file
    ImageIO/read
    .getRaster
    .getDataBuffer
    .getData
    byteaverage))

Here, the average

(defn byteaverage
  [numbers]
  (/ (float (->> numbers
                 (map bytetoint)
                 (apply +)))
     (count numbers)))

needs to take into account that bytes are signed in Java and need to be converted to sufficiently large integers first.

(defn bytetoint
  [b]
  (bit-and b 0xFF))
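As a small illustration of the sign issue (the sample values are made up for this example), bytes read back without the mask come out negative for anything above 127, while masking with 0xFF recovers the 0..255 range:

```clojure
;; A byte array holding the raw values 0x00, 0x7F, 0x80 and 0xFF.
(def raw (byte-array [(unchecked-byte 0x00)
                      (unchecked-byte 0x7F)
                      (unchecked-byte 0x80)
                      (unchecked-byte 0xFF)]))

;; Widened to long without a mask, the sign bit is carried along.
(def signed (mapv long raw))

;; Masking with 0xFF keeps only the low eight bits, giving 0..255.
(def unsigned (mapv #(bit-and % 0xFF) raw))
```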

While this does give correct results, it is extremely slow: around 10 to 20 seconds for a 20-megapixel image. Disk access is not the problem. From playing around with time, the culprit seems to be the bytetoint conversion. Just mapping bytetoint onto the byte array eats 8 GB of memory and does not terminate in the REPL.

Why is that and what could one do about it?

PS: I am aware that one could use other programming languages, libraries, multithreading or change the algorithm. My point is that the above Clojure code should be much faster and I would like to understand why it is not.

Answer 1

You are basically running a lot of plumbing in a very tight loop: boxing, converting, walking chunked lazy sequences and so on. Many of the benefits you get from modern CPUs, such as cache-line prefetching and branch prediction, fly right out the window.
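To make the boxing point concrete, here is a minimal sketch (the names data and mapped are just for illustration). Mapping over a primitive byte array does not give back a primitive array: it gives a lazy sequence whose elements are boxed objects, so every pixel costs at least one allocation.

```clojure
;; A small primitive byte array standing in for the image data.
(def data (byte-array 4))

;; The same kind of call as in the question: map a masking function over it.
(def mapped (map #(bit-and % 0xFF) data))

;; The result is a lazy sequence, not an array ...
(def mapped-type (class mapped))

;; ... and each element is a boxed java.lang.Long rather than a primitive.
(def element-type (class (first mapped)))
```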

This kind of loop (computing a sum) is much better expressed as a more direct form of computation, such as Clojure's loop construct, something along the lines of:

(defn get-sum [^bytes data]
  (let [m (alength data)]
    (loop [idx 0 sum 0]
      (if (< idx m)
        (recur (inc idx) (unchecked-add sum (bit-and (aget data idx) 0xff)))
        (/ sum m)))))

This is untested, so you might need to adapt it, but it shows a few things:

Using type hints for array access
Using a direct loop, which is very efficient
Using integer (long) math for the actual loop, and dividing only at the end
Using unchecked arithmetic (unchecked-add), which helps performance a lot in tight loops
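A variant of the same idea that returns a double instead of a ratio could look like the sketch below. The name mean-byte-value is hypothetical, and the explicit (long ...) widening and the ^double return hint are my additions, not something the loop strictly needs.

```clojure
(defn mean-byte-value
  "Hypothetical helper: mean of the unsigned values in a byte array, as a double."
  ^double [^bytes data]
  (let [n (alength data)]
    ;; idx and sum start from long literals, so both stay primitive longs.
    (loop [idx 0
           sum 0]
      (if (< idx n)
        (recur (unchecked-inc idx)
               ;; Widen the signed byte to long, then mask to get 0..255.
               (unchecked-add sum (bit-and (long (aget data idx)) 0xFF)))
        ;; Divide once, at the end, in floating point.
        (/ (double sum) n)))))
```

With a two-element array holding 0x00 and 0xFF this yields 127.5, where the plain (/ sum m) form would give the ratio 255/2.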

Edit

You could use other forms as well, which might perform even better, such as a dotimes with internal mutable state (say a long array of size 1) if you really need to squeeze out performance, but by then you might as well write a little method in Java ;)
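A rough sketch of that dotimes form, assuming a one-element long array as the accumulator (mean-byte-value-dotimes is a made-up name, and whether this actually beats loop/recur is something you would have to measure):

```clojure
(defn mean-byte-value-dotimes
  "Hypothetical helper: same result as a loop/recur sum, but accumulating
  into a one-element primitive long array."
  ^double [^bytes data]
  (let [n          (alength data)
        ^longs acc (long-array 1)]   ; acc[0] is the running sum
    (dotimes [i n]
      (aset acc 0 (unchecked-add (aget acc 0)
                                 (bit-and (long (aget data i)) 0xFF))))
    ;; Single floating-point division after the loop.
    (/ (double (aget acc 0)) n)))
```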

Answer 2

In addition to @shlomi's answer:

You can also make it less verbose (and probably a bit faster) using the areduce function:

(defn get-sum-2 [^bytes data]
  (/ (areduce data i res 0 
              (unchecked-add res (bit-and (aget data i) 0xff)))
     (alength data)))
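To tie this back to the question's brightness function, here is a sketch that feeds an image's byte buffer into an areduce-based mean. The names mean-byte-value-areduce, image-brightness and gray-image are hypothetical. It assumes the raster is backed by a DataBufferByte, which is typical for JPEGs read through ImageIO but not guaranteed; with any other buffer type the cast fails with a ClassCastException.

```clojure
(import '(java.awt.image BufferedImage DataBufferByte))

(defn mean-byte-value-areduce
  "Hypothetical helper: mean of the unsigned byte values, as a double."
  [^bytes data]
  (/ (double (areduce data i acc 0
                      (unchecked-add acc (bit-and (long (aget data i)) 0xFF))))
     (alength data)))

(defn image-brightness
  "Hypothetical variant of brightness that takes an already decoded image.
  Assumes a byte-backed raster (see the note above)."
  [^BufferedImage image]
  (let [^DataBufferByte buffer (.getDataBuffer (.getRaster image))]
    (mean-byte-value-areduce (.getData buffer))))

;; A synthetic 4x4 grayscale image with every pixel set to 200, built in
;; memory so the example does not depend on a file on disk.
(def gray-image
  (let [img       (BufferedImage. 4 4 BufferedImage/TYPE_BYTE_GRAY)
        ^bytes px (.getData ^DataBufferByte (.getDataBuffer (.getRaster img)))]
    (java.util.Arrays/fill px (unchecked-byte 200))
    img))
```

For a real file you would pass the result of ImageIO/read instead of gray-image.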

Answer 3

If you would like to do it really fast in Java, you can use these options (best would be to use all of them):

Use a Java wrapper for libjpeg-turbo as the JPEG decompression library; it is 30 times faster than ImageIO.
Don't calculate the average from all the pixels in the image. Use 1% to 10% of the pixels, evenly distributed over the image (use some hash function to choose pseudo-random pixels, or just step through a for loop by more than one pixel, depending on how many pixels you would like to hit). An average calculated this way is much faster. The more pixels you use, the more accurate the result, but 5% of evenly distributed pixels is more than enough to get very good results.
Use multithreading.
Avoid floating-point calculations where possible and use integer calculations instead; floating-point calculations can be up to 3-4 times slower.
Do not load all images into memory at once. Images often use a lot of memory, so the garbage collector ends up working hard and your app runs slowly because of it. Better to load each image when it is needed, let it be GC-ed afterwards, and calculate the average incrementally.
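The sampling and integer-math points can be sketched in Clojure as a strided loop. The name sampled-mean-byte-value and its step parameter are made up for this example, and step must be positive or the loop never ends. For interleaved data such as 3-byte BGR, a step that is a multiple of the channel count would only ever hit one channel, so pick the step accordingly.

```clojure
(defn sampled-mean-byte-value
  "Hypothetical helper: integer mean of every step-th unsigned byte in data.
  step must be a positive number."
  [^bytes data ^long step]
  (let [n (alength data)]
    (loop [idx 0
           sum 0
           cnt 0]
      (if (< idx n)
        (recur (unchecked-add idx step)                         ; jump ahead by step
               (unchecked-add sum (bit-and (long (aget data idx)) 0xFF))
               (unchecked-inc cnt))                             ; samples taken so far
        ;; Integer division; an empty array yields 0.
        (if (pos? cnt) (quot sum cnt) 0)))))
```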

As to negative byte values: don't convert the color value to byte; convert it directly to int, like this:

int rgb = somePixelColor;
int b = rgb & 0xFF;
int g = (rgb>>8) & 0xFF;
int r = (rgb>>16) & 0xFF;

int sillyBrightness = (r + g + b) / 3; // naive: each channel should really carry its own weight; there are several brightness models for that
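One common weighting is Rec. 601 luma (0.299 R + 0.587 G + 0.114 B). A Clojure sketch that uses integer math only, with the hypothetical name rgb->luma and weights rounded so they sum to 256, might look like this:

```clojure
(defn rgb->luma
  "Hypothetical helper: approximate Rec. 601 luma (0..255) of a packed
  0xRRGGBB value, using integer arithmetic only."
  [^long rgb]
  (let [r (bit-and (bit-shift-right rgb 16) 0xFF)
        g (bit-and (bit-shift-right rgb 8) 0xFF)
        b (bit-and rgb 0xFF)]
    ;; 77/256, 150/256 and 29/256 approximate 0.299, 0.587 and 0.114.
    ;; The weights sum to 256, so shifting right by 8 divides them back out.
    (bit-shift-right (+ (* 77 r) (* 150 g) (* 29 b)) 8)))
```

Any alpha bits above the lowest 24 are ignored because each channel is masked with 0xFF.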

Answer 4

In addition to the good information above, you may be interested in the HipHip library, which is designed for manipulating arrays of primitive values from Clojure: https://github.com/plumatic/hiphip

Here is an example from the README about computing the mean and standard deviation of a primitive array:

;; Assumes the namespace was loaded with (require '[hiphip.double :as dbl])
(defn std-dev [xs]
  (let [mean (dbl/amean xs)
        square-diff-sum (dbl/asum [x xs] (Math/pow (- x mean) 2))]
    (/ square-diff-sum (dbl/alength xs))))

(defn covariance [xs ys]
  (let [ys-mean (dbl/amean ys)
        xs-mean (dbl/amean xs)
        diff-sum (dbl/asum [x xs y ys] (* (- x xs-mean) (- y ys-mean)))]
    (/ diff-sum (dec (dbl/alength xs)))))

(defn correlation [xs ys std-dev1 std-dev2]
  (/ (covariance xs ys) (* std-dev1 std-dev2)))

solution1
2 2017-04-21 07:55:23

solution2
1 2017-04-21 10:08:06

solution3
0 2017-04-21 07:12:52

solution4
0 2017-04-21 15:02:27
