As a newcomer to Clojure, I would like to compute the average brightness of (lots of) JPEG images. To do so, I load each image into memory with Java's ImageIO/read, extract the byte buffer behind it, and average its values.
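(import '(java.io File)
        '(javax.imageio ImageIO))

(declare byteaverage) ; defined below

(defn brightness
  "Computes the average brightness of an image."
  [^File file]
  (-> file
      ImageIO/read    ; BufferedImage
      .getRaster      ; WritableRaster
      .getDataBuffer  ; DataBuffer (typically a DataBufferByte for JPEGs)
      .getData        ; the backing byte[]
      byteaverage))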
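To see what the -> chain in brightness hands to byteaverage, here is a small REPL sketch that builds an image in memory instead of reading a file. It assumes a TYPE_3BYTE_BGR image, which is what ImageIO typically produces for color JPEGs; img and data are illustrative names only.

```clojure
(import '(java.awt.image BufferedImage DataBufferByte))

;; A tiny 4x2 image storing 3 interleaved bytes (blue, green, red) per pixel.
(def img (BufferedImage. 4 2 BufferedImage/TYPE_3BYTE_BGR))

;; Paint the top-left pixel opaque white; all other pixels stay black.
(.setRGB img 0 0 (unchecked-int 0xFFFFFFFF))

;; Same steps as in brightness: raster -> data buffer -> backing byte[].
(def data (.getData ^DataBufferByte (.getDataBuffer (.getRaster img))))

(alength data)      ; 4 * 2 * 3 = 24 bytes
(vec (take 3 data)) ; the white pixel: each 255 channel is stored as -1
```

Under the same 3-bytes-per-pixel assumption, a 20-megapixel color image yields about 60 million bytes, each of which byteaverage processes individually.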
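Here, the averaging function has to take into account that bytes are signed in Java, so each byte must first be converted to a sufficiently large integer:

(declare bytetoint) ; defined below

(defn byteaverage
  "Returns the mean of a sequence of signed bytes, read as unsigned values."
  [numbers]
  (/ (float
      (->> numbers
           (map bytetoint)
           (apply +)))
     (count numbers)))

The conversion itself is done by bytetoint: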
(defn bytetoint
  "Converts a signed Java byte (-128..127) to an int in 0..255."
  [b]
  (bit-and b 0xFF))
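A quick REPL check of why the mask is needed, using the same (bit-and b 0xFF) expression as bytetoint. Byte/toUnsignedInt is the JDK's built-in equivalent (Java 8 and later); it is shown only as an alternative and is not part of the original code.

```clojure
;; Java bytes are signed: the pixel value 0xFF is stored as -1, 0x80 as -128.
(def raw (byte-array [-1 -128 0 127]))

;; Masking with 0xFF keeps only the low 8 bits, giving back 0..255.
(def unsigned (mapv #(bit-and % 0xFF) raw))
unsigned ; => [255 128 0 127]

;; The JDK helper performs the same conversion on a primitive byte.
(mapv #(Byte/toUnsignedInt (byte %)) raw) ; => [255 128 0 127]
```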
While this code gives correct results, it is extremely slow: it takes around 10 to 20 seconds for a 20-megapixel image. Disk access is not the problem. From experimenting with Clojure's time macro, the culprit seems to be the bytetoint conversion. Merely mapping bytetoint over the byte array eats 8 GB of memory and does not terminate in the REPL.
Why is that, and what could one do about it?
PS: I am aware that one could use other programming languages, libraries, or multithreading, or change the algorithm. My point is that the above Clojure code should be much faster, and I would like to understand why it is not.
You are basically running lots of plumbing in a very tight loop: boxing, conversions, chunked lazy sequences, and so on. Many of the benefits you get from modern CPUs, such as cache-line prefetching and branch prediction, fly right out the window.
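A small REPL sketch of that boxing, using a hypothetical 3-element array as a stand-in for the image buffer: every element that flows through the seq functions and map becomes a heap object.

```clojure
(def data (byte-array [-1 0 127]))

;; Calling seq functions on a primitive array boxes each element into a Byte.
(class (first data))   ; => java.lang.Byte

;; map then allocates a boxed Long per element, plus the lazy-seq cells
;; that hold them; for a 20-megapixel image that is tens of millions of objects.
(def mapped (map #(bit-and % 0xFF) data))
(class (first mapped)) ; => java.lang.Long
```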
This kind of loop (computing a sum) is much better expressed as a more direct computation, such as Clojure's loop construct, in a form like:
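(defn get-sum [^bytes data]
  (let [m (alength data)]
    (loop [idx 0 sum 0]
      (if (< idx m)
        ;; mask with 0xff to read each signed byte as 0..255
        (recur (inc idx) (unchecked-add sum (bit-and (aget data idx) 0xff)))
        ;; (double sum) makes the mean a double instead of a Ratio
        (/ (double sum) m)))))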
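This is untested, so you might need to adapt it, but it shows a few things: the ^bytes hint lets aget work on the primitive array without reflection, loop/recur keeps idx and sum as primitive longs instead of boxed objects, and unchecked-add skips the overflow check on every addition.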
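One way to check that get-sum really avoids the plumbing described above is to switch on the compiler's reflection and boxed-math warnings before defining it, then time it against the lazy pipeline from the question. This is a sketch: *unchecked-math* :warn-on-boxed requires Clojure 1.7 or later, and lazy-average and sample are hypothetical names.

```clojure
;; Report reflective calls and boxed arithmetic in the definition below;
;; warnings are printed to *err* (:warn-on-boxed needs Clojure 1.7+).
(set! *warn-on-reflection* true)
(set! *unchecked-math* :warn-on-boxed)

(defn get-sum
  "Mean of the bytes in data, read as unsigned values (0..255)."
  [^bytes data]
  (let [m (alength data)]
    (loop [idx 0 sum 0]
      (if (< idx m)
        (recur (inc idx) (unchecked-add sum (bit-and (aget data idx) 0xff)))
        (/ (double sum) m)))))

;; Turn the warnings off again before defining the deliberately slow version.
(set! *unchecked-math* false)
(set! *warn-on-reflection* false)

;; The lazy, boxed pipeline from the question, inlined for comparison.
(defn lazy-average [numbers]
  (/ (float (->> numbers (map #(bit-and % 0xFF)) (apply +)))
     (count numbers)))

;; Hypothetical input: one million random signed bytes.
(def sample (byte-array (repeatedly 1000000 #(- (rand-int 256) 128))))

(time (get-sum sample))
(time (lazy-average sample))
```

Any warning printed while get-sum is compiled points at a call that still reflects or boxes; the actual timings depend on the JVM and hardware.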
You could use other forms as well, some of which might perform even better, such as dotimes with internally mutable state (say, a long array of size 1) if you really need to squeeze out performance; but by then, you might as well write a little method in Java ;)
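A sketch of the dotimes variant, keeping the running sum in a one-element long array as the mutable state; get-sum-dotimes is a hypothetical name, and whether it actually beats the loop version depends on the JIT.

```clojure
(defn get-sum-dotimes
  "Mean of the unsigned byte values in data, accumulated with dotimes
  into a one-element long array."
  [^bytes data]
  (let [^longs acc (long-array 1)]
    (dotimes [i (alength data)]
      ;; read, add the unsigned byte, write back: all on primitive longs
      (aset acc 0 (unchecked-add (aget acc 0) (bit-and (aget data i) 0xff))))
    (/ (double (aget acc 0)) (alength data))))

(get-sum-dotimes (byte-array [-1 1])) ; => 128.0, i.e. (255 + 1) / 2
```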
In addition to @shlomi's answer: you can also make it less verbose (and probably a bit faster) using the areduce macro:
(defn get-sum-2 [^bytes data]
  ;; areduce loops over the array with a primitive index i and accumulator res
  (/ (double (areduce data i res 0
                      (unchecked-add res (bit-and (aget data i) 0xff))))
     (alength data)))
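Putting the pieces together, here is a sketch of the question's brightness function with the areduce loop plugged in. It assumes the decoded image is backed by a DataBufferByte, as typical 8-bit color JPEGs are; any other buffer type makes the ^DataBufferByte hint fail with a ClassCastException. The name brightness-fast is hypothetical.

```clojure
(import '(java.awt.image BufferedImage DataBufferByte)
        '(java.io File)
        '(javax.imageio ImageIO))

(defn brightness-fast
  "Average unsigned byte value (0-255) of the image stored in file.
  Assumes the decoded raster is backed by a DataBufferByte."
  [^File file]
  (let [^BufferedImage img (ImageIO/read file)
        ^bytes data (.getData ^DataBufferByte (.getDataBuffer (.getRaster img)))]
    ;; Same primitive loop as get-sum-2: no lazy seq, no boxing per byte.
    (/ (double (areduce data i res 0
                        (unchecked-add res (bit-and (aget data i) 0xff))))
       (alength data))))

;; Usage over a directory of JPEGs (the "photos" path is hypothetical):
;; (map brightness-fast
;;      (filter #(.endsWith (.getName ^File %) ".jpg")
;;              (file-seq (File. "photos"))))
```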
If you would like to do it really fast in Java, then you can use these options (best would be to use all of them):
As for negative byte values: don't convert the color value to a byte; extract each channel directly as an int, like this:
int rgb = somePixelColor;  // a packed (A)RGB value, e.g. from BufferedImage.getRGB
int b = rgb & 0xFF;
int g = (rgb >> 8) & 0xFF;
int r = (rgb >> 16) & 0xFF;
int sillyBrightness = (r + g + b) / 3; // naive: each color should have its own weight when computing brightness, and there are several models for that
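For comparison, the same channel extraction in Clojure. The helper names are hypothetical; luma-601 uses the ITU-R BT.601 luma weights (0.299, 0.587, 0.114), one common example of the brightness models the Java comment alludes to.

```clojure
(defn channels
  "Splits a packed (A)RGB int, e.g. from BufferedImage.getRGB, into [r g b]."
  [rgb]
  [(bit-and (bit-shift-right rgb 16) 0xFF)
   (bit-and (bit-shift-right rgb 8) 0xFF)
   (bit-and rgb 0xFF)])

(defn naive-brightness
  "Unweighted integer mean of the three channels, like sillyBrightness."
  [rgb]
  (let [[r g b] (channels rgb)]
    (quot (+ r g b) 3)))

(defn luma-601
  "Weighted brightness using the ITU-R BT.601 luma coefficients."
  [rgb]
  (let [[r g b] (channels rgb)]
    (+ (* 0.299 r) (* 0.587 g) (* 0.114 b))))

(def pixel (unchecked-int 0xFF336699)) ; opaque, r=0x33 g=0x66 b=0x99
(channels pixel)         ; => [51 102 153]
(naive-brightness pixel) ; => 102
```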
In addition to the good information above, you may be interested in the HipHip library, which is designed for manipulating arrays of primitive values from Clojure: https://github.com/plumatic/hiphip
Here is an example from its README that computes the mean and standard deviation of a primitive array, along with covariance and correlation:
(require '[hiphip.double :as dbl]) ; xs and ys are primitive double arrays

(defn std-dev [xs]
  (let [mean (dbl/amean xs)
        square-diff-sum (dbl/asum [x xs] (Math/pow (- x mean) 2))]
    (/ square-diff-sum (dbl/alength xs))))

(defn covariance [xs ys]
  (let [ys-mean (dbl/amean ys)
        xs-mean (dbl/amean xs)
        diff-sum (dbl/asum [x xs y ys] (* (- x xs-mean) (- y ys-mean)))]
    (/ diff-sum (dec (dbl/alength xs)))))

(defn correlation [xs ys std-dev1 std-dev2]
  (/ (covariance xs ys) (* std-dev1 std-dev2)))