
Julia matrix multiplication is slower than numpy's

I am trying to do some matrix multiplication in Julia to benchmark it against numpy's.

My Julia code is the following:

function myFunc()
  A = randn(10000, 10000)
  B = randn(10000, 10000)
  return A*B
end

myFunc()

And the python version is:

import numpy as np

A = np.random.rand(10000, 10000)
B = np.random.rand(10000, 10000)
A * B

The Python version takes under 100 ms to execute. The Julia version takes over 13 s!! Seeing as they are using pretty much the same BLAS technology under the hood, what seems to be the problem with the Julia version?!

I don't think those are doing the same thing. The numpy expression just does an element-by-element multiplication, while the Julia expression does true matrix multiplication.

You can see the difference by using smaller inputs. Here's the numpy example:

>>> A
array([1, 2, 3])
>>> B
array([[1],
       [2],
       [3]])
>>> A * B
array([[1, 2, 3],
       [2, 4, 6],
       [3, 6, 9]])
>>> B * A
array([[1, 2, 3],
       [2, 4, 6],
       [3, 6, 9]])

Note that here we have broadcasting, which "simulates" the outer product of two vectors, and so you might think it's matrix multiplication. But it can't be, because matrix multiplication isn't commutative, and here (A * B) == (B * A). Look what happens when you do the same thing in Julia:

julia> A = [1, 2, 3]
3-element Array{Int64,1}:
 1
 2
 3

julia> B = [1 2 3]
1x3 Array{Int64,2}:
 1  2  3

julia> A * B
3x3 Array{Int64,2}:
 1  2  3
 2  4  6
 3  6  9

julia> B * A
1-element Array{Int64,1}:
 14

Here, B * A gives you a 1-element array containing the proper dot product. Try numpy.dot if you want a true comparison.
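To make that concrete, here is a small sketch of the same shapes in numpy, contrasting broadcasting with true matrix multiplication via numpy.dot:

```python
import numpy as np

A = np.array([1, 2, 3])        # shape (3,)
B = np.array([[1], [2], [3]])  # shape (3, 1)

# Broadcasting: element-by-element multiplication, which IS commutative
print(A * B)                          # 3x3 outer-product-looking array
print(np.array_equal(A * B, B * A))   # True

# True matrix multiplication: (3,1) times (1,3) gives the real outer product
print(np.dot(B, A.reshape(1, 3)))     # shape (3, 3)

# and a 1-D vector dotted with a (3,1) column collapses to the dot product
print(np.dot(A, B))                   # [14]
```

The key difference: `A * B` never sums anything, it only multiplies matching (broadcast) entries, while `np.dot` performs the sum-of-products that defines matrix multiplication.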

If you're using Python 3.5 or higher, you can also use the built-in matrix multiplication operator @ (PEP 465)! Just make sure the shapes of the matrices are aligned:

>>> A
array([[1, 2, 3]])
>>> B
array([[1],
       [2],
       [3]])
>>> A @ B
array([[14]])
>>> B @ A
array([[1, 2, 3],
       [2, 4, 6],
       [3, 6, 9]])

Naive matrix multiplication takes on the order of N^3 operations. You can do a simple benchmark to see this growth:

function myFunc(N)
    A = rand(N, N)
    B = rand(N, N)

    A*B
end

myFunc(1)   # run once to compile

sizes = [floor(Int, x) for x in exp10.(range(1, 3.5, length=50))]  # logspace(1, 3.5, 50) on older Julia

times = [@elapsed(myFunc(n)) for n in sizes]

using PyPlot

loglog(sizes, times, "o-")

To do this more seriously, I would average over several runs at each size. I get something like the following graph: [log-log plot of execution time vs. matrix size]. Indeed, extrapolating to N=10^4 gives something around 20 or 30 seconds on my computer. (Again, more seriously I would fit a straight line to the log-log plot to do the extrapolation.)
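That straight-line fit amounts to estimating the scaling exponent: if t ≈ c·N^p, then log t = p·log N + log c, so the slope of the log-log line is p. A sketch using synthetic, exactly-cubic timings (assumed data standing in for real measurements; `numpy.polyfit` does the fit):

```python
import numpy as np

# Synthetic timings following t = c * N^3 (stand-in for measured data)
sizes = np.array([100, 200, 400, 800, 1600])
times = 2e-11 * sizes.astype(float) ** 3

# Straight-line fit in log-log space: log t = slope * log N + intercept
slope, intercept = np.polyfit(np.log(sizes), np.log(times), 1)
print(f"estimated exponent: {slope:.2f}")   # ~3 for cubic scaling

# Extrapolate the fitted line out to N = 10^4
t_pred = np.exp(intercept + slope * np.log(10_000))
print(f"predicted time at N=10^4: {t_pred:.1f} s")
```

With real, noisy timings the recovered exponent will only be approximately 3, but the extrapolation to N=10^4 is far more trustworthy than eyeballing the curve.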
