
Julia matrix multiplication is slower than numpy's

I am trying to do some matrix multiplication in Julia to benchmark it against numpy's.

My Julia code is the following:

function myFunc()
  A = randn(10000, 10000)
  B = randn(10000, 10000)
  return A*B
end

myFunc()

And the python version is:

import numpy as np

A = np.random.rand(10000, 10000)
B = np.random.rand(10000, 10000)
A * B

The Python version takes under 100 ms to execute. The Julia version takes over 13 s!! Seeing as they are using pretty much the same BLAS technology under the hood, what seems to be the problem with the Julia version?!

I don't think those are doing the same thing. The numpy expression just does an element-by-element multiplication, while the Julia expression does true matrix multiplication.

You can see the difference by using smaller inputs. Here's the numpy example:

>>> A
array([1, 2, 3])
>>> B
array([[1],
       [2],
       [3]])
>>> A * B
array([[1, 2, 3],
       [2, 4, 6],
       [3, 6, 9]])
>>> B * A
array([[1, 2, 3],
       [2, 4, 6],
       [3, 6, 9]])

Note that here we have broadcasting, which "simulates" the outer product of two vectors, and so you might think it's matrix multiplication. But it can't be, because matrix multiplication isn't commutative, and here (A * B) == (B * A). Look what happens when you do the same thing in Julia:

julia> A = [1, 2, 3]
3-element Array{Int64,1}:
 1
 2
 3

julia> B = [1 2 3]
1x3 Array{Int64,2}:
 1  2  3

julia> A * B
3x3 Array{Int64,2}:
 1  2  3
 2  4  6
 3  6  9

julia> B * A
1-element Array{Int64,1}:
 14

Here, B * A gives you a 1-element array containing the proper dot product. Try numpy.dot if you want a true comparison.
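To make that concrete, here is a small sketch of the same shapes in numpy, contrasting broadcasting with true matrix multiplication via numpy.dot:

```python
import numpy as np

A = np.array([1, 2, 3])        # shape (3,)
B = np.array([[1], [2], [3]])  # shape (3, 1)

# Broadcasting: element-by-element multiplication, which IS commutative
print(A * B)                          # 3x3 outer-product-looking array
print(np.array_equal(A * B, B * A))   # True

# True matrix multiplication: (3,1) times (1,3) gives the real outer product
print(np.dot(B, A.reshape(1, 3)))     # shape (3, 3)

# and a 1-D vector dotted with a (3,1) column collapses to the dot product
print(np.dot(A, B))                   # [14]
```

The key difference: `A * B` never sums anything, it only multiplies matching (broadcast) entries, while `np.dot` performs the sum-of-products that defines matrix multiplication.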

If you're using Python 3.5 or higher, you can also use the built-in matrix multiplication operator @ (PEP 465)! Just make sure the shapes of the matrices are aligned:

>>> A
array([[1, 2, 3]])
>>> B
array([[1],
       [2],
       [3]])
>>> A @ B
array([[14]])
>>> B @ A
array([[1, 2, 3],
       [2, 4, 6],
       [3, 6, 9]])

Naive matrix multiplication takes on the order of N^3 operations. You can do a simple benchmark to see this growth:

function myFunc(N)
    A = rand(N, N)
    B = rand(N, N)

    A*B
end

myFunc(1)   # run once to compile

sizes = [floor(Int, x) for x in exp10.(range(1, 3.5, length=50))]  # logspace(1, 3.5, 50) on older Julia

times = [@elapsed(myFunc(n)) for n in sizes]

using PyPlot

loglog(sizes, times, "o-")

To do this more seriously, I would average over several runs at each size. I get something like the following graph: [log-log plot of execution time vs. matrix size]. Indeed, extrapolating to N=10^4 gives something around 20 or 30 seconds on my computer. (Again, more seriously I would fit a straight line to the log-log plot to do the extrapolation.)
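That straight-line fit amounts to estimating the scaling exponent: if t ≈ c·N^p, then log t = p·log N + log c, so the slope of the log-log line is p. A sketch using synthetic, exactly-cubic timings (assumed data standing in for real measurements; `numpy.polyfit` does the fit):

```python
import numpy as np

# Synthetic timings following t = c * N^3 (stand-in for measured data)
sizes = np.array([100, 200, 400, 800, 1600])
times = 2e-11 * sizes.astype(float) ** 3

# Straight-line fit in log-log space: log t = slope * log N + intercept
slope, intercept = np.polyfit(np.log(sizes), np.log(times), 1)
print(f"estimated exponent: {slope:.2f}")   # ~3 for cubic scaling

# Extrapolate the fitted line out to N = 10^4
t_pred = np.exp(intercept + slope * np.log(10_000))
print(f"predicted time at N=10^4: {t_pred:.1f} s")
```

With real, noisy timings the recovered exponent will only be approximately 3, but the extrapolation to N=10^4 is far more trustworthy than eyeballing the curve.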
