Why is my Julia implementation for computing Euclidean distances in 3D faster than my C implementation?

Question

I am comparing the time it takes Julia to compute the Euclidean distances between two sets of points in 3D space against an equivalent implementation in C. I was very surprised to observe that (for this particular case and my particular implementations) Julia needs about 22% less time than C. When I also included @fastmath in the Julia version, it needed about 83% less time than C.
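The percentages are the fraction of the C run time that Julia saves, computed from the timings listed further down (90 s, 20 s and 116 s). Expressed as speed ratios instead, the same measurements give larger numbers:

```latex
1 - \frac{90\,\mathrm{s}}{116\,\mathrm{s}} \approx 0.22, \qquad
1 - \frac{20\,\mathrm{s}}{116\,\mathrm{s}} \approx 0.83

\frac{116\,\mathrm{s}}{90\,\mathrm{s}} \approx 1.29, \qquad
\frac{116\,\mathrm{s}}{20\,\mathrm{s}} = 5.8
```

So with @fastmath the Julia version runs roughly 5.8 times as fast as the C version.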

This leads to my question: why? Either Julia is more amazing than I originally thought or I am doing something very inefficient in C. I am betting my money on the latter.

Some particulars about the implementation:

In Julia I use 2D arrays of Float64.
In C I use dynamically allocated 1D arrays of double.
In C I use the sqrt function from math.h.
The computations are very fast, therefore I compute them 1000 times to avoid comparing on the micro/millisecond level.

Some particulars about the compilation:

Compiler: gcc 5.4.0
Optimisation flags: -O3 -ffast-math

Timings:

Julia (without @fastmath): 90 s
Julia (with @fastmath): 20 s
C: 116 s
I use the bash command time for the timings:
- $ time ./particleDistance.jl (with shebang in file)
- $ time ./particleDistance
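Note that time measures the whole process: for the Julia script that includes starting the runtime and JIT-compiling distance!, and for both programs it includes allocating and filling the arrays. With run times of 20 to 116 s this overhead is probably small, but timing only the kernel removes it from the comparison. A minimal sketch for the C side, assuming a POSIX system that provides clock_gettime with CLOCK_MONOTONIC; the helper names now_monotonic and elapsed_seconds are hypothetical. On the Julia side, the built-in @time macro (e.g. @time distance!(x, y, r)) serves the same purpose.

```c
#define _POSIX_C_SOURCE 199309L   /* expose clock_gettime in strict C modes */
#include <time.h>

/* Current reading of the monotonic clock, which does not jump when the
 * system time is changed. */
struct timespec now_monotonic(void)
{
    struct timespec t;
    clock_gettime(CLOCK_MONOTONIC, &t);
    return t;
}

/* Seconds elapsed between two readings taken with now_monotonic(). */
double elapsed_seconds(struct timespec start, struct timespec end)
{
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) * 1e-9;
}

/* Intended use inside main() of particleDistance.c:
 *
 *     struct timespec t0 = now_monotonic();
 *     distance(n, m, x, y, r);
 *     struct timespec t1 = now_monotonic();
 *     printf("distance: %.3f s\n", elapsed_seconds(t0, t1));
 */
```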

particleDistance.jl

#!/usr/local/bin/julia

function distance!(x::Array{Float64, 2}, y::Array{Float64, 2}, r::Array{Float64, 2})
    nx = size(x, 1)
    ny = size(y, 1)

    for k = 1:1000

        for j = 1:ny

            @fastmath for i = 1:nx
                @inbounds dx = y[j, 1] - x[i, 1]
                @inbounds dy = y[j, 2] - x[i, 2]
                @inbounds dz = y[j, 3] - x[i, 3]

                rSq = dx*dx + dy*dy + dz*dz

                @inbounds r[i, j] = sqrt(rSq)
            end

        end

    end

end

function main()
    n = 4096
    m = 4096

    x = rand(n, 3)
    y = rand(m, 3)
    r = zeros(n, m)

    distance!(x, y, r)

    println("r[n, m] = $(r[n, m])")
end

main()

particleDistance.c

#include <stdlib.h>
#include <stdio.h>
#include <math.h>

void distance(int n, int m, double* x, double* y, double* r)
{
    int i, j, I, J;
    double dx, dy, dz, rSq;

    for (int k = 0; k < 1000; k++)
    {
        for (j = 0; j < m; j++)
        {
            J = 3*j;

            for (i = 0; i < n; i++)
            {
                I = 3*i;

                dx = y[J] - x[I];
                dy = y[J+1] - x[I+1];
                dz = y[J+2] - x[I+2];

                rSq = dx*dx + dy*dy + dz*dz;

                r[j*n+i] = sqrt(rSq);
            }
        }
    }
}

int main()
{
    int i;
    int n = 4096;
    int m = 4096;

    double *x, *y, *r;

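    size_t xbytes = 3*n*sizeof(double);
    size_t ybytes = 3*m*sizeof(double);
    size_t rbytes = (size_t)n*m*sizeof(double);   /* one double per (i, j) pair */

    x = (double*) malloc(xbytes);
    y = (double*) malloc(ybytes);
    r = (double*) malloc(rbytes);   /* xbytes*ybytes/9 would be 8x too large */

    if (x == NULL || y == NULL || r == NULL)
    {
        fprintf(stderr, "allocation failed\n");
        return 1;
    }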

    for (i = 0; i < 3*n; i++)
    {
        x[i] = (double) rand()/RAND_MAX*2.0-1.0;
    }

    for (i = 0; i < 3*m; i++)
    {
        y[i] = (double) rand()/RAND_MAX*2.0-1.0;
    }

    distance(n, m, x, y, r);

    printf("r[n*m-1] = %f\n", r[n*m-1]);

    free(x);
    free(y);
    free(r);

    return 0;
}

Makefile

all: particleDistance.c
	gcc -o particleDistance particleDistance.c -O3 -ffast-math -lm

Answer 1

Maybe this should be a comment, but the point is that Julia is indeed pretty well optimized. On the Julia web page you can see that it can beat C in some cases (mandel).

I see that you are using -ffast-math in your compilation. But maybe you could do some optimizations in your code (although nowadays compilers are pretty smart and this might not solve the issue).

Instead of using int for your indexes, try to use unsigned int; this allows you to try the following:
Instead of multiplying by 3, if you use an unsigned you can do a shift and add. This can save some computation time;
When accessing elements like x[J], maybe try using pointers directly and access the elements sequentially, like x+=3 (?);
Instead of int n and int m, try to define them as macros. If they are known in advance, you can take advantage of that.
Does the malloc make a difference in this case? If n and m are known, fixed-size arrays would reduce the time spent by the OS allocating memory.

There might be a few other things, but Julia is pretty well optimized with just-in-time (JIT) compilation, so everything that is constant and known in advance is used to its advantage. I have tried Julia with no regrets.

Answer 2

Your index calculation in C is rather slow.

Try something like the following (I did not compile it, it may still have errors, it's just to visualize the idea):

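#include <math.h>  /* for sqrt */

void distance(int n, int m, double* x, double* y, double* r)
{
    int i, j;
    double dx, dy, dz, rSq;
    double *X, *Y, *R;   /* walking pointers into x, y and r */

    for (int k = 0; k < 1000; k++)
    {
        R = r;
        Y = y;
        for (j = 0; j < m; j++)
        {
            X = x;   /* restart at the first point of x for every point of y */

            for (i = 0; i < n; i++)
            {
                dx = Y[0] - *X++;
                dy = Y[1] - *X++;
                dz = Y[2] - *X++;

                rSq = dx*dx + dy*dy + dz*dz;

                *R++ = sqrt(rSq);   /* same order as r[j*n+i] */
            }
            Y += 3;   /* next point of y */
        }
    }
}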

Alternatively you could try the following, which might be a little bit faster (one increment instead of three):

            dx = Y[0] - X[0];
            dy = Y[1] - X[1];
            dz = Y[2] - X[2];
            X+=3;

Y[x] is the same as *(Y+x).

Good luck

Answer 1: 0 votes, posted 2017-07-16 07:29:14
Answer 2: 0 votes, posted 2017-08-10 20:39:54
