Why is my Julia implementation for computing Euclidean distances in 3D faster than my C implementation?
I am comparing the time it takes Julia to compute the Euclidean distances between two sets of points in 3D space against an equivalent implementation in C. I was very surprised to observe that (for this particular case and my particular implementations) Julia is 22% faster than C. When I also included @fastmath in the Julia version, it was even 83% faster than C.
This leads to my question: why? Either Julia is more amazing than I originally thought, or I am doing something very inefficient in C. I am betting my money on the latter.
Some particulars about the implementation:
- Julia: 2D arrays of Float64.
- C: dynamically allocated 1D arrays of double.
- C: the sqrt function from math.h.
Some particulars about the compilation:
- C is compiled with -O3 -ffast-math.
Timings:
- Julia (without @fastmath): 90 s
- Julia (with @fastmath): 20 s
- The time command was used for the timings:
  $ time ./particleDistance.jl (with shebang in file)
  $ time ./particleDistance
particleDistance.jl
#!/usr/local/bin/julia
# r[i, j] receives the distance between point x[i, :] and point y[j, :].
function distance!(x::Array{Float64, 2}, y::Array{Float64, 2}, r::Array{Float64, 2})
    nx = size(x, 1)
    ny = size(y, 1)
    for k = 1:1000
        for j = 1:ny
            @fastmath for i = 1:nx
                @inbounds dx = y[j, 1] - x[i, 1]
                @inbounds dy = y[j, 2] - x[i, 2]
                @inbounds dz = y[j, 3] - x[i, 3]
                rSq = dx*dx + dy*dy + dz*dz
                @inbounds r[i, j] = sqrt(rSq)
            end
        end
    end
end

function main()
    n = 4096
    m = 4096
    x = rand(n, 3)
    y = rand(m, 3)
    r = zeros(n, m)
    distance!(x, y, r)
    println("r[n, m] = $(r[n, m])")
end

main()
particleDistance.c
#include <stdlib.h>
#include <stdio.h>
#include <math.h>

/* Points are stored as consecutive (x, y, z) triples;
   r[j*n + i] receives the distance between point i of x and point j of y. */
void distance(int n, int m, double* x, double* y, double* r)
{
    int i, j, I, J;
    double dx, dy, dz, rSq;

    for (int k = 0; k < 1000; k++)
    {
        for (j = 0; j < m; j++)
        {
            J = 3*j;
            for (i = 0; i < n; i++)
            {
                I = 3*i;
                dx = y[J] - x[I];
                dy = y[J+1] - x[I+1];
                dz = y[J+2] - x[I+2];
                rSq = dx*dx + dy*dy + dz*dz;
                r[j*n+i] = sqrt(rSq);
            }
        }
    }
}

int main()
{
    int i;
    int n = 4096;
    int m = 4096;
    double *x, *y, *r;
    size_t xbytes = 3*n*sizeof(double);
    size_t ybytes = 3*m*sizeof(double);

    x = (double*) malloc(xbytes);
    y = (double*) malloc(ybytes);
    r = (double*) malloc(xbytes*ybytes/9);

    for (i = 0; i < 3*n; i++)
    {
        x[i] = (double) rand()/RAND_MAX*2.0-1.0;
    }
    for (i = 0; i < 3*m; i++)
    {
        y[i] = (double) rand()/RAND_MAX*2.0-1.0;
    }

    distance(n, m, x, y, r);
    printf("r[n*m-1] = %f\n", r[n*m-1]);

    free(x);
    free(y);
    free(r);
    return 0;
}
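A side note on the allocation of r in the listing above: the result matrix holds n*m doubles, whereas the expression xbytes*ybytes/9 works out to n*m*sizeof(double)*sizeof(double) bytes, that is, sizeof(double) times more than needed (about 1 GiB instead of 128 MiB for n = m = 4096 with 8-byte doubles). This does not make the results wrong, and whether it has any effect on the timings is not established here. The sketch below only compares the two sizes; the helper names result_bytes and listing_bytes are hypothetical, and a 64-bit size_t is assumed so that the product does not overflow.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes actually needed for an n-by-m result matrix of doubles. */
size_t result_bytes(int n, int m)
{
    assert(n >= 0 && m >= 0);
    return (size_t)n * (size_t)m * sizeof(double);
}

/* Bytes requested by the expression used in the listing: xbytes*ybytes/9.
   Assumes a 64-bit size_t; the product would overflow a 32-bit one. */
size_t listing_bytes(int n, int m)
{
    assert(n >= 0 && m >= 0);
    size_t xbytes = 3 * (size_t)n * sizeof(double);
    size_t ybytes = 3 * (size_t)m * sizeof(double);
    return xbytes * ybytes / 9;
}
```

Calling malloc(result_bytes(n, m)) would request exactly the n*m doubles that the kernel writes.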
Makefile
all: particleDistance.c
	gcc -o particleDistance particleDistance.c -O3 -ffast-math -lm
Maybe this should be a comment, but the point is that Julia is indeed well optimized. On the Julia website you can see that it can beat C in some cases (mandel).
I see that you are already using -ffast-math in your compilation. Still, you may be able to optimize the C code by hand, although compilers are quite smart nowadays and this might not solve the issue.
There might be a few other factors, but Julia relies on just-in-time compilation, so anything that is constant and known in advance can be exploited by the compiler. I have tried Julia with no regrets.
Your index calculation in C is rather slow.
Try something like the following (I did not compile it, so it may still contain errors; it is only meant to visualize the idea):
void distance(int n, int m, double* x, double* y, double* r)
{
    int i, j;
    double dx, dy, dz, rSq;
    double* X, *Y, *R;

    for (int k = 0; k < 1000; k++)
    {
        /* restart at the first output element and the first y point */
        R = r;
        Y = y;
        for (j = 0; j < m; j++)
        {
            X = x;  /* rewind to the first x point for every y point */
            for (i = 0; i < n; i++)
            {
                dx = Y[0] - *X++;
                dy = Y[1] - *X++;
                dz = Y[2] - *X++;
                rSq = dx*dx + dy*dy + dz*dz;
                *R++ = sqrt(rSq);
            }
            Y += 3;  /* advance to the next (x, y, z) triple */
        }
    }
}
Alternatively, you could try the following, which might be a little faster (one increment instead of three):
dx = Y[0] - X[0];
dy = Y[1] - X[1];
dz = Y[2] - X[2];
X+=3;
Note that Y[k] is the same as *(Y + k).
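The equivalence of indexing and pointer arithmetic can be checked directly. A small sketch, with hypothetical helper names same_element and z_of_point, assuming points packed as consecutive (x, y, z) triples as in the question:

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 when Y[k] and *(Y + k) refer to the same object (same address). */
int same_element(const double *Y, size_t k)
{
    assert(Y != NULL);
    return &Y[k] == Y + k;
}

/* Reads the z-coordinate of point j from a packed (x, y, z) array
   using pointer arithmetic only. */
double z_of_point(const double *pts, size_t j)
{
    assert(pts != NULL);
    return *(pts + 3*j + 2);   /* identical to pts[3*j + 2] */
}
```

Because the two spellings denote the same access, choosing between them is a matter of readability; any speed difference comes from how the loop is structured, not from the notation.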
Good luck.