Java Matrix-Vector Multiplication 100 Times Slower Than C Version
I'm working on performance differences between Android Java and Android NDK applications. As an example for 3D graphics, I performed a Matrix4D-Vector4D transformation on more than 90,000 vertices.
It seems that the Java version is nearly 100 times slower than the C version. Did I do something wrong? Does anyone have similar experiences?
My Java code for the transformation:
long t1 = System.nanoTime();
for ( int i = 0; i < vCount; i++)
{
Vector4 vOut = new Vector4();
Vector4 v = vertices[i];
vOut.v_[0] = v.v_[0] * matrix[0].v_[0];
vOut.v_[1] = v.v_[0] * matrix[0].v_[1];
vOut.v_[2] = v.v_[0] * matrix[0].v_[2];
vOut.v_[3] = v.v_[0] * matrix[0].v_[3];
vOut.v_[0] += v.v_[1] * matrix[1].v_[0];
vOut.v_[1] += v.v_[1] * matrix[1].v_[1];
vOut.v_[2] += v.v_[1] * matrix[1].v_[2];
vOut.v_[3] += v.v_[1] * matrix[1].v_[3];
vOut.v_[0] += v.v_[2] * matrix[2].v_[0];
vOut.v_[1] += v.v_[2] * matrix[2].v_[1];
vOut.v_[2] += v.v_[2] * matrix[2].v_[2];
vOut.v_[3] += v.v_[2] * matrix[2].v_[3];
vOut.v_[0] += v.v_[3] * matrix[3].v_[0];
vOut.v_[1] += v.v_[3] * matrix[3].v_[1];
vOut.v_[2] += v.v_[3] * matrix[3].v_[2];
vOut.v_[3] += v.v_[3] * matrix[3].v_[3];
vertices[i] = vOut;
}
long t2 = System.nanoTime();
long diff = t2 - t1;
double ms = diff / 1000000.0; // divide in double precision; the float literal lost accuracy
Log.w("GL2JNIView", String.format("ms %.2f ", ms));
Performance (transforming > 90,000 vertices | Android 4.0.4, SGS II; median of 200 runs):
JAVA-Version: 2 FPS
C-Version: 190 FPS
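For reference, a median over repeated runs could be collected roughly as follows. This is only a sketch: the class name `MedianTiming` and the helpers `medianMillis` and `measure` are hypothetical and not from the original post; only `System.nanoTime()` is taken from the code above.

```java
import java.util.Arrays;

public class MedianTiming {
    /** Returns the median of the given nanosecond samples, converted to milliseconds. */
    static double medianMillis(long[] nanos) {
        long[] sorted = nanos.clone();   // leave the caller's array untouched
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        double medianNanos = (sorted.length % 2 == 1)
                ? sorted[mid]                             // odd count: middle sample
                : (sorted[mid - 1] + sorted[mid]) / 2.0;  // even count: mean of the two middle samples
        return medianNanos / 1000000.0;
    }

    /** Runs the work the given number of times and returns the median duration in ms. */
    static double measure(Runnable work, int runs) {
        long[] samples = new long[runs];
        for (int r = 0; r < runs; r++) {
            long t1 = System.nanoTime();
            work.run();
            samples[r] = System.nanoTime() - t1;
        }
        return medianMillis(samples);
    }

    public static void main(String[] args) {
        // Placeholder workload (an assumption); substitute the transformation loop here.
        double ms = measure(new Runnable() {
            @Override
            public void run() {
                double sum = 0;
                for (int i = 0; i < 100000; i++) {
                    sum += i * 0.5;
                }
                if (sum < 0) {
                    System.out.println(sum); // keeps the loop from being trivially dead
                }
            }
        }, 200);
        System.out.println(String.format("median ms %.2f", ms));
    }
}
```

Assuming one transformation pass per frame, a frames-per-second figure would follow as 1000 divided by the median in milliseconds.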
You create a new Vector4 in each iteration. In my own experience, using new inside loops can cause unexpected performance problems on Android.
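One way to avoid the per-iteration allocation is to read each vertex into locals and write the result back into the same object. The sketch below assumes a `Vector4` holding four floats in a `v_` array; the question does not show that class, so the definition here (and the names `InPlaceTransform` and `transformInPlace`) are hypothetical stand-ins. It uses the same row-by-row accumulation as the question's loop.

```java
public class InPlaceTransform {
    /** Hypothetical stand-in for the question's Vector4: four floats in v_. */
    static final class Vector4 {
        final float[] v_ = new float[4];

        Vector4(float x, float y, float z, float w) {
            v_[0] = x;
            v_[1] = y;
            v_[2] = z;
            v_[3] = w;
        }
    }

    /** Transforms every vertex in place; no objects are allocated inside the loop. */
    static void transformInPlace(Vector4[] vertices, Vector4[] matrix) {
        for (int i = 0; i < vertices.length; i++) {
            float[] v = vertices[i].v_;
            // Copy the inputs first so overwriting v[c] cannot disturb later terms.
            float x = v[0], y = v[1], z = v[2], w = v[3];
            for (int c = 0; c < 4; c++) {
                v[c] = x * matrix[0].v_[c]
                     + y * matrix[1].v_[c]
                     + z * matrix[2].v_[c]
                     + w * matrix[3].v_[c];
            }
        }
    }

    public static void main(String[] args) {
        // Identity matrix with a translation of (10, 20, 30) in the last row.
        Vector4[] matrix = {
            new Vector4(1, 0, 0, 0),
            new Vector4(0, 1, 0, 0),
            new Vector4(0, 0, 1, 0),
            new Vector4(10, 20, 30, 1)
        };
        Vector4[] vertices = { new Vector4(1, 2, 3, 1) };
        transformInPlace(vertices, matrix);
        float[] r = vertices[0].v_;
        System.out.println(r[0] + " " + r[1] + " " + r[2] + " " + r[3]);
    }
}
```

Whether this closes much of the gap to the C version would have to be measured on the device; it only removes the allocation and the garbage it produces.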
AFAIK, Android's Java implementation runs on a virtual machine called Dalvik, which has a different instruction set than the JVM. Dalvik has included a just-in-time compiler since Android 2.2, but it optimizes far less aggressively than a desktop JVM, and code it does not compile is still interpreted. So Dalvik is generally slower than C on CPU-bound tasks.
This might change in very recent Android systems.
You should also change your loop. In addition to the answer by @toopok4k3, you should try these things:
I am assuming the values are doubles in the version below.
int i = 0;
try
{
Vector4 vOut = new Vector4();
final double m0v0 = matrix[0].v_[0];
final double m0v1 = matrix[0].v_[1];
final double m0v2 = matrix[0].v_[2];
final double m0v3 = matrix[0].v_[3];
final double m1v0 = matrix[1].v_[0];
final double m1v1 = matrix[1].v_[1];
final double m1v2 = matrix[1].v_[2];
final double m1v3 = matrix[1].v_[3];
final double m2v0 = matrix[2].v_[0];
final double m2v1 = matrix[2].v_[1];
final double m2v2 = matrix[2].v_[2];
final double m2v3 = matrix[2].v_[3];
final double m3v0 = matrix[3].v_[0];
final double m3v1 = matrix[3].v_[1];
final double m3v2 = matrix[3].v_[2];
final double m3v3 = matrix[3].v_[3];
while (true)
{
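// Throws ArrayIndexOutOfBoundsException once i reaches vertices.length, which ends the loop.
Vector4 v = vertices[i];
double vertexVal = v.v_[0];
vOut.v_[0] = vertexVal * m0v0;
vOut.v_[1] = vertexVal * m0v1;
vOut.v_[2] = vertexVal * m0v2;
vOut.v_[3] = vertexVal * m0v3;
vertexVal = v.v_[1];
vOut.v_[0] += vertexVal * m1v0;
vOut.v_[1] += vertexVal * m1v1;
vOut.v_[2] += vertexVal * m1v2;
vOut.v_[3] += vertexVal * m1v3;
vertexVal = v.v_[2];
vOut.v_[0] += vertexVal * m2v0;
vOut.v_[1] += vertexVal * m2v1;
vOut.v_[2] += vertexVal * m2v2;
vOut.v_[3] += vertexVal * m2v3;
vertexVal = v.v_[3];
vOut.v_[0] += vertexVal * m3v0;
vOut.v_[1] += vertexVal * m3v1;
vOut.v_[2] += vertexVal * m3v2;
vOut.v_[3] += vertexVal * m3v3;
// Copy the result back into the existing vertex. Storing vOut itself would make
// every array slot refer to the same scratch object.
v.v_[0] = vOut.v_[0];
v.v_[1] = vOut.v_[1];
v.v_[2] = vOut.v_[2];
v.v_[3] = vOut.v_[3];
// Advance only after the write, so the result lands in the vertex that was read.
i++;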
}
}
catch (ArrayIndexOutOfBoundsException aioobe)
{
// loop is done
}