Java matrix-vector multiplication is 100 times slower than the C version

Question

I'm working on the performance differences between Android Java and Android NDK applications. As an example from 3D graphics, I performed a Matrix4D-Vector4D transformation on more than 90,000 vertices.

It seems that the Java version is nearly 100 times slower than the C version. Did I do something wrong? Does anyone have similar experiences?

My Java code for the transformation:

        long t1 = System.nanoTime();
        for ( int i = 0; i < vCount; i++)
        {

            Vector4 vOut = new Vector4();
            Vector4 v = vertices[i];

            vOut.v_[0] = v.v_[0] * matrix[0].v_[0];
            vOut.v_[1] = v.v_[0] * matrix[0].v_[1];
            vOut.v_[2] = v.v_[0] * matrix[0].v_[2];
            vOut.v_[3] = v.v_[0] * matrix[0].v_[3];

            vOut.v_[0] += v.v_[1] * matrix[1].v_[0];
            vOut.v_[1] += v.v_[1] * matrix[1].v_[1];
            vOut.v_[2] += v.v_[1] * matrix[1].v_[2];
            vOut.v_[3] += v.v_[1] * matrix[1].v_[3];

            vOut.v_[0] += v.v_[2] * matrix[2].v_[0];
            vOut.v_[1] += v.v_[2] * matrix[2].v_[1];
            vOut.v_[2] += v.v_[2] * matrix[2].v_[2];
            vOut.v_[3] += v.v_[2] * matrix[2].v_[3];

            vOut.v_[0] += v.v_[3] * matrix[3].v_[0];
            vOut.v_[1] += v.v_[3] * matrix[3].v_[1];
            vOut.v_[2] += v.v_[3] * matrix[3].v_[2];
            vOut.v_[3] += v.v_[3] * matrix[3].v_[3]; 

            vertices[i] = vOut;

        }
        long t2 = System.nanoTime();        
        long diff = t2 - t1;        
        // Divide by a double literal; a float literal (1000000.0f) would round the result to float precision.
        double ms = diff / 1000000.0;
        Log.w("GL2JNIView", String.format("ms %.2f ", ms));

Performance (transforming more than 90,000 vertices | Android 4.0.4, Samsung Galaxy S II), median of 200 runs:

Java version:   2 FPS
C version:    190 FPS
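The figures above are medians over 200 runs. A minimal sketch of that kind of measurement, assuming the work being timed is wrapped in a `Runnable` (the class and method names are hypothetical, not from the question): time each run with `System.nanoTime()`, take the median, and convert milliseconds per frame to frames per second with FPS = 1000 / ms. At 2 FPS a frame takes about 500 ms; at 190 FPS, about 5.3 ms. The sketch targets a desktop JVM (Java 8+, for the lambda); on older Android toolchains an anonymous `Runnable` does the same job.

```java
import java.util.Arrays;

public class MedianTiming {

    /** Runs the workload `runs` times and returns the median duration in milliseconds. */
    static double medianMillis(Runnable workload, int runs) {
        double[] ms = new double[runs];
        for (int r = 0; r < runs; r++) {
            long t1 = System.nanoTime();
            workload.run();
            long t2 = System.nanoTime();
            ms[r] = (t2 - t1) / 1_000_000.0; // nanoseconds -> milliseconds
        }
        return median(ms);
    }

    /** Median of the values; note that the input array is sorted in place. */
    static double median(double[] values) {
        Arrays.sort(values);
        int n = values.length;
        return (n % 2 == 1)
                ? values[n / 2]
                : (values[n / 2 - 1] + values[n / 2]) / 2.0;
    }

    /** Frames per second for a given frame time in milliseconds. */
    static double fps(double millisPerFrame) {
        return 1000.0 / millisPerFrame;
    }

    public static void main(String[] args) {
        // Hypothetical workload standing in for the vertex transform being measured.
        final float[] data = new float[4 * 90_000];
        double ms = medianMillis(() -> {
            for (int i = 0; i < data.length; i++) {
                data[i] = data[i] * 0.5f + 1.0f;
            }
        }, 200);
        System.out.printf("median %.3f ms, %.1f FPS%n", ms, fps(ms));
    }
}
```

Using the median rather than the mean keeps a few slow runs (for example, ones interrupted by garbage collection) from dominating the result.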

Answer 1

You create a new Vector4 in each iteration. Here that means at least 90,000 allocations per pass, all of which become garbage the collector has to deal with. From my own experience, using new inside loops can cause unexpected performance problems on Android.
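One way to act on this advice is to avoid per-vertex objects entirely: keep all vertices in one flat `float[]` (x, y, z, w per vertex) and write the results into a preallocated array, so the loop allocates nothing. This is a sketch under assumptions: `FlatTransform` and `transform` are hypothetical names, `float` precision is assumed to be acceptable, and since the question does not show how `matrix` is stored, the 16-float layout is chosen to reproduce the question's arithmetic (`vOut.v_[j] += v.v_[k] * matrix[k].v_[j]`).

```java
import java.util.Arrays;

public class FlatTransform {

    /**
     * Transforms vertexCount vertices stored as x,y,z,w quadruples in `in` and
     * writes the results to `out` (which may be the same array as `in`).
     * m holds the 4x4 matrix row by row: out_j = sum_k in_k * m[4*k + j].
     * Nothing is allocated inside the loop.
     */
    static void transform(float[] m, float[] in, float[] out, int vertexCount) {
        // Copy the 16 matrix entries into locals once.
        final float m00 = m[0],  m01 = m[1],  m02 = m[2],  m03 = m[3];
        final float m10 = m[4],  m11 = m[5],  m12 = m[6],  m13 = m[7];
        final float m20 = m[8],  m21 = m[9],  m22 = m[10], m23 = m[11];
        final float m30 = m[12], m31 = m[13], m32 = m[14], m33 = m[15];

        for (int i = 0, base = 0; i < vertexCount; i++, base += 4) {
            // Read all four components first, so in-place use (in == out) is safe.
            final float x = in[base], y = in[base + 1], z = in[base + 2], w = in[base + 3];
            out[base]     = x * m00 + y * m10 + z * m20 + w * m30;
            out[base + 1] = x * m01 + y * m11 + z * m21 + w * m31;
            out[base + 2] = x * m02 + y * m12 + z * m22 + w * m32;
            out[base + 3] = x * m03 + y * m13 + z * m23 + w * m33;
        }
    }

    public static void main(String[] args) {
        // Scale by 2, then translate by (1, 2, 3); the translation sits in the
        // last row because that row multiplies w.
        float[] m = {
            2, 0, 0, 0,
            0, 2, 0, 0,
            0, 0, 2, 0,
            1, 2, 3, 1
        };
        float[] verts = { 1, 1, 1, 1,   0, 0, 0, 1 };
        transform(m, verts, verts, 2); // in place
        System.out.println(Arrays.toString(verts));
        // prints [3.0, 4.0, 5.0, 1.0, 1.0, 2.0, 3.0, 1.0]
    }
}
```

Besides removing the allocations, the flat layout keeps the vertex data contiguous in memory instead of spreading it across many small objects.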

Answer 2

AFAIK, Android runs Java code on a virtual machine called Dalvik, which has a different instruction set than the JVM. Dalvik interprets bytecode; since Android 2.2 it also has a just-in-time compiler, but that JIT only compiles frequently executed code paths (traces) and optimizes far less aggressively than an ahead-of-time C compiler. So Dalvik is clearly slower than C on CPU-bound tasks.

This might change in more recent Android versions.

Answer 3

You should also change your loop. In addition to the answer by @toopok4k3, you should try these things:

- Dump the for loop and just catch an ArrayIndexOutOfBoundsException. Your loop is large enough to make up for the overhead of the try/catch.
- If the matrix array and the values it contains don't change from one loop iteration to the next, assign them to constants outside the loop. Dereferencing arrays and accessing member variables is not nearly as fast as accessing local variables.
- Since each element of v.v_[] is used several times, assign it to a local variable and use it four times before reading the next one.

I am assuming the values are doubles in the version below.

int i = 0;
try
{
    // The matrix does not change inside the loop, so copy its entries into local constants once.
    final double m0v0 = matrix[0].v_[0];
    final double m0v1 = matrix[0].v_[1];
    final double m0v2 = matrix[0].v_[2];
    final double m0v3 = matrix[0].v_[3];
    final double m1v0 = matrix[1].v_[0];
    final double m1v1 = matrix[1].v_[1];
    final double m1v2 = matrix[1].v_[2];
    final double m1v3 = matrix[1].v_[3];
    final double m2v0 = matrix[2].v_[0];
    final double m2v1 = matrix[2].v_[1];
    final double m2v2 = matrix[2].v_[2];
    final double m2v3 = matrix[2].v_[3];
    final double m3v0 = matrix[3].v_[0];
    final double m3v1 = matrix[3].v_[1];
    final double m3v2 = matrix[3].v_[2];
    final double m3v3 = matrix[3].v_[3];

    // No loop condition: vertices[i] throws ArrayIndexOutOfBoundsException once
    // i reaches vertices.length, which ends the loop. This walks the whole array,
    // so it assumes vCount == vertices.length. Any other out-of-bounds access in
    // the body would also end the loop silently.
    while (true)
    {
        final double[] c = vertices[i].v_;

        // Read each component into a local once, before any of them is overwritten.
        final double x = c[0];
        final double y = c[1];
        final double z = c[2];
        final double w = c[3];

        // Write the result back into the same vertex, so nothing is allocated in the loop.
        c[0] = x * m0v0 + y * m1v0 + z * m2v0 + w * m3v0;
        c[1] = x * m0v1 + y * m1v1 + z * m2v1 + w * m3v1;
        c[2] = x * m0v2 + y * m1v2 + z * m2v2 + w * m3v2;
        c[3] = x * m0v3 + y * m1v3 + z * m2v3 + w * m3v3;

        i++;
    }
}
catch (ArrayIndexOutOfBoundsException aioobe)
{
    // loop is done
}
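The hoisting and local-variable suggestions do not depend on the exception trick. For comparison, a sketch that applies them with an ordinary bounded loop, which also honours the question's `vCount` (the exception version always walks the whole array). `Vector4` here is a minimal stand-in, because the question does not show its definition; `HoistedTransform` and `transform` are hypothetical names, and, like the answer, the sketch assumes `double` components. Results are written back into each vertex, so no objects are allocated in the loop.

```java
import java.util.Arrays;

public class HoistedTransform {

    /** Minimal stand-in for the question's Vector4 (its real definition is not shown). */
    static final class Vector4 {
        final double[] v_;
        Vector4(double x, double y, double z, double w) {
            v_ = new double[] { x, y, z, w };
        }
    }

    /** Transforms vertices[0..vCount) in place: v_[j] = sum_k v_[k] * matrix[k].v_[j]. */
    static void transform(Vector4[] matrix, Vector4[] vertices, int vCount) {
        // Matrix entries are loop-invariant, so read them once.
        final double[] r0 = matrix[0].v_, r1 = matrix[1].v_, r2 = matrix[2].v_, r3 = matrix[3].v_;
        final double m0v0 = r0[0], m0v1 = r0[1], m0v2 = r0[2], m0v3 = r0[3];
        final double m1v0 = r1[0], m1v1 = r1[1], m1v2 = r1[2], m1v3 = r1[3];
        final double m2v0 = r2[0], m2v1 = r2[1], m2v2 = r2[2], m2v3 = r2[3];
        final double m3v0 = r3[0], m3v1 = r3[1], m3v2 = r3[2], m3v3 = r3[3];

        for (int i = 0; i < vCount; i++) {
            final double[] c = vertices[i].v_;
            // Each component is read once into a local and reused four times.
            final double x = c[0], y = c[1], z = c[2], w = c[3];
            c[0] = x * m0v0 + y * m1v0 + z * m2v0 + w * m3v0;
            c[1] = x * m0v1 + y * m1v1 + z * m2v1 + w * m3v1;
            c[2] = x * m0v2 + y * m1v2 + z * m2v2 + w * m3v2;
            c[3] = x * m0v3 + y * m1v3 + z * m2v3 + w * m3v3;
        }
    }

    public static void main(String[] args) {
        Vector4[] matrix = {
            new Vector4(1, 0, 0, 0),
            new Vector4(0, 1, 0, 0),
            new Vector4(0, 0, 1, 0),
            new Vector4(10, 20, 30, 1) // translation in the row that multiplies w
        };
        Vector4[] vertices = { new Vector4(1, 2, 3, 1), new Vector4(4, 5, 6, 1) };
        transform(matrix, vertices, 1); // only the first vertex is transformed
        System.out.println(Arrays.toString(vertices[0].v_)); // prints [11.0, 22.0, 33.0, 1.0]
        System.out.println(Arrays.toString(vertices[1].v_)); // prints [4.0, 5.0, 6.0, 1.0]
    }
}
```

With an explicit bound, an unexpected out-of-bounds access surfaces as an exception instead of silently ending the loop.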


3 answers

Answer 1: score 5, accepted, 2012-10-21 13:34:29

Answer 2: score 0, 2012-10-21 13:32:11

Answer 3: score 0, 2012-10-25 20:02:48