JAVA Matrix-Vector-Multiplication 100 times slower than C-Version

I'm working on performance differences between Android Java and Android NDK applications. As an example for 3D graphics, I performed a Matrix4D-Vector4D transformation on more than 90,000 vertices.

It seems that the Java version is nearly 100 times slower than the C version. Did I do something wrong? Does anyone have similar experiences?

My Java code for the transformation:

        long t1 = System.nanoTime();
        for ( int i = 0; i < vCount; i++)
        {

            Vector4 vOut = new Vector4();
            Vector4 v = vertices[i];

            vOut.v_[0] = v.v_[0] * matrix[0].v_[0];
            vOut.v_[1] = v.v_[0] * matrix[0].v_[1];
            vOut.v_[2] = v.v_[0] * matrix[0].v_[2];
            vOut.v_[3] = v.v_[0] * matrix[0].v_[3];

            vOut.v_[0] += v.v_[1] * matrix[1].v_[0];
            vOut.v_[1] += v.v_[1] * matrix[1].v_[1];
            vOut.v_[2] += v.v_[1] * matrix[1].v_[2];
            vOut.v_[3] += v.v_[1] * matrix[1].v_[3];

            vOut.v_[0] += v.v_[2] * matrix[2].v_[0];
            vOut.v_[1] += v.v_[2] * matrix[2].v_[1];
            vOut.v_[2] += v.v_[2] * matrix[2].v_[2];
            vOut.v_[3] += v.v_[2] * matrix[2].v_[3];

            vOut.v_[0] += v.v_[3] * matrix[3].v_[0];
            vOut.v_[1] += v.v_[3] * matrix[3].v_[1];
            vOut.v_[2] += v.v_[3] * matrix[3].v_[2];
            vOut.v_[3] += v.v_[3] * matrix[3].v_[3]; 

            vertices[i] = vOut;

        }
        long t2 = System.nanoTime();        
        long diff = t2 - t1;        
        double ms = diff / 1000000.0;  // divide as double; a float divisor loses precision
        Log.w("GL2JNIView", String.format("ms %.2f ", ms));

Performance (transforming > 90,000 vertices | Android 4.0.4, SGS II; median of 200 runs):

JAVA-Version:   2 FPS
C-Version:    190 FPS
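
For anyone who wants to reproduce a "median of 200 runs" measurement, a timing helper could look roughly like the sketch below. The names MedianTimer, medianMillis and medianOf are made up for illustration and are not from the original code; on Android you would log with Log.w instead of System.out.

```java
import java.util.Arrays;

// Hypothetical helper, not part of the original post.
final class MedianTimer {
    // Runs the task `runs` times (runs must be > 0) and returns the
    // median wall-clock time of a single run in milliseconds.
    static double medianMillis(Runnable task, int runs) {
        long[] nanos = new long[runs];
        for (int r = 0; r < runs; r++) {
            long t1 = System.nanoTime();
            task.run();
            nanos[r] = System.nanoTime() - t1;
        }
        return medianOf(nanos) / 1000000.0;
    }

    // Median of the samples; for an even count, the mean of the two middle values.
    static double medianOf(long[] samples) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return (sorted.length % 2 == 1)
                ? sorted[mid]
                : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    public static void main(String[] args) {
        double ms = medianMillis(new Runnable() {
            public void run() {
                // the transformation loop under test would go here
            }
        }, 200);
        System.out.println(String.format("median ms %.2f", ms));
    }
}
```

The median is less sensitive than the mean to a few slow outlier runs, for example ones interrupted by garbage collection.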

You create a new Vector4 in each iteration. In my experience, using new inside loops can cause unexpected performance problems on Android.
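
A minimal sketch of what an allocation-free loop could look like, writing each result back into the existing Vector4. The Vector4 class here is a hypothetical stand-in for the one in the question (float components are an assumption), and InPlaceTransform is a made-up name.

```java
import java.util.Arrays;

// Hypothetical stand-in for the question's Vector4; float components are assumed.
final class Vector4 {
    final float[] v_ = new float[4];

    Vector4(float x, float y, float z, float w) {
        v_[0] = x; v_[1] = y; v_[2] = z; v_[3] = w;
    }
}

final class InPlaceTransform {
    // Computes v * M for every vertex (row-vector convention, as in the question)
    // and stores the result in the same Vector4, so the loop allocates nothing.
    static void transform(Vector4[] vertices, Vector4[] matrix) {
        final float[] r0 = matrix[0].v_, r1 = matrix[1].v_;
        final float[] r2 = matrix[2].v_, r3 = matrix[3].v_;
        for (int i = 0; i < vertices.length; i++) {
            final float[] v = vertices[i].v_;
            // Read all four inputs before overwriting any of them.
            final float x = v[0], y = v[1], z = v[2], w = v[3];
            v[0] = x * r0[0] + y * r1[0] + z * r2[0] + w * r3[0];
            v[1] = x * r0[1] + y * r1[1] + z * r2[1] + w * r3[1];
            v[2] = x * r0[2] + y * r1[2] + z * r2[2] + w * r3[2];
            v[3] = x * r0[3] + y * r1[3] + z * r2[3] + w * r3[3];
        }
    }

    public static void main(String[] args) {
        // Translation by (10, 20, 30); with row vectors the last row holds the offset.
        Vector4[] m = {
            new Vector4(1f, 0f, 0f, 0f),
            new Vector4(0f, 1f, 0f, 0f),
            new Vector4(0f, 0f, 1f, 0f),
            new Vector4(10f, 20f, 30f, 1f)
        };
        Vector4[] verts = { new Vector4(1f, 2f, 3f, 1f) };
        transform(verts, m);
        System.out.println(Arrays.toString(verts[0].v_)); // prints [11.0, 22.0, 33.0, 1.0]
    }
}
```

Whether this closes much of the gap to the C version would have to be measured on the device; it only removes the per-vertex allocation.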

AFAIK, Android's Java implementation runs on a virtual machine called Dalvik, which has a different instruction set than the JVM. Early Dalvik versions only interpreted bytecode; a JIT compiler was added in Android 2.2, but it optimizes far less aggressively than a desktop JVM. So Dalvik is generally slower than C on CPU-bound tasks.

This might change in very recent Android systems.

You should also change your loop. In addition to the answer by @toopok4k3, you should try these things:

  • Drop the for loop and just catch the ArrayIndexOutOfBoundsException. Your loop is large enough to make up for the overhead of the try/catch.
  • If the matrix array and the values it contains don't change from one loop iteration to the next, assign them to constants outside the loop. Dereferencing arrays and accessing member variables isn't nearly as fast as reading local variables.
  • Since each v.v_[] element is used several times, assign it to a local variable and use it 4 times before fetching the next one.

I am assuming the values are doubles in the version below.

int i = 0;
try  
{
    Vector4 vOut = new Vector4();
    final double m0v0 = matrix[0].v_[0];
    final double m0v1 = matrix[0].v_[1];
    final double m0v2 = matrix[0].v_[2];
    final double m0v3 = matrix[0].v_[3];
    final double m1v0 = matrix[1].v_[0];
    final double m1v1 = matrix[1].v_[1];
    final double m1v2 = matrix[1].v_[2];
    final double m1v3 = matrix[1].v_[3];
    final double m2v0 = matrix[2].v_[0];
    final double m2v1 = matrix[2].v_[1];
    final double m2v2 = matrix[2].v_[2];
    final double m2v3 = matrix[2].v_[3];
    final double m3v0 = matrix[3].v_[0];
    final double m3v1 = matrix[3].v_[1];
    final double m3v2 = matrix[3].v_[2];
    final double m3v3 = matrix[3].v_[3];

    while (true)
    {
        Vector4 v = vertices[i];
        i++;

        double vertexVal = v.v_[0];
        vOut.v_[0] = vertexVal * m0v0;
        vOut.v_[1] = vertexVal * m0v1;
        vOut.v_[2] = vertexVal * m0v2;
        vOut.v_[3] = vertexVal * m0v3;

        vertexVal = v.v_[1];
        vOut.v_[0] += vertexVal * m1v0;
        vOut.v_[1] += vertexVal * m1v1;
        vOut.v_[2] += vertexVal * m1v2;
        vOut.v_[3] += vertexVal * m1v3;

        vertexVal = v.v_[2];
        vOut.v_[0] += vertexVal * m2v0;
        vOut.v_[1] += vertexVal * m2v1;
        vOut.v_[2] += vertexVal * m2v2;
        vOut.v_[3] += vertexVal * m2v3;

        vertexVal = v.v_[3];
        vOut.v_[0] += vertexVal * m3v0;
        vOut.v_[1] += vertexVal * m3v1;
        vOut.v_[2] += vertexVal * m3v2;
        vOut.v_[3] += vertexVal * m3v3; 

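        // i was already incremented above and vOut is a single reused scratch
        // vector, so copy the result back into the vertex that was just read.
        System.arraycopy(vOut.v_, 0, v.v_, 0, 4);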

    } 
}
catch (ArrayIndexOutOfBoundsException aioobe) 
{
    // loop is done
}
