
Arm Mali T-624 execution time stuck at 12666 ms

I am using this GPU for my university thesis.

I am running a lot of different kernels on it, and the execution time is stuck at 12666.6689 ms even though I had a loop with 88 instructions * 100m iterations.

__kernel void scalar_mult_add(__global  int * list)
{
    unsigned int x=38;
    unsigned int y=38;
    for(int i=0; i<1000000  ; i++){
        y=x*y;
        x=x+y;
    }
}

The only thing that increases the execution time is adding x!=0 to the for loop condition:

__kernel void scalar_mult_add(__global  int * list)
{
    unsigned int x=38;
    unsigned int y=38;
    for(int i=0; i<1000000 && x!=0; i++){
        y=x*y;
        x=x+y;
    }
}

Why does this happen every time? I can't understand it. For example, 88 million instructions take the same execution time as 1 million instructions, even though I don't have enough execution units to run such a big kernel all at once.

Why does adding a single x!=0 condition to the loop increase the execution time so much, while a couple of extra additions inside the for loop do not?

Why does adding a single x!=0 statement in the loop make the execution time increase?

In the original case the loop doesn't do anything: the output isn't kept and the loop result is not used in any further computation. As is noted in the comments above, the compiler is just optimizing out the loop. The Mali compiler doesn't have an option to disable optimization; obvious dead code is always removed.

Adding x != 0 to the loop condition check means that the loop result is "used": you need the result of the previous iteration to determine whether you keep iterating. The code is still pointless (no output), but the compiler doesn't see it as dead code, so it stays in.
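A more direct way to keep the loop alive for benchmarking is to make its result observable. The following is a minimal sketch, not taken from the original post: it seeds the loop from list[0] (so the values are not compile-time constants) and stores the final x back into the buffer. The host-only shim at the top is an assumption added here so the same source can also be sanity-checked with an ordinary C compiler.

```c
/* Host-only shim (an assumption, not part of the original post): lets the
   same source build with a plain C compiler for a quick sanity check.
   An OpenCL compiler defines __OPENCL_VERSION__ and skips this block. */
#ifndef __OPENCL_VERSION__
#define __kernel
#define __global
#endif

__kernel void scalar_mult_add(__global int *list)
{
    /* Seed from the buffer so the inputs are runtime data, not constants. */
    unsigned int x = (unsigned int)list[0];
    unsigned int y = x;
    for (int i = 0; i < 1000000; i++) {
        y = x * y;
        x = x + y;
    }
    /* Storing the result makes the loop observable, so it is not dead code. */
    list[0] = (int)x;
}
```

This sketch assumes the buffer holds at least one element and that the host writes the seed (38 in the question) into list[0] before launching. Every work-item writes the same element, so it is only meant for timing experiments with a single work-item or where that overlap does not matter.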

However, note that Mali-T624 has a SIMD vector unit. Writing a dependent scalar loop like this means that you are effectively killing any auto-vectorization in the compiler. I highly recommend using vec4 data types for the computation.
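As a rough illustration of the vec4 suggestion, the sketch below runs four independent multiply-add chains in the lanes of a uint4, the OpenCL C 4-wide unsigned vector type. The kernel name vector_mult_add, the per-lane seed values, and the change of the buffer type to uint4 are all assumptions made for this example, not part of the original post. It is OpenCL C only and needs an OpenCL compiler to build.

```c
__kernel void vector_mult_add(__global uint4 *list)
{
    /* Four independent chains, one per vector lane (seed values are arbitrary). */
    uint4 x = (uint4)(38u, 39u, 40u, 41u);
    uint4 y = x;
    for (int i = 0; i < 1000000; i++) {
        y = x * y;   /* component-wise multiply */
        x = x + y;   /* component-wise add */
    }
    /* Store the result so the loop is not removed as dead code. */
    list[get_global_id(0)] = x;
}
```

Each iteration still depends on the previous one, but every vector operation now does four lanes of work, which is the form the vector unit is designed for. Whether this actually improves throughput on a given driver is something to measure rather than assume.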

If you want some basic static analysis, you might want to look at the Mali Offline Compiler, which is freely downloadable in Arm Mobile Studio. Note that compiling OpenCL kernels requires macOS or Linux, but if you are on Windows you can run the Linux binary under WSL.
