
ARM Assembly: How to pass and make use of an array of pointers inside an ARM Assembly function

I have a C function with four pointers, each pointing to a different location in a large 2D array of floats.

Because only four parameters can be passed to an ARM assembly function in registers (r0-r3), I don't understand how to pass the pointer to my return value, which would become the fifth parameter to my assembly function.

So, to get around this, I thought of putting all four pointers into an array of pointers. Since the array then only needs r0, three registers (r1-r3) stay free, and I can use one of them to pass a pointer to my return value as well.
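
To make that plan concrete, here is a minimal portable C sketch of the two-argument interface. It assumes, purely for illustration, that the routine gathers the float at each of the four pointers into a four-lane result; gather_first and caller_example are hypothetical names, and a plain float[4] stands in for float32x4_t so the sketch builds without arm_neon.h.

```c
/* Hypothetical stand-in for _my_arm_asm with the two-argument interface:
 *   ptrs   ~ r0: address of the array of four pointers
 *   result ~ r1: address where the four-lane result is written */
void gather_first(float *const ptrs[4], float result[4])
{
    for (int i = 0; i < 4; ++i)
        result[i] = *ptrs[i];   /* load pointer i, then the float it points to */
}

float data_array[100][100];

/* Caller side, mirroring the question: pack the pointers, pass the result address. */
void caller_example(float result[4])
{
    /* Arbitrary positions chosen for illustration only. */
    data_array[0][0]   = 1.0f;
    data_array[10][5]  = 2.0f;
    data_array[20][7]  = 3.0f;
    data_array[99][99] = 4.0f;

    float *array_pointers[4] = {
        data_array[0] + 0,  data_array[10] + 5,
        data_array[20] + 7, data_array[99] + 99,
    };
    gather_first(array_pointers, result);
}
```

The two-level access in *ptrs[i] (first fetch the pointer, then the float it points to) is what the assembly has to reproduce with two loads per pointer.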

But I don't know how to extract the four individual pointers from my array of pointers inside the assembly function. My attempts so far have failed.

Here is a sample of what I'm trying to do.

Program

#include <stdio.h>
#include <arm_neon.h>   /* defines float32_t and float32x4_t */

/* r0: pointer to the array of 4 pointers, r1: pointer to the result */
void _my_arm_asm(float32_t **array_pointers, float32x4_t *result);

float32_t data_array[100][100];

int main(void)
{
    float32_t *ptr1, *ptr2, *ptr3, *ptr4;

    ptr1 = data_array[value] + (some value);
    ptr2 = data_array[value] + (some other value);
    ptr3 = data_array[value] + (some other value);
    ptr4 = data_array[value] + (some other value);

    float32_t *array_pointers[4];
    array_pointers[0] = ptr1;
    array_pointers[1] = ptr2;
    array_pointers[2] = ptr3;
    array_pointers[3] = ptr4;

    float32x4_t result;

    _my_arm_asm(array_pointers, &result);

    ....
    ....
    ....
    return 0;
}



    .text
    .global _my_arm_asm

_my_arm_asm:
        @ r0: pointer to my array of pointers
        @ r1: pointer to my result

        push   {r4-r11, lr}

        @ How do I access the array of pointers?

        @ I previously tried this; is this the right way to do it?

        @ mov r4, #0
        @ vld4.32 {d0, d1, d2, d3}, [r0, r4]
        @ add r4, r4, #1
        @ vld4.32 {d4, d5, d6, d7}, [r0, r4]
        @ add r4, r4, #1
        @ vld4.32 {d8, d9, d10, d11}, [r0, r4]
        @ add r4, r4, #1
        @ vld4.32 {d12, d13, d14, d15}, [r0, r4]


        ....
        ....
        ....

        pop    {r4-r11, pc}

In general, if more than four arguments are passed to a function, the excess arguments are passed on the stack.

The ARM EABI specifies how compilers should pass arguments to functions (it also specifies which registers a caller can expect to be preserved across a function call). Your assembly routine can follow the same conventions, and probably should unless you have a good reason not to. If nothing else, that means your assembly function can easily be called from C.

Chapter 5 (The Base Procedure Call Standard) of the "Procedure Call Standard for the ARM Architecture" should have the exact details. It looks pretty complex at first (because there's a lot of detail on alignment, argument size, etc.), but I think for your purposes it boils down to this: the fifth argument to the function gets pushed onto the stack.
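
In other words, the fifth parameter can simply be declared in the C prototype, and the compiler will pass it according to the standard. A minimal sketch with the hypothetical name gather5; the register and stack placement described in the comment assumes the 32-bit AAPCS with pointer-sized arguments:

```c
/* Hypothetical five-argument version of the routine.
 * Under the 32-bit AAPCS, p1..p4 arrive in r0..r3 and result (the fifth
 * argument) is passed on the stack, at [sp] on entry to the callee.
 * On other targets the compiler follows that target's own convention. */
void gather5(const float *p1, const float *p2, const float *p3,
             const float *p4, float result[4])
{
    result[0] = *p1;
    result[1] = *p2;
    result[2] = *p3;
    result[3] = *p4;
}
```

An assembly implementation with this prototype would take p1..p4 straight from r0..r3 and fetch result from the stack.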

Of course, as you suggest in your question, you could avoid all that by packing your four pointers into a structure and passing a pointer to the struct. In your assembly routine the struct pointer then arrives in r0, and you use it as the base address to load the pointers you really need.
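
A minimal sketch of the struct approach in C, assuming a hypothetical struct four_ptrs and helper names. It shows that pointer i sits at byte offset i * sizeof(pointer) from the struct's address; on 32-bit ARM pointers are 4 bytes, so the offsets are 0, 4, 8 and 12, i.e. [r0], [r0, #4], [r0, #8] and [r0, #12].

```c
#include <stddef.h>

/* Hypothetical struct bundling the four pointers. */
struct four_ptrs {
    const float *p[4];
};

/* Byte offset of member p[i] from the start of the struct.
 * Array elements have no padding between them, so this is i * sizeof(pointer). */
static size_t ptr_offset(int i)
{
    return offsetof(struct four_ptrs, p) + (size_t)i * sizeof(const float *);
}

/* C equivalent of what the assembly does: load each pointer through the
 * struct pointer, then load the float it points to. */
static float sum_first(const struct four_ptrs *s)
{
    float total = 0.0f;
    for (int i = 0; i < 4; ++i)
        total += *s->p[i];
    return total;
}
```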

I think the ARM assembly might look something like this:

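                  @ r0 holds the 1st parameter (the address of array_pointers)
ldr r4, [r0]      @ get array_pointers[0] into r4
@ ...

ldr r5, [r0, #4]  @ get array_pointers[1] into r5
@ ...

ldr r6, [r0, #8]  @ get array_pointers[2] into r6
@ ...

ldr r7, [r0, #12] @ get array_pointers[3] into r7

@ or, with a single 'load multiple':   ldm r0, {r4-r7}
@ then e.g. vld1.32 {d0, d1}, [r4] loads 4 floats through the first pointer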

You could also use a 'load multiple' instruction (ldm) to get all four pointers in one shot, but I'm not sure what your register usage requirements/restrictions might be.

The fifth and further parameters (assuming int-sized parameters) are passed on the stack. That is, on entry to the function (before it pushes anything), the fifth parameter is accessible as [SP], the sixth as [SP,#4], and so on. Read the Procedure Call Standard for the ARM Architecture for the detailed explanation.

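Because [SP] refers to the fifth argument only on entry, every 4-byte register the routine pushes afterwards adds 4 to that offset. A small sketch of the arithmetic, assuming the 32-bit AAPCS, word-sized stack arguments and no other adjustment of sp (stack_arg_offset is a hypothetical helper):

```c
/* Byte offset from sp of stack argument arg_index (5 = fifth argument)
 * after the callee has pushed pushed_regs 4-byte registers.
 * Assumes 32-bit AAPCS, word-sized arguments, and no other sp changes. */
static unsigned stack_arg_offset(unsigned pushed_regs, unsigned arg_index)
{
    return 4u * pushed_regs + 4u * (arg_index - 5u);
}
```

With the question's push {r4-r11, lr} (nine registers, 36 bytes), the fifth argument would therefore be at [sp, #36], e.g. ldr r8, [sp, #36] right after the push.
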
That said, you don't have to use assembly to make use of NEON. Check out NEON intrinsics, which let you do all these operations from plain C code.
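
As an illustration of the intrinsics route, here is a minimal sketch using vld1q_f32, vaddq_f32 and vst1q_f32 from arm_neon.h. The operation (a lane-wise sum of four consecutive floats read through each of the four pointers) and the name add4_rows are hypothetical; a scalar fallback is included so the sketch also compiles on targets without NEON.

```c
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical operation: out[lane] = sum over i of ptrs[i][lane], lane 0..3. */
void add4_rows(float *const ptrs[4], float out[4])
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    float32x4_t acc = vld1q_f32(ptrs[0]);          /* 4 floats via pointer 0 */
    for (int i = 1; i < 4; ++i)
        acc = vaddq_f32(acc, vld1q_f32(ptrs[i]));  /* add 4 floats via pointer i */
    vst1q_f32(out, acc);                           /* write the 4-lane result */
#else
    /* Portable scalar fallback with the same result. */
    for (int lane = 0; lane < 4; ++lane) {
        float s = 0.0f;
        for (int i = 0; i < 4; ++i)
            s += ptrs[i][lane];
        out[lane] = s;
    }
#endif
}
```

The compiler takes care of register allocation and of the calling convention, so none of the r0-r3 or stack questions above arise.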
