I need macros that involve many registers, like:
.macro load128bytes
vld1.8 {d0, d1, d2, d3}, [r0]!
vld1.8 {d4, d5, d6, d7}, [r0]!
vld1.8 {d8, d9, d10, d11}, [r0]!
vld1.8 {d12, d13, d14, d15}, [r0]!
.endm
As you can see, the registers are consecutive. However, I want to pass the number of the starting register as an argument, like:
.macro load128bytes srn
vld1.8 {d\srn, d\srn+1, d\srn+2, d\srn+3}, [r0]!
vld1.8 {d\srn+4, d\srn+5, d\srn+6, d\srn+7}, [r0]!
vld1.8 {d\srn+8, d\srn+9, d\srn+10, d\srn+11}, [r0]!
vld1.8 {d\srn+12, d\srn+13, d\srn+14, d\srn+15}, [r0]!
.endm
Of course, the above doesn't work. The assembler interprets the operands as d0, d0+1, d0+2, d0+3 instead of the d0, d1, d2, d3 I intend when srn is 0.
I searched the web and found an example that might solve this problem :
.macro sum from=0, to=5
.long \from
.if \to-\from
sum "(\from+1)",\to
.endif
.endm
Although the example above works fine, it didn't help solve my problem:
.macro test srn0
vld1.8 {d"(\srn0)", d"(\srn0+1)"}, [r0]
.endm
Building the above results in an error message: Neon double or quad precision register expected -- `vld1.8 {d"0",d"0+1"},[r0]'
Any ideas? It's really frustrating to pass up to sixteen registers each time, and even worse, it makes my code prone to errors.
Thanks in advance.
The sum example works because it gives this output:
.long 0
.long (0+1)
.long ((0+1)+1)
.long (((0+1)+1)+1)
.long ((((0+1)+1)+1)+1)
.long (((((0+1)+1)+1)+1)+1)
Thus after macro expansion the assembler sees expressions where it expects expressions, and evaluates them. The trick you need is to get it to evaluate expressions where it doesn't expect expressions, and to do so before the assembly pass. To the rescue comes alternate macro mode and CPP-style macro chaining (smaller example for clarity):
.macro _load32bytes base r0 r1 r2 r3
vld1.8 {d\r0, d\r1, d\r2, d\r3}, [\base]!
.endm
.macro load32bytes srn
.altmacro
_load32bytes r0, %(\srn), %(\srn+1), %(\srn+2), %(\srn+3)
.endm
The % operator evaluates an expression to a string during macro expansion. I couldn't convince it to work within a single macro, but chaining does the job:
$ arm-linux-gnueabihf-as -alm -mfpu=neon test.s
ARM GAS test.s page 1
1 .macro _load32bytes base r0 r1 r2 r3
2 vld1.8 {d\r0, d\r1, d\r2, d\r3}, [\base]!
3 .endm
4
5 .macro load32bytes srn
6 .altmacro
7 _load32bytes r0, %(\srn), %(\srn+1), %(\srn+2), %(\srn+3)
8 .endm
9
10 load32bytes 0
10 > .altmacro
10 > _load32bytes r0,%(0),%(0+1),%(0+2),%(0+3)
10 0000 0D0220F4 >> vld1.8 {d0,d1,d2,d3},[r0]!
11 load32bytes 3
11 > .altmacro
11 > _load32bytes r0,%(3),%(3+1),%(3+2),%(3+3)
11 0004 0D3220F4 >> vld1.8 {d3,d4,d5,d6},[r0]!
12
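The same chaining should extend to the 128-byte case from the question. Below is a minimal sketch of that idea, not an assembled and verified listing: the helper name _load32 and the parameter names n0..n3 are made up for illustration, and the base register is assumed to be r0 as in the original macro. The parameter names are deliberately chosen not to match any bare word in the macro body, because alternate macro mode can also substitute parameters that appear without a backslash.

```asm
@ Sketch only. _load32 and n0..n3 are illustrative names, not from the original.
        .altmacro

        @ Innermost macro: receives register numbers already evaluated to strings.
        .macro _load32 base, n0, n1, n2, n3
        vld1.8  {d\n0, d\n1, d\n2, d\n3}, [\base]!
        .endm

        @ Middle macro: the percent operator evaluates each expression
        @ while the arguments of _load32 are being parsed.
        .macro load32bytes srn
        _load32 r0, %(\srn), %(\srn+1), %(\srn+2), %(\srn+3)
        .endm

        @ Outer macro: four 32-byte loads from consecutive register groups.
        .macro load128bytes srn
        load32bytes \srn
        load32bytes %(\srn+4)
        load32bytes %(\srn+8)
        load32bytes %(\srn+12)
        .endm

        @ Intended to load d0..d15 in four groups of four registers.
        load128bytes 0
```

Note that .altmacro stays in effect for the rest of the file. If later code relies on the default macro syntax, a .noaltmacro directive after the macro definitions switches it back.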