GCC for RISC-V produces nop instructions after call instructions by default:
$ cat test.c
void g();
void f() {
g();
}
$ riscv64-unknown-elf-gcc -S test.c -o -
[...]
f:
addi sp,sp,-16
sd ra,8(sp)
sd s0,0(sp)
addi s0,sp,16
call g
nop #### <-----------here
ld ra,8(sp)
ld s0,0(sp)
addi sp,sp,16
jr ra
.size f, .-f
.ident "GCC: (GNU) 8.3.0"
I'd expect that when targeting an architecture that has branch delay slots, but my understanding is that RISC-V is not such an architecture. Actually, the nop disappears when compiling with -O1 or higher.
Is it just a "bug" in GCC that emits nop as a leftover from architectures that have delay slots, or is there an actual reason for this nop instruction?
Not a complete answer, but at least an attempt at diving into the reason why the nop appears. I strongly believe it's a bug/leftover from architectures that have delay slots (as it gets added in the very first RTL pass - expand).
Onto the investigation: GCC has two types of passes, Tree and RTL. To see them in action, prepare two folders, noopt and opt (there will be many files), and compile with -fdump-tree-all-raw -fdump-rtl-all to see the intermediate results. The last stage of the Tree passes gives (noopt case):
$ cat noopt/test.c.232t.optimized
;; Function f (f, funcdef_no=0, decl_uid=1549, cgraph_uid=0, symbol_order=0)
f ()
{
<bb 2> :
gimple_call <g, NULL>
gimple_return <NULL NULL>
}
The opt case (-O1) differs negligibly:
$ diff -u noopt/test.c.232t.optimized opt/test.c.232t.optimized
--- noopt/test.c.232t.optimized 2019-09-03 14:48:02.874071927 +0200
+++ opt/test.c.232t.optimized 2019-09-03 14:48:29.550278667 +0200
@@ -3,7 +3,7 @@
f ()
{
- <bb 2> :
+ <bb 2> [local count: 1073741825]:
gimple_call <g, NULL>
gimple_return <NULL NULL>
The first stage of the RTL passes (expand) is the one that differs:
$ cat noopt/test.c.234r.expand
;; Function f (f, funcdef_no=0, decl_uid=1549, cgraph_uid=0, symbol_order=0)
;; Generating RTL for gimple basic block 2
try_optimize_cfg iteration 1
Merging block 3 into block 2...
Merged blocks 2 and 3.
Merged 2 and 3 without moving.
Merging block 4 into block 2...
Merged blocks 2 and 4.
Merged 2 and 4 without moving.
try_optimize_cfg iteration 2
;;
;; Full RTL generated for this function:
;;
(note 1 0 3 NOTE_INSN_DELETED)
(note 3 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
(note 2 3 5 2 NOTE_INSN_FUNCTION_BEG)
(call_insn 5 2 8 2 (parallel [
(call (mem:SI (symbol_ref:DI ("g") [flags 0x41] <function_decl 0x7fbc2827a400 g>) [0 g S4 A32])
(const_int 0 [0]))
(clobber (reg:SI 1 ra))
]) "../test.c":3 -1
(nil)
(nil))
(insn 8 5 0 2 (const_int 0 [0]) "../test.c":4 -1
(nil))
The only difference with -O1 is that the const_int 0 [0] insn is gone. That insn is what ultimately becomes the nop:
$ diff -u noopt/test.c.234r.expand opt/test.c.234r.expand
--- noopt/test.c.234r.expand 2019-09-03 14:48:02.874071927 +0200
+++ opt/test.c.234r.expand 2019-09-03 14:48:29.550278667 +0200
@@ -25,12 +25,10 @@
(note 1 0 3 NOTE_INSN_DELETED)
(note 3 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
(note 2 3 5 2 NOTE_INSN_FUNCTION_BEG)
-(call_insn 5 2 8 2 (parallel [
- (call (mem:SI (symbol_ref:DI ("g") [flags 0x41] <function_decl 0x7fbc2827a400 g>) [0 g S4 A32])
+(call_insn 5 2 0 2 (parallel [
+ (call (mem:SI (symbol_ref:DI ("g") [flags 0x41] <function_decl 0x7f0bdec1f400 g>) [0 g S4 A32])
(const_int 0 [0]))
(clobber (reg:SI 1 ra))
]) "../test.c":3 -1
(nil)
(nil))
-(insn 8 5 0 2 (const_int 0 [0]) "../test.c":4 -1
- (nil))
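The two dump folders compared above can be recreated with something like the following. This is a minimal sketch, not the exact commands used for the output shown: the CC variable and its default are assumptions, and the numeric part of the dump file names (232t, 234r) varies between GCC versions, so the script matches on the .expand suffix instead.

```shell
#!/bin/sh
# Sketch: recreate the noopt/ and opt/ dump folders and diff the expand dumps.
# CC is an assumed variable; override it to use a different toolchain.
CC="${CC:-riscv64-unknown-elf-gcc}"

# Same test case as in the question.
cat > test.c <<'EOF'
void g();
void f() {
  g();
}
EOF

mkdir -p noopt opt

if command -v "$CC" >/dev/null 2>&1; then
  # Compile from inside each folder so the dump files land there.
  (cd noopt && "$CC" -S -fdump-tree-all-raw -fdump-rtl-all ../test.c)
  (cd opt && "$CC" -O1 -S -fdump-tree-all-raw -fdump-rtl-all ../test.c)
  # diff exits non-zero when the files differ, which is expected here.
  diff -u noopt/*.expand opt/*.expand || true
else
  echo "$CC not found; skipping compilation" >&2
fi
```

Running the compiler from inside each folder is why the dumps refer to the source as "../test.c".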
GCC will probably do this on any architecture. I think the nop instruction is related to the void result: it sits where a non-void function would set up the return value. Try compiling this:
int g();
int f() {
g();
return 1;
}
For a void function, there is nothing to do to generate the result, hence the nop.
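One way to try this is to compile the int-returning variant without optimization and look at what follows the call. The sketch below assumes a CC variable defaulting to the compiler from the question and a hypothetical file name test2.c. It makes no claim about what the output will contain.

```shell
#!/bin/sh
# Sketch: compile the non-void variant at the default optimization level
# and print the call together with the instructions right after it.
# CC and test2.c are assumed names, not taken from the original posts.
CC="${CC:-riscv64-unknown-elf-gcc}"

cat > test2.c <<'EOF'
int g();
int f() {
  g();
  return 1;
}
EOF

if command -v "$CC" >/dev/null 2>&1; then
  # Show each line mentioning "call" plus the three lines that follow it.
  "$CC" -S test2.c -o - | grep -A 3 'call' || true
else
  echo "$CC not found; skipping compilation" >&2
fi
```

Comparing this output with the void version from the question shows whether the slot after the call is filled by the code that sets up the return value.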