
LLVM naming policy for LLVM IR variables and assembler symbols

As far as I can see, LLVM accepts any null-terminated string, containing any byte from 0x01 to 0xff, as a valid name for an LLVM IR variable or an assembler symbol. In my opinion this decision may cause some problems.

  1. It is difficult to edit LLVM IR and assembler programs in text editors (Vim, Kate and others) when names contain "special" (non-printable) characters.
  2. LLVM and assemblers support quoting names with double quotes; for example, "A B" is a name containing a space character. It would be logical to expect printf-style escapes for special characters, such as "\n", "\t" and "\xAB", but LLVM IR and assembler do not support this style in names (although LLVM does support \KL, with two hex digits K and L, in initializers).
  • On the one hand, "A\n" does not produce "A" followed by a newline character; the name in the ELF object file contains all 3 bytes.
  • On the other hand, "A\n" and "A\\n" produce identical names in LLVM.

(So it seems that even LLVM itself does not support special characters in names in any consistent way.)

@"A\n"   = internal constant i32 1
@"A\\n"  = internal constant i32 2
$ clang-9 test.ll -S
test.ll:3:1: error: redefinition of global '@A\n'
@"A\\n"  = internal constant i32 2
  3. What about @GOTOFF or @plt? How can a name that contains @GOTOFF be distinguished from an assembler relocation specifier? Why does "AB@GOTOFF" assemble, while "AB"@GOTOFF does not?

  4. Bug https://sourceware.org/bugzilla/show_bug.cgi?id=18581 was opened in 2015, but even now gas does not support some characters in names that LLVM supports. For example, "A,B" and "A\B" cannot be assembled by gas. So LLVM produces an assembler dialect that gas cannot assemble.
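
The collision in the test.ll example under item 2 follows from how quoted names appear to be decoded. Below is a minimal Python sketch, assuming three rules that match the behaviour observed there: a doubled backslash becomes one backslash, a backslash followed by two hex digits becomes that byte, and any other backslash is kept literally. The function name unescape_ir_name is hypothetical and this is not LLVM's actual code.

```python
import string

_HEX = set(string.hexdigits)


def unescape_ir_name(quoted: str) -> bytes:
    """Decode the text between the double quotes of an IR name (assumed rules)."""
    out = bytearray()
    i, n = 0, len(quoted)
    while i < n:
        ch = quoted[i]
        if ch == "\\":
            # Assumed rule 1: two backslashes collapse into one.
            if i + 1 < n and quoted[i + 1] == "\\":
                out.append(0x5C)
                i += 2
                continue
            # Assumed rule 2: backslash plus two hex digits is one byte.
            if i + 2 < n and quoted[i + 1] in _HEX and quoted[i + 2] in _HEX:
                out.append(int(quoted[i + 1:i + 3], 16))
                i += 3
                continue
        # Assumed rule 3: anything else, including a lone backslash, is literal.
        out.extend(ch.encode("utf-8"))
        i += 1
    return bytes(out)


# Both spellings from test.ll decode to the same three bytes: 'A', '\', 'n'.
same = unescape_ir_name(r"A\n") == unescape_ir_name(r"A\\n")
```

Under these assumptions a real newline in a name has to be written as "A\0A", which is exactly the kind of spelling that text editors and other tools do not display in a readable way.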

Programming languages (C/C++, Rust, Go, Python, Java, ...) allow only letters, digits, '_' and '$' in identifiers. Frontends also use '.', '$' and '#', but in any case they generate names that are valid in assembler without any double-quote escaping.

Probably only LLVM optimizations generate names with special characters. But such names are created only for globals with internal (static, in C terms) linkage. So why not use a special pattern like "__llvm_internal_global_Id_*" for such globals (some names are reserved in any case)?

So what are the reasons for this naming policy? Would it not be better to use a small, simple set of valid characters for names?
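
To make the "small, simple set" concrete, here is a minimal Python sketch of a check for names that would need no quoting. The character set (letters, digits, '_', '$', '.', '#', with no leading digit or '#') is an assumption taken from the characters listed above, not the actual rule of gas or LLVM, and is_plain_symbol is a hypothetical name.

```python
import re

# Assumed restricted alphabet: letters, digits, '_', '$', '.', '#'.
# The first character may not be a digit or '#'.
_PLAIN_NAME = re.compile(r"[A-Za-z_.$][A-Za-z0-9_.$#]*")


def is_plain_symbol(name: str) -> bool:
    """Return True if the name fits the assumed restricted character set."""
    return _PLAIN_NAME.fullmatch(name) is not None
```

With this check, names such as "A,B", "A\B" or "A B" are rejected, while a reserved-prefix name like __llvm_internal_global_Id_42 passes.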

I will try to summarize the interim results.

LLVM accepts any sequence of characters as an LLVM IR variable name or an asm symbol name. In general this looks like a good solution.

But the current implementations have some peculiarities.

  1. The LLVM parser can handle LLVM IR in which both string initializers and global variable names contain escape sequences (using the "\AB" pattern, where 0xAB is the hex code). But in assembler language escape sequences are not used and/or do not work (the same goes for readelf, objdump, gdb and so on). This creates problems when using text editors.

  2. Assembler language uses special relocation modifiers such as @plt, @GOTOFF and others after symbol names. So there is now a collision when a symbol name (in double quotes) includes a substring like "@plt". I propose a simple rule for the assembler lexer/parser:

A@plt       - symbol with name 'A' and plt-relocation
"A@plt"     - symbol with name 'A@plt'
"A@plt"@plt - symbol with name 'A@plt' and plt-relocation

(So everything inside the double quotes is part of the name, and everything after the closing double quote, or at the end of an unquoted symbol name, is a relocation modifier.)

  3. Gas documents support for such names ( https://sourceware.org/binutils/docs/as/Symbol-Intro.html#Symbol-Intro ), but in fact does not support "," or "\" in symbol names. So the set of valid symbol names in gas is smaller than in llvm-as.
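
The quoting rule proposed in point 2 above can be sketched in a few lines of Python. This only illustrates the proposal and is not how gas or LLVM currently tokenize operands. split_symbol is a hypothetical helper, and it assumes that a quoted name contains no embedded double quote.

```python
from typing import Optional, Tuple


def split_symbol(token: str) -> Tuple[str, Optional[str]]:
    """Split an operand into (symbol name, relocation modifier or None)."""
    if token.startswith('"'):
        # Quoted form: everything up to the closing quote is the name.
        end = token.find('"', 1)
        if end == -1:
            raise ValueError("unterminated quoted symbol name")
        name, rest = token[1:end], token[end + 1:]
        if not rest:
            return name, None
        if rest.startswith("@") and len(rest) > 1:
            return name, rest[1:]
        raise ValueError("unexpected text after quoted symbol name")
    # Unquoted form: the first '@' separates the name from the modifier.
    name, sep, modifier = token.partition("@")
    return name, (modifier if sep else None)


# The three cases listed under point 2:
cases = [split_symbol('A@plt'), split_symbol('"A@plt"'), split_symbol('"A@plt"@plt')]
```

The point of the rule is that the quoted branch never looks for '@' inside the quotes, so a name containing "@plt" and a plt relocation can no longer be confused.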

I hope these issues will be fixed in LLVM and gas (if this is a correct description of the current situation).
