How to compile and keep "unused" C declarations with clang -emit-llvm

Context

I'm writing a compiler for a language that requires lots of runtime functions. I'm using LLVM as my backend, so the codegen needs type information for all of the runtime's entities (functions, structs, etc.). Instead of defining all of them manually using the LLVM APIs or handwriting the LLVM IR, I'd like to write the headers in C and compile them to bitcode that the compiler can pull in with LLVMParseBitcodeInContext2.

Issue

The issue I'm having is that clang doesn't seem to keep any of the type declarations that aren't used by any function definitions. Clang has -femit-all-decls, which sounds like it's supposed to solve this, but unfortunately it doesn't, and Googling suggests it's misnamed, as it only affects unused definitions, not declarations.

I then thought that if I compiled the headers only into .gch files, I could pull them in with LLVMParseBitcodeInContext2 the same way (since the docs say they use "the same" bitcode format). However, doing so fails with error: Invalid bitcode signature, so something must be different. Perhaps the difference is small enough to work around?

Any suggestions or relatively easy workarounds that can be automated for a complex runtime? I'd also be interested if someone has a totally alternative suggestion for approaching this general use case, keeping in mind that I don't want to statically link the runtime function bodies into every single object file I generate, just the types. I imagine this is something other compilers have needed as well, so I wouldn't be surprised if I'm approaching this wrong.


E.g., given this input:

runtime.h

struct Foo {
  int a;
  int b;
};

struct Foo * something_with_foo(struct Foo *foo);

I need a bitcode file with this equivalent IR:

runtime.ll

; ...etc...

%struct.Foo = type { i32, i32 }

declare %struct.Foo* @something_with_foo(%struct.Foo*)

; ...etc...

I could write it all by hand, but this would be duplicative, as I also need to create C headers for other interop, and it would be ideal not to have to keep them in sync manually. The runtime is rather large. I guess I could also do things the other way around: write the declarations in LLVM IR and generate the C headers.


Someone else asked about this years back, but the proposed solutions are rather hacky and fairly impractical for a runtime of this size and type complexity: Clang - Compiling a C header to LLVM IR/bitcode

Clang's precompiled headers implementation does not seem to output LLVM IR, but only the AST (Abstract Syntax Tree), so that the header does not need to be parsed again:

The AST file itself contains a serialized representation of Clang's abstract syntax trees and supporting data structures, stored using the same compressed bitstream as LLVM's bitcode file format.

The underlying binary format may be the same, but it sounds like the content is different, and LLVM's bitstream format is merely a container in this case. This is not very clear from the documentation page on the website, so I am just speculating. An LLVM/Clang expert could help clarify this point.
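One way to see the difference described above is to look at the first four bytes of each file. The sketch below is only an illustration: the function and enum names are made up for this example, and it assumes the commonly documented signatures, namely 'B' 'C' 0xC0 0xDE for raw LLVM bitcode, the little-endian word 0x0B17C0DE for the bitcode wrapper header, and 'C' 'P' 'C' 'H' for a raw Clang AST/PCH file. Some Clang configurations wrap the AST in an object file container, in which case none of these would match.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical names, introduced only for this sketch. */
enum file_kind {
    KIND_UNKNOWN,
    KIND_LLVM_BITCODE,          /* raw bitcode: 'B' 'C' 0xC0 0xDE            */
    KIND_LLVM_BITCODE_WRAPPER,  /* wrapper header: 0x0B17C0DE, little-endian */
    KIND_CLANG_AST              /* raw Clang AST/PCH file: 'C' 'P' 'C' 'H'   */
};

/* Classify a buffer by its first four bytes. */
static enum file_kind classify_magic(const unsigned char *buf, size_t len)
{
    static const unsigned char bitcode[4]   = { 'B', 'C', 0xC0, 0xDE };
    static const unsigned char wrapper[4]   = { 0xDE, 0xC0, 0x17, 0x0B };
    static const unsigned char clang_ast[4] = { 'C', 'P', 'C', 'H' };

    if (len < 4)
        return KIND_UNKNOWN;
    if (memcmp(buf, bitcode, 4) == 0)
        return KIND_LLVM_BITCODE;
    if (memcmp(buf, wrapper, 4) == 0)
        return KIND_LLVM_BITCODE_WRAPPER;
    if (memcmp(buf, clang_ast, 4) == 0)
        return KIND_CLANG_AST;
    return KIND_UNKNOWN;
}

/* Read the first four bytes of a file and classify them. */
static enum file_kind classify_file(const char *path)
{
    unsigned char buf[4];
    size_t n;
    FILE *f = fopen(path, "rb");

    if (f == NULL)
        return KIND_UNKNOWN;
    n = fread(buf, 1, sizeof buf, f);
    fclose(f);
    return classify_magic(buf, n);
}
```

If a .gch file classifies as KIND_CLANG_AST rather than KIND_LLVM_BITCODE, that would be consistent with the "Invalid bitcode signature" error: the bitcode reader rejects the file before it looks at any of the content.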

Unfortunately, there does not seem to be an elegant way around this. To minimize the effort required to achieve what you want, I suggest building a minimal C/C++ source file that in some way uses all the declarations that you want compiled to LLVM IR. For example, you just need to declare a pointer to a struct to ensure that the type is not dropped, and you can provide an empty definition for a function to keep its signature.

Once you have a minimal source file, compile it with clang -O0 -S -emit-llvm -o precompiled.ll to get a module with all definitions in textual LLVM IR format (with -c instead of -S, clang writes binary bitcode, conventionally named .bc).

An example based on the snippet you posted:

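#include <stddef.h>  /* for NULL */

struct Foo {
  int a;
  int b;
};

// Fake function definition, present only so that clang emits the signature.
struct Foo *something_with_foo(struct Foo *foo)
{
    (void)foo;  // unused in this stub
    return NULL;
}

// A global variable that references the struct type.
struct Foo *x;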

Output showing that the definitions are kept: https://godbolt.org/g/2F89BH
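For a large runtime, the minimal source file could be generated rather than written by hand. The following is a rough sketch of that idea, not a tested pipeline: write_keepalive and the keep_type_/keep_fn_ prefixes are hypothetical names, and the lists of struct and function names are assumed to come from somewhere else (for example a hand-maintained list). Instead of fake function bodies, it emits a by-value global per struct and a function pointer per function; __typeof__ is a GNU extension that clang accepts. On LLVM versions that use opaque pointers, pointer parameters print as ptr, so the struct layout is carried only by the by-value global.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Write a "keep-alive" translation unit to `out`.
 *
 * For header "runtime.h", struct "Foo" and function "something_with_foo",
 * the generated file is:
 *
 *   #include "runtime.h"
 *
 *   struct Foo keep_type_Foo;
 *   __typeof__(something_with_foo) *keep_fn_something_with_foo = something_with_foo;
 *
 * Returns 0 on success, -1 if a write fails.
 */
static int write_keepalive(FILE *out, const char *header,
                           const char *const *structs, size_t n_structs,
                           const char *const *funcs, size_t n_funcs)
{
    size_t i;

    if (fprintf(out, "#include \"%s\"\n\n", header) < 0)
        return -1;

    /* A by-value global forces the struct layout into the module. */
    for (i = 0; i < n_structs; i++) {
        if (fprintf(out, "struct %s keep_type_%s;\n",
                    structs[i], structs[i]) < 0)
            return -1;
    }

    /* Taking a function's address forces its declaration into the module. */
    for (i = 0; i < n_funcs; i++) {
        if (fprintf(out, "__typeof__(%s) *keep_fn_%s = %s;\n",
                    funcs[i], funcs[i], funcs[i]) < 0)
            return -1;
    }
    return 0;
}
```

The generated file would then be compiled with -emit-llvm like the hand-written example. The keep_ globals end up in the module as extra symbols, so the consuming compiler would need to ignore or strip them. This only works for complete struct types, since an incomplete type cannot be instantiated by value.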

So, clang doesn't actually filter out the unused declarations. It defers emitting forward declarations until their first use. Whenever a function is used, it checks whether the declaration has already been emitted and, if not, emits it.

You can look at these lines in the clang repo:

// Forward declarations are emitted lazily on first use.
if (!FD->doesThisDeclarationHaveABody()) {
  if (!FD->doesDeclarationForceExternallyVisibleDefinition())
    return;

The simple fix here would be to either comment out the last two lines or just add && false to the second condition.

// Forward declarations are emitted lazily on first use.
if (!FD->doesThisDeclarationHaveABody()) {
  if (!FD->doesDeclarationForceExternallyVisibleDefinition() && false)
    return;

This will cause clang to emit a declaration as soon as it sees it. It might also change the order in which definitions appear in your .ll (or .bc) files, assuming that is not an issue for you.

To make it cleaner, you can also add a command line flag, --emit-all-declarations, and check for it here before you continue.
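The effect of such a flag can be illustrated with a toy model. The code below is not clang's implementation and uses none of its APIs; every name in it is invented for the illustration. It models a module that emits definitions immediately, emits bodiless declarations lazily on first use, and emits everything as soon as it is seen when the hypothetical emit_all_declarations option is on.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_DECLS 16

/* Toy model only: all names here are hypothetical. */
struct decl {
    const char *name;
    bool has_body;
    bool emitted;
};

struct module {
    struct decl decls[MAX_DECLS];
    size_t n_decls;
    const char *emitted[MAX_DECLS];  /* names in emission order */
    size_t n_emitted;
    bool emit_all_declarations;      /* stand-in for the proposed flag */
};

/* Emit a declaration at most once, recording the order. */
static void emit(struct module *m, struct decl *d)
{
    if (d->emitted)
        return;
    d->emitted = true;
    m->emitted[m->n_emitted++] = d->name;
}

/* Called when the front end first sees a function declaration or definition. */
static struct decl *see_function(struct module *m, const char *name, bool has_body)
{
    struct decl *d;

    if (m->n_decls == MAX_DECLS)
        return NULL;
    d = &m->decls[m->n_decls++];
    d->name = name;
    d->has_body = has_body;
    d->emitted = false;

    /* Bodiless declarations are deferred unless the flag is set. */
    if (has_body || m->emit_all_declarations)
        emit(m, d);
    return d;
}

/* Called when a function body references another function. */
static void use_function(struct module *m, const char *name)
{
    size_t i;

    for (i = 0; i < m->n_decls; i++) {
        if (strcmp(m->decls[i].name, name) == 0) {
            emit(m, &m->decls[i]);
            return;
        }
    }
}
```

In this model, a declaration that is never used is never emitted in the lazy mode, which mirrors the behavior the question ran into. With the flag set, it is emitted in source order, which is also why the order of entries in the output can change.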
