Testing a compiler

2011-08-01 03:25:46 | Tags: java, unit-testing, testing, compiler-construction, sablecc

I'm currently working on a kind of compiler that was built using SableCC.

Long story short, the compiler takes as input both specification files (this is what we're parsing) and .class files, and instruments the bytecode of the .class files to make sure that, when they run, none of the specifications is violated (this is a bit like JML/Code Contracts, but way more powerful).

We have a few dozen system tests that cover a large part of the analysis phase (which makes sure the specifications make sense and that they agree with the .class files they are supposed to specify).

We divided them into two sets: the valid tests and the invalid tests.

- The valid tests consist of source code files that, when compiled by our compiler, should produce no compiler errors or warnings.
- The invalid tests consist of source code files that, when compiled by our compiler, should produce at least one compiler error or warning.

This served us well while we were in the analysis phase. The question now is how to test the code generation phase. In the past, I wrote system tests for a small compiler I developed in a compilers course. Each test consisted of a couple of source files in that language and an output.txt. When running the test, I'd compile the source files and then run the resulting program's main method, checking that its output was equal to output.txt. All of this was automated, of course.
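For readers who have not seen this style of test, the output.txt scheme can be sketched roughly as below. This is only an illustration under stated assumptions: the compiled program's entry point is stood in for by a `Runnable`, the expected text is passed as a string instead of being read from output.txt, and `GoldenOutputSketch` and its method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: runs a program, captures what it prints, and compares
// the captured text with the expected output.
class GoldenOutputSketch {

    /** Runs the program and returns everything it printed to System.out. */
    static String captureStdout(Runnable program) {
        PrintStream original = System.out;
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            System.setOut(new PrintStream(buffer, true, "UTF-8"));
            program.run();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        } finally {
            System.out.flush();
            System.setOut(original); // always restore the real stdout
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    /** Ignores platform line endings and leading/trailing whitespace. */
    static String normalize(String text) {
        return text.replace("\r\n", "\n").trim();
    }

    static boolean matchesExpected(Runnable program, String expected) {
        return normalize(captureStdout(program)).equals(normalize(expected));
    }

    public static void main(String[] args) {
        // Stand-in for "run the main method of the compiled test program".
        Runnable compiledMain = () -> System.out.println("result = 42");
        // In a real test this string would be read from the test's output.txt.
        String expected = "result = 42\n";
        System.out.println(matchesExpected(compiledMain, expected) ? "PASS" : "FAIL");
    }
}
```

Normalizing line endings keeps the comparison stable across platforms, which matters once the expected files are checked into version control.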

Now, dealing with this bigger compiler/bytecode instrumenter, things are not so easy. It's no easy task to replicate what I did with my simple compiler. I guess the way to go is to step back from system tests at this stage and focus on unit tests.

As any compiler developer knows, a compiler consists of lots of visitors. I am not too sure how to proceed with unit-testing them. From what I've seen, most of the visitors call a counterpart class that has methods related to that visitor (I guess the idea was to keep the visitors in line with the SRP).

There are a couple of techniques I could use to unit-test my compiler:

1) Unit-testing each of the visitor's methods separately. This seems to be a good idea for a stackless visitor, but looks like a terrible idea for visitors that use one (or more) stacks. I'd then also unit-test each of the other methods from standard (read: non-visitor) classes the traditional way.
2) Unit-testing the whole visitor in one go. That is, I create a tree that I then visit. In the end, I verify whether the symbol table was correctly updated. I do not care about mocking its dependencies.
3) The same as 2), but now mocking the visitor's dependencies.
4) What others?
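To make the "whole visitor in one go" option concrete, here is a minimal sketch. It does not use SableCC's generated classes: the `Node`, `ClassDecl`, `FieldDecl` and `SymbolCollector` types are hypothetical stand-ins, and the symbol table is reduced to a map from qualified name to type. The visitor deliberately keeps a stack, since that is the case where testing visit methods one at a time gets awkward.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class WholeVisitorTestSketch {

    interface Visitor {
        void visitClass(ClassDecl node);
        void visitField(FieldDecl node);
    }

    interface Node {
        void accept(Visitor visitor);
    }

    static final class ClassDecl implements Node {
        final String name;
        final List<Node> members;
        ClassDecl(String name, Node... members) {
            this.name = name;
            this.members = Arrays.asList(members);
        }
        public void accept(Visitor visitor) { visitor.visitClass(this); }
    }

    static final class FieldDecl implements Node {
        final String name;
        final String type;
        FieldDecl(String name, String type) {
            this.name = name;
            this.type = type;
        }
        public void accept(Visitor visitor) { visitor.visitField(this); }
    }

    /** Visitor under test: tracks enclosing classes on a stack. */
    static final class SymbolCollector implements Visitor {
        final Map<String, String> symbols = new LinkedHashMap<>();
        private final Deque<String> enclosing = new ArrayDeque<>();

        public void visitClass(ClassDecl node) {
            enclosing.addLast(node.name);
            for (Node member : node.members) {
                member.accept(this);
            }
            enclosing.removeLast();
        }

        public void visitField(FieldDecl node) {
            // Qualified name is built from the stack, outermost class first.
            symbols.put(String.join(".", enclosing) + "." + node.name, node.type);
        }
    }

    /** Runs the whole visitor over a tree and returns the resulting table. */
    static Map<String, String> collect(Node root) {
        SymbolCollector collector = new SymbolCollector();
        root.accept(collector);
        return collector.symbols;
    }

    public static void main(String[] args) {
        // Hand-built tree for: class Outer { int x; class Inner { String y; } }
        Node tree = new ClassDecl("Outer",
                new FieldDecl("x", "int"),
                new ClassDecl("Inner", new FieldDecl("y", "String")));
        Map<String, String> table = collect(tree);
        if (!"int".equals(table.get("Outer.x"))) throw new AssertionError(table);
        if (!"String".equals(table.get("Outer.Inner.y"))) throw new AssertionError(table);
        System.out.println("symbol table updated as expected");
    }
}
```

The test only looks at the end state of the table, so the stack handling is exercised without the test knowing about it.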

I still have the problem that the unit tests will be very tightly coupled to SableCC's AST (which, to be honest, is really ugly).

We are currently not writing any new tests, but I'd like to get the train back on track, as I am sure that not testing the system is the same as feeding a monster that sooner or later will come back to bite us in the butt when we least expect it ;-(

Has anyone had experience with compiler testing and could give some awesome advice on how to proceed now? I'm kinda lost here!

1 answer

I am involved in a project where a Java AST is translated into another language, OpenCL, using the Eclipse compiler, and I have similar issues.

I have no magic solutions for you, but I'll share my experience in case it helps.

Your technique of testing against expected output (with output.txt) is how I started out as well, but it became an absolute maintenance nightmare. When I had to change the generator or the output for some reason (which happened a few times), I had to rewrite all the expected output files, and there were huge numbers of them. I started to not want to change the output at all for fear of breaking all the tests (which was bad), so in the end I scrapped them and instead tested the resulting AST. This meant I could 'loosely' test the output. For example, if I wanted to test generation of if statements, I could just find the one-and-only if statement in the generated class (I wrote helper methods to do all this common AST stuff), verify a few things about it, and be done. That test wouldn't care how the class was named or whether there were extra annotations or comments. This ended up working quite well, as the tests were more focused. The disadvantage is that the tests were more tightly coupled to the code, so if I ever wanted to rip out the Eclipse compiler/AST library and use something else, I'd need to rewrite all my tests. In the end, because the code generation would change over time, I was willing to pay that price.
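A rough idea of what such a 'loose' check can look like is sketched below. It does not use the Eclipse AST classes: `Node` is a hypothetical, minimal stand-in for a generated tree, and `findAll` / `findOnly` are illustrative names for the kind of helper methods mentioned above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LooseAstAssertSketch {

    /** Hypothetical, minimal stand-in for a node of the generated AST. */
    static final class Node {
        final String kind;
        final String text;
        final List<Node> children;

        Node(String kind, String text, Node... children) {
            this.kind = kind;
            this.text = text;
            this.children = Arrays.asList(children);
        }
    }

    /** Collects every node of the given kind, searching the whole subtree. */
    static List<Node> findAll(Node root, String kind) {
        List<Node> found = new ArrayList<>();
        collect(root, kind, found);
        return found;
    }

    private static void collect(Node node, String kind, List<Node> out) {
        if (node.kind.equals(kind)) {
            out.add(node);
        }
        for (Node child : node.children) {
            collect(child, kind, out);
        }
    }

    /** Returns the single node of the given kind, failing if there is none or more than one. */
    static Node findOnly(Node root, String kind) {
        List<Node> found = findAll(root, kind);
        if (found.size() != 1) {
            throw new AssertionError("expected exactly one '" + kind + "' node, found " + found.size());
        }
        return found.get(0);
    }

    public static void main(String[] args) {
        // The class name and the comment are deliberately arbitrary:
        // the check below never looks at them.
        Node generated = new Node("class", "AnyGeneratedName",
                new Node("comment", "generated, do not edit"),
                new Node("method", "run",
                        new Node("if", "",
                                new Node("condition", "x > 0"),
                                new Node("block", "", new Node("return", "x")))));

        Node ifNode = findOnly(generated, "if");
        Node condition = findOnly(ifNode, "condition");
        if (!condition.text.equals("x > 0")) {
            throw new AssertionError("unexpected condition: " + condition.text);
        }
        System.out.println("if statement generated as expected");
    }
}
```

Because the assertions touch only the if node and its condition, renaming the generated class or adding comments would not break this test.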

I also rely heavily on integration tests: tests that actually compile and run the generated code in the target language. I had way more of these than unit tests, purely because they seemed to be more useful and caught more problems.

As for visitor testing, again I do more integration-style testing with them: take a really small/specific Java source file, load it with the Eclipse compiler, run one of my visitors over it and check the results. The only other way to test without invoking the Eclipse compiler would be to mock out an entire AST, which was just not feasible: most of the visitors were non-trivial and required a fully constructed, valid Java AST, as they would read annotations from the main class. Most of the visitors were testable in this way because they either generated small OpenCL code fragments or built up a data structure that the unit tests could verify.

Yes, all my tests are very tightly coupled to the Eclipse compiler. But so is the actual software we are writing. Using anything else would mean we'd have to rewrite the whole program anyway, so it's a price we're pretty happy to pay. I guess there is no single solution: you need to weigh the cost of tight coupling against test maintainability/simplicity.

We also have a fair amount of testing utility code, such as code to set up the Eclipse compiler with default settings, code to pull out the body nodes of method trees, etc. We try to keep the tests as small as possible (I know this is probably common sense, but it's possibly worth mentioning).

(Edits/additions below in response to comments, since they are easier to read and format here than in comment replies)

"I also heavily rely on integration tests - tests that actually compile and run the generated code in the target language" What did these tests actually do? How are they different from the output.txt tests?

(Edit again: after re-reading the question I realize our approaches are the same, so ignore this.)

Rather than just generating source code and comparing it to expected output, which is what I did initially, the integration tests generate OpenCL code, compile it and run it. All of the generated code produces output, and that output is then compared.

For example, I have a Java class that, if the generator works properly, should generate OpenCL code that sums the values in two buffers and puts the result in a third buffer. Initially I would have written a text file with the expected OpenCL code and compared against that in my test. Now, the integration test generates the code, runs it through the OpenCL compiler, executes it, and then checks the values.

"As for visitor testing, again I do more integration-style testing with them - get a really small/specific Java source file, load it up with Eclipse compiler, run one of my visitors with it and check results." Do you mean you run just one of your visitors, or run all the visitors up to the visitor you want to test?

Most of the visitors could be run independently of each other. Where possible I would run only the visitor I am testing or, if there is a dependency on others, the minimal set of visitors required (usually just one other). The visitors don't talk directly to each other, but use context objects that are passed around. These can be constructed artificially in the tests to get things into a known state.
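As an illustration of the context-object idea, here is a small sketch. All names (`AnalysisContext`, `UsageChecker`, `check`) are hypothetical and not taken from the actual project: the visitor under test normally runs after a declaration-collecting visitor, but the test fills the context by hand instead of running that earlier visitor.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ContextObjectTestSketch {

    /** Shared state passed between visitors instead of direct calls. */
    static final class AnalysisContext {
        final Set<String> declaredNames = new HashSet<>();
        final List<String> errors = new ArrayList<>();
    }

    /** Visitor under test: relies on declarations recorded by an earlier pass. */
    static final class UsageChecker {
        private final AnalysisContext context;

        UsageChecker(AnalysisContext context) {
            this.context = context;
        }

        void visitNameUse(String name) {
            if (!context.declaredNames.contains(name)) {
                context.errors.add("undeclared name: " + name);
            }
        }
    }

    /** Builds a context in a known state, runs only the checker, returns its errors. */
    static List<String> check(Set<String> declared, String... uses) {
        AnalysisContext context = new AnalysisContext();
        context.declaredNames.addAll(declared); // set by hand, no declaration visitor is run
        UsageChecker checker = new UsageChecker(context);
        for (String use : uses) {
            checker.visitNameUse(use);
        }
        return context.errors;
    }

    public static void main(String[] args) {
        Set<String> declared = new HashSet<>(Arrays.asList("x", "y"));
        List<String> errors = check(declared, "x", "z");
        if (errors.size() != 1 || !errors.get(0).equals("undeclared name: z")) {
            throw new AssertionError(errors);
        }
        System.out.println("only the undeclared name was reported");
    }
}
```

The trade-off is that a hand-built context can drift from what the real earlier visitor produces, so a few tests that run both visitors together are still worth keeping.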

Another question: do you use mocks at all in this project? Moreover, do you regularly use mocks in other projects? I'm just trying to get a clear picture of the person I'm talking with :P

In this project we use mocks in about 5% of tests, probably even less. And I don't mock out any Eclipse compiler stuff.

The thing with mocks is that I'd need to understand well what I'm mocking out, and that is not the case with the Eclipse compiler. A whole lot of visitor methods get called, and sometimes I'm not sure which one should be called (e.g. is visit ExtendedStringLiteral or visit StringLiteral called for string literals?). If I mocked this out and assumed one or the other, it might not correspond to reality, and the program would fail even though the tests pass, which is not what I want. The only mocks we have are a couple for the annotation processor API, a couple of Eclipse compiler adapters, and some of our own core classes.

In other projects, such as Java EE stuff, more mocks were used, but I'm still not an avid user of them. The more defined, understood and predictable an API is, the more likely I am to consider using mocks.

The first phases of our program are just like those of a regular compiler. We extract info from the source files and fill up a (big and complex!) symbol table. How would you go about system testing this? In theory, I could create a test with the source files and also a symbolTable.txt (or .xml or whatever) that contains all the info about the symbol table, but that would, I think, be a bit complex to do. Each one of those integration tests would be a complex thing to accomplish!

I'd try to take the approach of testing small bits of the symbol table rather than the whole lot in one go. If I were testing whether a Java tree was built correctly, I'd have something like:

- one test just for if statements:
  - have source code with one method containing one if statement
  - build the symbol table / tree from this source
  - pull out the statement tree from the only method body of the main class (fail the test if more than one, or none, of the following is found: method bodies, classes, top-level statement nodes in the method body)
  - compare the if statement's node attributes (condition, body) programmatically
- at least one test for each other kind of statement, in a similar style
- other tests, maybe for multiple statements etc., or whatever is needed
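The steps in the list above could look roughly like this. It is only a sketch: the parsing step is replaced by a hand-built model, and `ClassModel`, `MethodModel`, `IfStmt`, `single` and `onlyIfStatement` are hypothetical names, not part of any real compiler API.

```java
import java.util.Arrays;
import java.util.List;

class SingleIfStatementTestSketch {

    interface Stmt {}

    static final class IfStmt implements Stmt {
        final String condition;
        final List<String> body;
        IfStmt(String condition, String... body) {
            this.condition = condition;
            this.body = Arrays.asList(body);
        }
    }

    static final class MethodModel {
        final List<Stmt> statements;
        MethodModel(Stmt... statements) {
            this.statements = Arrays.asList(statements);
        }
    }

    static final class ClassModel {
        final List<MethodModel> methods;
        ClassModel(MethodModel... methods) {
            this.methods = Arrays.asList(methods);
        }
    }

    /** Fails the test unless the list holds exactly one element. */
    static <T> T single(List<T> items, String what) {
        if (items.size() != 1) {
            throw new AssertionError("expected exactly one " + what + ", found " + items.size());
        }
        return items.get(0);
    }

    /** Walks class -> only method -> only top-level statement, checking counts on the way. */
    static IfStmt onlyIfStatement(List<ClassModel> classes) {
        ClassModel mainClass = single(classes, "class");
        MethodModel method = single(mainClass.methods, "method body");
        Stmt statement = single(method.statements, "top-level statement");
        if (!(statement instanceof IfStmt)) {
            throw new AssertionError("expected an if statement");
        }
        return (IfStmt) statement;
    }

    public static void main(String[] args) {
        // Stand-in for the tree built from:
        //   class Main { void m() { if (x > 0) { y = 1; } } }
        List<ClassModel> built = Arrays.asList(
                new ClassModel(new MethodModel(new IfStmt("x > 0", "y = 1;"))));

        IfStmt ifStmt = onlyIfStatement(built);
        if (!ifStmt.condition.equals("x > 0")) throw new AssertionError(ifStmt.condition);
        if (!ifStmt.body.equals(Arrays.asList("y = 1;"))) throw new AssertionError(ifStmt.body);
        System.out.println("if statement tree built as expected");
    }
}
```

Failing early on unexpected counts keeps a broken test input from being mistaken for a broken tree builder.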

This approach is integration-style testing, but each integration test only tests a small part of the system.

Essentially I'd try to keep the tests as small as possible. A lot of the testing code for pulling out bits of the tree can be moved into utility methods to keep the test classes small.

I thought that maybe I could create a pretty printer that would take the symbol table and output the corresponding source files (which, if everything was OK, would be just like the original source files). The problem is that the original files can have things in a different order than what my pretty printer prints. I'm afraid that with this approach I might just be opening another can of worms. I've been relentlessly refactoring parts of the code and the bugs are starting to show up. I really need some integration tests to keep me on track.

That's exactly the approach I've taken. However, in my system the order of stuff doesn't change much. I have generators that essentially output code in response to Java AST nodes, but there is a bit of freedom in that generators can call themselves recursively. For example, the 'if' generator that gets fired in response to a Java if-statement AST node can write out 'if (', then ask other generators to render the condition, then write ') {', ask other generators to write out the body, then write '}'.
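The recursive 'if' generator described above can be sketched as follows. The node classes (`If`, `Greater`, `Assign`, `Name`) and the single `emit` dispatcher are hypothetical simplifications; the real project works on Eclipse AST nodes and separate generator classes.

```java
class RecursiveGeneratorSketch {

    interface Node {}

    static final class Name implements Node {
        final String id;
        Name(String id) { this.id = id; }
    }

    static final class Greater implements Node {
        final Node left;
        final Node right;
        Greater(Node left, Node right) { this.left = left; this.right = right; }
    }

    static final class Assign implements Node {
        final String target;
        final Node value;
        Assign(String target, Node value) { this.target = target; this.value = value; }
    }

    static final class If implements Node {
        final Node condition;
        final Node body;
        If(Node condition, Node body) { this.condition = condition; this.body = body; }
    }

    /** Renders a tree to text by dispatching on the node type. */
    static String generate(Node node) {
        StringBuilder out = new StringBuilder();
        emit(node, out);
        return out.toString();
    }

    private static void emit(Node node, StringBuilder out) {
        if (node instanceof If) {
            If ifNode = (If) node;
            out.append("if (");
            emit(ifNode.condition, out); // another generator renders the condition
            out.append(") {");
            emit(ifNode.body, out);      // and another renders the body
            out.append("}");
        } else if (node instanceof Greater) {
            Greater greater = (Greater) node;
            emit(greater.left, out);
            out.append(" > ");
            emit(greater.right, out);
        } else if (node instanceof Assign) {
            Assign assign = (Assign) node;
            out.append(" ").append(assign.target).append(" = ");
            emit(assign.value, out);
            out.append("; ");
        } else if (node instanceof Name) {
            out.append(((Name) node).id);
        } else {
            throw new IllegalArgumentException("no generator for " + node.getClass().getSimpleName());
        }
    }

    public static void main(String[] args) {
        Node tree = new If(
                new Greater(new Name("a"), new Name("b")),
                new Assign("max", new Name("a")));
        System.out.println(generate(tree)); // prints: if (a > b) { max = a; }
    }
}
```

Since each generator only writes its own fragment and delegates the rest, a test can render a tiny tree and check just the fragment it cares about.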