(x86) Assembler optimization

Question

I'm building a compiler/assembler/linker in Java for the x86-32 (IA32) processor, targeting Windows.

High-level concepts (I do not have any "source code": there is no syntactic or lexical translation, and all languages are regular) are translated into opcodes, which are then wrapped and written to a file. The translation process has several phases. One of them is the translation between regular languages: the highest-level code is translated into medium-level code, which is then translated into the lowest-level code (there will probably be more than three levels).

My problem is the following: if I have higher-level code (X and Y) translated to lower-level code (x, y, U and V), then an example of such a translation, in pseudo-code, is:

x + U(f) // generated by X
+
V(f) + y // generated by Y

(A simple example:) here V is the inverse of U (think of U as a stack push and V as a pop). This needs to be "optimized" into:

x + y

(essentially removing the "useless" code)

My idea was to use regular expressions. For the case above, the regular expression would look like this: x:(U(x)+V(x)):null, meaning: for all x, find U(x) followed by V(x) and replace it with null. Imagine more complex regular expressions for more complex optimizations. This should work on all levels.
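A minimal sketch of this regex idea in Java, assuming U is `push`, V is `pop`, and the lower-level code is plain assembly text with one instruction per line (the class name `RegexPeephole` is hypothetical). The rule is reapplied until the text stops changing, because a single `replaceAll` pass removes only the inner pair of a nested sequence such as `push eax / push ebx / pop ebx / pop eax`. Note that the backreference `\1`, which enforces "the same x", is already beyond what a strictly regular expression can express.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: "U(x) followed by V(x) -> null" with U = push, V = pop.
final class RegexPeephole {
    // (\w+) captures the push operand; \1 requires the pop to use the same operand.
    // The trailing (\R|$) also consumes the line break after the pop, if any.
    private static final Pattern PUSH_POP =
            Pattern.compile("(?m)^push (\\w+)\\R^pop \\1(\\R|$)");

    static String optimize(String asm) {
        String previous;
        do {                                   // repeat until no pair is left
            previous = asm;
            asm = PUSH_POP.matcher(asm).replaceAll("");
        } while (!asm.equals(previous));
        return asm;
    }

    public static void main(String[] args) {
        System.out.print(optimize("mov eax, 1\npush ebx\npop ebx\nadd eax, 2\n"));
    }
}
```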

What do you suggest? What would be a good approach to optimizing and producing fast x86 assembly?

Answer 1

What you should actually do is build an Abstract Syntax Tree (AST).

It is a representation of the source code in the form of a tree, which is much easier to work with, especially for transformations and optimizations.

That code, represented as a tree, would look something like:

(+
    (+
        x
        (U f))
    (+
        (V f)
        y))
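As a sketch of what this could look like in Java (the type names `Node`, `Sym`, `Op` and `Add` are illustrative assumptions, not an existing library; records need Java 16 or later), each kind of node can be a small record whose `toString` prints the s-expression form used above:

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical AST node types for the pseudo-code above.
interface Node {}

// A leaf such as x, y or f.
record Sym(String name) implements Node {
    @Override public String toString() { return name; }
}

// An application such as (U f): an operator name plus one argument.
record Op(String name, Node arg) implements Node {
    @Override public String toString() { return "(" + name + " " + arg + ")"; }
}

// An n-ary sum such as (+ a b ...).
record Add(List<Node> terms) implements Node {
    @Override public String toString() {
        return terms.stream().map(Node::toString)
                .collect(Collectors.joining(" ", "(+ ", ")"));
    }
}

final class AstDemo {
    // Builds the tree for: x + U(f)  +  V(f) + y
    static Node example() {
        Node f = new Sym("f");
        return new Add(List.of(
                new Add(List.of(new Sym("x"), new Op("U", f))),
                new Add(List.of(new Op("V", f), new Sym("y")))));
    }

    public static void main(String[] args) {
        System.out.println(example()); // (+ (+ x (U f)) (+ (V f) y))
    }
}
```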

You could then apply some transformations. For example, a sum of sums is a sum of all the terms:

(+
    x
    (U f)
    (V f)
    y)
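Flattening is a short recursive function over such a tree. A minimal sketch, assuming the same kind of hypothetical record types (repeated here so the block compiles on its own, Java 16+; `Flattener` is also a made-up name):

```java
import java.util.ArrayList;
import java.util.List;

// Minimal, hypothetical AST types, repeated so this block stands alone.
interface Node {}
record Sym(String name) implements Node { @Override public String toString() { return name; } }
record Op(String name, Node arg) implements Node {
    @Override public String toString() { return "(" + name + " " + arg + ")"; }
}
record Add(List<Node> terms) implements Node {
    @Override public String toString() {
        StringBuilder sb = new StringBuilder("(+");
        for (Node t : terms) sb.append(' ').append(t);
        return sb.append(')').toString();
    }
}

final class Flattener {
    // "A sum of sums is a sum of all the terms": splice nested Add nodes into their parent.
    static Node flatten(Node n) {
        if (n instanceof Op op) return new Op(op.name(), flatten(op.arg()));
        if (!(n instanceof Add add)) return n; // leaves are returned unchanged
        List<Node> out = new ArrayList<>();
        for (Node t : add.terms()) {
            Node ft = flatten(t);               // flatten children first
            if (ft instanceof Add inner) out.addAll(inner.terms());
            else out.add(ft);
        }
        return new Add(out);
    }

    public static void main(String[] args) {
        Node f = new Sym("f");
        Node tree = new Add(List.of(
                new Add(List.of(new Sym("x"), new Op("U", f))),
                new Add(List.of(new Op("V", f), new Sym("y")))));
        System.out.println(flatten(tree)); // (+ x (U f) (V f) y)
    }
}
```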

Then you could scan the tree and apply the following rules:

(+ (U x) (V x)) = 0, for all x (applied to any adjacent pair of terms inside a sum)
(+ 0 x1 x2 ...) = (+ x1 x2 ...), for all x1, x2, ...
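A minimal sketch of these two rules over an already flattened sum, again with hypothetical record types and a made-up class name `CancelRules`. The rules are reapplied until nothing changes, so nested pairs such as `(U a) (U b) (V b) (V a)` also disappear once the inner pair has become 0 and been dropped:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal, hypothetical AST types; only what the two rules need.
interface Node {}
record Sym(String name) implements Node { @Override public String toString() { return name; } }
record Op(String name, Node arg) implements Node {
    @Override public String toString() { return "(" + name + " " + arg + ")"; }
}
record Add(List<Node> terms) implements Node {
    @Override public String toString() {
        StringBuilder sb = new StringBuilder("(+");
        for (Node t : terms) sb.append(' ').append(t);
        return sb.append(')').toString();
    }
}

final class CancelRules {
    static final Node ZERO = new Sym("0");

    // True for an adjacent pair (U a) (V a) with the same argument a.
    static boolean cancels(Node first, Node second) {
        return first instanceof Op u && u.name().equals("U")
                && second instanceof Op v && v.name().equals("V")
                && u.arg().equals(v.arg());
    }

    static Node simplify(Add sum) {
        List<Node> terms = new ArrayList<>(sum.terms());
        boolean changed = true;
        while (changed) {
            changed = false;
            // Rule 1: (+ ... (U a) (V a) ...) -> (+ ... 0 ...)
            for (int i = 0; i + 1 < terms.size(); i++) {
                if (cancels(terms.get(i), terms.get(i + 1))) {
                    terms.set(i, ZERO);
                    terms.remove(i + 1);
                    changed = true;
                    break;
                }
            }
            // Rule 2: (+ 0 x1 x2 ...) -> (+ x1 x2 ...)
            if (terms.removeIf(ZERO::equals)) changed = true;
        }
        return terms.isEmpty() ? ZERO : new Add(terms);
    }
}
```

Applied to `(+ x (U f) (V f) y)`, this returns `(+ x y)`.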

Then you would obtain what you are looking for:

(+ x y)

Any good book on compiler writing discusses ASTs at length. Functional programming languages are especially well suited to this task, since they generally make it easy to represent trees and to use pattern matching to analyze and transform them.

Usually, for this task, you should avoid regular expressions. Regular expressions define what mathematicians call regular languages: any regular language can be matched by a regular expression. However, I think your language is not regular, so it cannot be properly parsed with regexps.

People try again and again to parse languages such as HTML with regular expressions. This has been discussed extensively here on SO: you cannot parse HTML with regular expressions. There will always be some exceptional case in which your regular expressions fail and you have to adapt them.

It might be the same with your language: if it is not regular, spare yourself a lot of headaches and do not try to parse it (and especially "transform" it) with regular expressions.
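Nested push/pop pairs are a bracket-matching problem, and balanced brackets are the textbook example of a non-regular language. A minimal sketch that handles arbitrary nesting in one linear pass uses an explicit stack instead of a regex. It assumes, as in the question, that a `push r` directly followed by `pop r` has no observable effect, and that instructions are already split into strings such as `"push eax"` (`StackPeephole` is a hypothetical name):

```java
import java.util.ArrayList;
import java.util.List;

final class StackPeephole {
    // The output list doubles as a stack: a "pop r" cancels the most recent
    // surviving instruction if that instruction is "push r". Because earlier
    // pairs are already gone, nested pairs collapse in a single pass.
    static List<String> cancelPushPop(List<String> code) {
        List<String> out = new ArrayList<>();
        for (String insn : code) {
            if (insn.startsWith("pop ") && !out.isEmpty()) {
                String operand = insn.substring(4);
                if (out.get(out.size() - 1).equals("push " + operand)) {
                    out.remove(out.size() - 1);   // drop the push and skip the pop
                    continue;
                }
            }
            out.add(insn);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(cancelPushPop(List.of(
                "mov eax, 1", "push eax", "push ebx", "pop ebx", "pop eax", "ret")));
    }
}
```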

Answer 2

I'm having a lot of trouble understanding this question, but I think you will find it useful to learn something about term-rewriting systems, which seems to be what you are proposing. Whether the mechanism is tree rewriting (always works) or regular expressions (will work for some languages some of the time, and for other languages all of the time) is of secondary importance.
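To make "term rewriting" concrete, here is a minimal sketch of a tiny rewriting system in Java. The generic `Term` type, the rule set and the innermost (arguments-first) strategy are assumptions chosen for illustration, not any particular published system. Besides the cancellation rule, it needs a reassociation rule, because in the original tree `(U f)` and `(V f)` sit in different sub-sums:

```java
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical generic term: a head symbol plus zero or more arguments, e.g. (+ x (U f)).
record Term(String head, List<Term> args) {
    static Term leaf(String s) { return new Term(s, List.of()); }
    static Term of(String h, Term... a) { return new Term(h, List.of(a)); }
    boolean is(String h, int arity) { return head.equals(h) && args.size() == arity; }
    Term arg(int i) { return args.get(i); }
    @Override public String toString() {
        if (args.isEmpty()) return head;
        StringBuilder sb = new StringBuilder("(").append(head);
        for (Term a : args) sb.append(' ').append(a);
        return sb.append(')').toString();
    }
}

final class Rewriter {
    static final Term ZERO = Term.leaf("0");

    // Each rule returns the rewritten term, or null if it does not apply at the root.
    static final List<UnaryOperator<Term>> RULES = List.of(
        // (+ (+ a b) c) -> (+ a (+ b c))   right-associate sums
        t -> t.is("+", 2) && t.arg(0).is("+", 2)
                ? Term.of("+", t.arg(0).arg(0), Term.of("+", t.arg(0).arg(1), t.arg(1))) : null,
        // (+ (U a) (V a)) -> 0
        t -> t.is("+", 2) && cancels(t.arg(0), t.arg(1)) ? ZERO : null,
        // (+ (U a) (+ (V a) c)) -> c
        t -> t.is("+", 2) && t.arg(1).is("+", 2) && cancels(t.arg(0), t.arg(1).arg(0))
                ? t.arg(1).arg(1) : null,
        // (+ 0 a) -> a   and   (+ a 0) -> a
        t -> t.is("+", 2) && t.arg(0).equals(ZERO) ? t.arg(1) : null,
        t -> t.is("+", 2) && t.arg(1).equals(ZERO) ? t.arg(0) : null);

    static boolean cancels(Term u, Term v) {
        return u.is("U", 1) && v.is("V", 1) && u.arg(0).equals(v.arg(0));
    }

    // Innermost strategy: normalize the arguments, then try the rules at the root;
    // if one fires, normalize its result again (assumes the rule set terminates).
    static Term normalize(Term t) {
        Term n = new Term(t.head(), t.args().stream().map(Rewriter::normalize).toList());
        for (UnaryOperator<Term> rule : RULES) {
            Term r = rule.apply(n);
            if (r != null) return normalize(r);
        }
        return n;
    }

    public static void main(String[] args) {
        Term f = Term.leaf("f");
        Term t = Term.of("+", Term.of("+", Term.leaf("x"), Term.of("U", f)),
                              Term.of("+", Term.of("V", f), Term.leaf("y")));
        System.out.println(normalize(t));
    }
}
```

Normalizing `(+ (+ x (U f)) (+ (V f) y))` with these rules yields `(+ x y)`: the reassociation rule first brings `(U f)` next to `(+ (V f) y)`, and the third rule then cancels the pair.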

It is definitely possible to optimize object code by term rewriting. You will probably also benefit from learning about peephole optimization; a good place to start, because it is very strong on the fundamentals, is the paper by Davidson and Fraser on a retargetable peephole optimizer. There is also excellent later work by Benitez and Davidson.
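For comparison, a classic window-based peephole pass over x86-32 instruction text might look like the sketch below. It is a simple illustration, not the Davidson and Fraser design (their optimizer works on register-transfer descriptions of instructions); the canonical string format, the register set, the class name `WindowPeephole` and the three rules are all assumptions. The `push a / pop b` rule is restricted to general-purpose registers because x86 has no memory-to-memory `mov`:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class WindowPeephole {
    // Assumption: instructions are canonical strings such as "mov eax, ebx" or "push eax".
    private static final Set<String> REGS =
            Set.of("eax", "ebx", "ecx", "edx", "esi", "edi", "ebp");

    static List<String> optimize(List<String> code) {
        List<String> out = new ArrayList<>(code);
        boolean changed = true;
        while (changed) {                       // rescan after every rewrite
            changed = false;
            for (int i = 0; i < out.size() && !changed; i++) {
                String cur = out.get(i);
                // Window of 1: "mov r, r" does nothing on x86-32.
                if (cur.startsWith("mov ")) {
                    String[] ops = cur.substring(4).split(", ");
                    if (ops.length == 2 && ops[0].equals(ops[1])) {
                        out.remove(i);
                        changed = true;
                        continue;
                    }
                }
                if (i + 1 == out.size()) continue;
                String next = out.get(i + 1);
                // Window of 2: "push a / pop b" on general-purpose registers.
                if (cur.startsWith("push ") && next.startsWith("pop ")) {
                    String a = cur.substring(5), b = next.substring(4);
                    if (REGS.contains(a) && REGS.contains(b)) {
                        out.remove(i + 1);
                        if (a.equals(b)) out.remove(i);          // push r / pop r: drop both
                        else out.set(i, "mov " + b + ", " + a);   // push a / pop b: mov b, a
                        changed = true;
                    }
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(optimize(List.of("push eax", "pop ebx", "mov ecx, ecx", "ret")));
    }
}
```

For example, `push eax`, `pop ebx`, `mov ecx, ecx`, `ret` becomes `mov ebx, eax`, `ret`. Rescanning from the start after each rewrite is quadratic in the worst case, which keeps the sketch short; a production pass would instead back up only a window's length.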

2 answers

Answer 1: score 8, posted 2010-03-23 23:17:00

Answer 2: score 5, accepted, posted 2010-03-24 00:00:22
