
JavaCC: How can I maintain the original text (with spaces)

Let's assume that I have a simple JavaCC grammar to parse additions and subtractions:


....
void CompilationUnit() :
{}
{
  (Expression())+
  <EOF>
}
void Expression() :
{}
{
  Number()
  (
    Addition()
  | Subtraction()
  )*
}
void Number() :
{}
{
  <NUMBER>  // token name assumed; the original token reference was lost
}
void Addition() :
{}
{
  <PLUS> Number()  // token name assumed; the original token reference was lost
}
void Subtraction() :
{}
{
  <MINUS> Number()  // token name assumed; the original token reference was lost
}

I have classes that use the AST produced by this grammar to calculate the result:


public class Calculator extends DepthFirstVisitor {
  int result = -1;
  public void visit(Expression n) {
    if (result >= 0) System.out.println(toText(n) + " = " + result);
    result = 0;
    super.visit(n);
  }
  public void visit(Number n) {
    ...
  }
  public void visit(Addition n) {
    ...
  }
  ....
}

I am able to calculate the value of the expression, but I also need the original expression exactly as it appeared. So for the following input:

5 + 2 - 1
  2 + 1

I want to have the following output:

5 + 2 - 1 = 6
2 + 1 = 3

Unfortunately, because I'm skipping characters like spaces and newlines, what I'm getting is:

5+2-1 = 6
2+1 = 3

Is there any way I can output the original text (including the skipped characters)?

Please note that the actual problem is much bigger and the grammar much more complicated. So I'm not really looking for a solution specific to the above problem (e.g. preprocessing the lines and splitting them on newline characters, or modifying methods to "manually" add spaces after every token), but rather for a solution that uses some JavaCC feature.

Both ANTLR and Xtext support "hidden tokens" for whitespace and comments. See here for some hints or use Google with that term. Perhaps JavaCC has some similar concept.

EDIT: JavaCC seems to use the term "special token". See here for some details.
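To make the "special token" idea concrete: if whitespace is declared in a `SPECIAL_TOKEN` production instead of `SKIP`, the token manager does not discard it but attaches it to the next regular token through that token's `specialToken` field. The sketch below shows how the original text could then be rebuilt by walking that chain. It is only an illustration: `Tok` is a hypothetical stand-in that mirrors three fields of JavaCC's generated `Token` class (`image`, `next`, `specialToken`), and `OriginalText` with its methods is a made-up helper, not part of JavaCC.

```java
// Hypothetical stand-in for JavaCC's generated Token class. Only the three
// fields used below are mirrored; the real class has more (kind, beginLine, ...).
final class Tok {
    final String image;   // text matched by the token
    Tok next;             // next regular token, or next special token within a run
    Tok specialToken;     // last special token that precedes this token, or null

    Tok(String image) { this.image = image; }
}

// Hypothetical helper, not part of JavaCC.
final class OriginalText {
    /** Appends the special tokens (e.g. whitespace) that precede t, in source order. */
    static void appendSpecials(Tok t, StringBuilder out) {
        Tok s = t.specialToken;
        if (s == null) return;
        // specialToken points at the LAST special token of the run,
        // so walk back to the first one...
        while (s.specialToken != null) s = s.specialToken;
        // ...then walk forward through the run, emitting each image.
        for (; s != null; s = s.next) out.append(s.image);
    }

    /** Rebuilds the text from first to last (inclusive), without first's leading specials. */
    static String between(Tok first, Tok last) {
        StringBuilder out = new StringBuilder();
        for (Tok t = first; t != null; t = t.next) {
            if (t != first) appendSpecials(t, out);
            out.append(t.image);
            if (t == last) break;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Hand-built chain for the input "5 + 2", as a token manager might link it.
        Tok five = new Tok("5"), plus = new Tok("+"), two = new Tok("2");
        five.next = plus;
        plus.next = two;
        plus.specialToken = new Tok(" ");
        two.specialToken = new Tok(" ");
        System.out.println(between(five, two));
    }
}
```

In a visitor such as `Calculator`, the same walk would presumably run from the first to the last token of an `Expression` node, provided the tree builder in use records those tokens on each node.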

Basically you can't do this in a compiler. You would have to capture whitespace as a token in the grammar and allow it everywhere it is allowed, which is everywhere, and the resulting grammar would be so complex as to be infeasible to implement or maybe even generate. You will have to make do with capturing a reference to the coordinates in the source code (line and column) where the entity came from: for example, the text of the current line and the column number.

There's a reason why compilers behave the way they do.
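The coordinate approach can be sketched as follows, assuming the whole input is kept in memory as a string and that each construct's begin and end positions are known as 1-based line and column numbers with an inclusive end column (JavaCC tokens expose `beginLine`, `beginColumn`, `endLine` and `endColumn` fields of this kind). `SourceSlice` and its methods are hypothetical names for illustration. The sketch also assumes `\n` line endings and no tab characters, since a character stream may count a tab as more than one column.

```java
// Hypothetical helper: recovers the original text of a construct from its
// source coordinates instead of from the token images.
final class SourceSlice {
    /** Converts a 1-based (line, column) pair into a 0-based offset into src. */
    static int offsetOf(String src, int line, int column) {
        int offset = 0;
        // Skip past (line - 1) newline characters to reach the start of the line.
        for (int l = 1; l < line; l++) {
            int nl = src.indexOf('\n', offset);
            if (nl < 0) throw new IllegalArgumentException("line out of range: " + line);
            offset = nl + 1;
        }
        return offset + column - 1;
    }

    /** Returns the text between the two positions; the end column is inclusive. */
    static String slice(String src, int beginLine, int beginColumn,
                        int endLine, int endColumn) {
        int from = offsetOf(src, beginLine, beginColumn);
        int to = offsetOf(src, endLine, endColumn) + 1;
        return src.substring(from, to);
    }

    public static void main(String[] args) {
        String src = "5 + 2 - 1\n  2 + 1\n";
        // Coordinates of the first and last token of each expression.
        System.out.println(slice(src, 1, 1, 1, 9));
        System.out.println(slice(src, 2, 3, 2, 7));
    }
}
```

Because the slice starts at the first token of the expression, the leading indentation on the second input line is dropped while the spaces between tokens are preserved.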
