JavaCC lexer doesn't work as expected (whitespace not ignored)

I'm trying to implement a parser for the example file listed below. I'd like to recognize quoted strings joined by '+' as a single token, so I created a .jj file, but it doesn't match such strings. I was under the impression that JavaCC always takes the longest possible match for each token specification, but that doesn't seem to be the case for me.

What am I doing wrong here? Why doesn't my <STRING> token match the '+' even though it's part of the definition? Why is whitespace not being ignored?

options {
  TOKEN_FACTORY = "Token";
}

PARSER_BEGIN(Parser)

package com.example.parser;

public class Parser {

  public static void main(String args[]) throws ParseException {

      ParserTokenManager manager = new ParserTokenManager(new SimpleCharStream(Parser.class.getResourceAsStream("example")));
      Token token = manager.getNextToken();
      while (token != null && token.kind != ParserConstants.EOF) {
          System.out.println(token.toString() + "[" + token.kind + "]");
          token = manager.getNextToken();
      }

      Parser parser = new Parser(Parser.class.getResourceAsStream("example"));
      parser.start();
  }

}

PARSER_END(Parser)

// WHITE SPACE
<DEFAULT, IN_STRING_KEYWORD>
SKIP :
{
  " " // <-- skipping spaces
| "\t"
| "\n"
| "\r"
| "\f"
}

// TOKENS
TOKEN :
{
< KEYWORD1 : "keyword1" > : IN_STRING_KEYWORD
}

<IN_STRING_KEYWORD>
TOKEN : {<STRING : <CONCAT_STRING> | <UNQUOTED_STRING> > : DEFAULT 
| <#CONCAT_STRING : <QUOTED_STRING> ("+" <QUOTED_STRING>)+ >
// <-- CONCAT_STRING never matches   "+" part when input is "'smth' +", because whitespace is not ignored!?
| <#QUOTED_STRING : <SINGLEQUOTED_STRING> | <DOUBLEQUOTED_STRING> >
| <#SINGLEQUOTED_STRING : "'" (~["'"])* "'" >
| <#DOUBLEQUOTED_STRING : 
    "\""
      (
        (~["\"", "\\"]) |
        ("\\" ["n", "t", "\"", "\\"])
      )* 
    "\""
  >
| <#UNQUOTED_STRING : (~[" ","\t", ";", "{", "}", "/", "*", "'", "\"", "\n", "\r"] | "/" ~["/", "*"] | "*" ~["/"])+ >
}

void start() :
{}
{
  (<KEYWORD1><STRING>";")+ <EOF>
}

Here's an example file that should be parsed:

keyword1 "foo" + ' bar';

I'd like to match the argument of the first keyword1 as a single <STRING> token.

Current output:

keyword1[6]
Exception in thread "main" com.example.parser.TokenMgrError: Lexical error at line 1, column 15.  Encountered: " " (32), after : "\"foo\""
    at com.example.parser.ParserTokenManager.getNextToken(ParserTokenManager.java:616)
    at com.example.parser.Parser.main(Parser.java:12)

I'm using JavaCC 5.0.

Answer: STRING expands to the longest sequence the token manager can consume, and as the error message shows, that is "foo". The space after the closing double quote is not part of the definition of the private token CONCAT_STRING, so the match cannot continue on to the +. And "foo" on its own is not a complete STRING either (CONCAT_STRING requires at least one + part, and UNQUOTED_STRING cannot start with a double quote), so there is no shorter token to fall back on, hence the TokenMgrError. SKIP rules only apply between tokens, never inside the definition of another token, so you must incorporate the whitespace directly into the definition, on either side of the +.
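The change described above can be checked without regenerating the parser. The sketch below transcribes the question's private tokens by hand into java.util.regex patterns and compares the original CONCAT_STRING with a variant that allows whitespace around the +. This is an approximation under stated assumptions: Java's backtracking regex engine is not JavaCC's token manager (though for this input both reach the same result), and the class name ConcatStringDemo, the helper matchAtStart, and the JavaCC fragment in the comment are illustrative, not taken from the answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: the question's private tokens transcribed by hand
// into java.util.regex syntax. It approximates, but is not, JavaCC's lexer.
public class ConcatStringDemo {
    // <#SINGLEQUOTED_STRING> and <#DOUBLEQUOTED_STRING>
    static final String SINGLE = "'[^']*'";
    static final String DOUBLE = "\"(?:[^\"\\\\]|\\\\[nt\"\\\\])*\"";
    // <#QUOTED_STRING>
    static final String QUOTED = "(?:" + SINGLE + "|" + DOUBLE + ")";
    // Same characters as the grammar's SKIP rule: " ", "\t", "\n", "\r", "\f"
    static final String WS = "[ \\t\\n\\r\\f]*";

    // As in the question: "+" must directly follow the closing quote.
    static final Pattern ORIGINAL = Pattern.compile(QUOTED + "(?:\\+" + QUOTED + ")+");

    // With optional whitespace on either side of "+". A possible JavaCC
    // counterpart (an assumption, not taken from the answer):
    // | <#CONCAT_STRING : <QUOTED_STRING>
    //       ( ([" ", "\t", "\n", "\r", "\f"])* "+" ([" ", "\t", "\n", "\r", "\f"])* <QUOTED_STRING> )+ >
    static final Pattern FIXED = Pattern.compile(QUOTED + "(?:" + WS + "\\+" + WS + QUOTED + ")+");

    /** Returns the prefix of input matched by pattern, or null if nothing matches at the start. */
    static String matchAtStart(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        return m.lookingAt() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "\"foo\" + ' bar';";
        System.out.println("original: " + matchAtStart(ORIGINAL, input));
        System.out.println("fixed:    " + matchAtStart(FIXED, input));
    }
}
```

Run as is, it prints null for the original pattern and "foo" + ' bar' for the fixed one. The original pattern finds no complete STRING at the start of the input at all, which is consistent with the TokenMgrError in the question.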

As an aside, I recommend having a final token definition like this:

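// List every lexical state in which no token can match the empty string;
// in the grammar above, that is both DEFAULT and IN_STRING_KEYWORD.
<DEFAULT, IN_STRING_KEYWORD>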
TOKEN : {
    < ILLEGAL : ~[] >
}

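Declaring it last matters: ILLEGAL matches exactly one character, so any longer token still wins, and when another token also matches just one character, JavaCC prefers the one declared first. Because some token then always matches, the token manager no longer throws TokenMgrError in those states; the offending character reaches the parser as an ILLEGAL token instead, and the resulting ParseException reports where it occurred and what was expected there, which makes debugging easier.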
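To illustrate why a catch-all token avoids the exception, here is a hand-written Java sketch, not the code JavaCC generates. It models only two rules: the longest match wins, with ties going to the earlier declaration, and a last-declared one-character ILLEGAL rule (the counterpart of ~[]) guarantees that something always matches. The rule set is a simplified, hypothetical subset of the question's grammar: a single lexical state, and STRING uses the original CONCAT_STRING without whitespace. The names CatchAllLexerDemo, Kind, Rule, and tokenize are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical single-state lexer sketch; not JavaCC's generated token manager.
public class CatchAllLexerDemo {
    // Token kinds for this sketch; SKIP plays the role of the grammar's SKIP rule.
    enum Kind { SKIP, KEYWORD1, STRING, SEMI, ILLEGAL }

    static final class Rule {
        final Kind kind;
        final Pattern pattern;

        Rule(Kind kind, String regex) {
            this.kind = kind;
            this.pattern = Pattern.compile(regex);
        }
    }

    // QUOTED_STRING from the question, transcribed by hand.
    static final String QUOTED = "(?:'[^']*'|\"(?:[^\"\\\\]|\\\\[nt\"\\\\])*\")";

    // Declaration order only breaks ties between equally long matches;
    // ILLEGAL matches any single character (like ~[]) and comes last.
    static final List<Rule> RULES = List.of(
            new Rule(Kind.SKIP, "[ \\t\\n\\r\\f]"),
            new Rule(Kind.KEYWORD1, "keyword1"),
            new Rule(Kind.STRING, QUOTED + "(?:\\+" + QUOTED + ")+"), // original, no whitespace
            new Rule(Kind.SEMI, ";"),
            new Rule(Kind.ILLEGAL, "(?s)."));

    /** Splits input into tokens rendered as KIND(image); SKIP matches are dropped. */
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            Rule best = null;
            int bestEnd = pos;
            for (Rule rule : RULES) {
                Matcher m = rule.pattern.matcher(input).region(pos, input.length());
                // Strict '>' keeps the earlier rule when two matches are equally long.
                if (m.lookingAt() && m.end() > bestEnd) {
                    best = rule;
                    bestEnd = m.end();
                }
            }
            // Only reachable without the ILLEGAL rule; this is the situation
            // in which JavaCC's token manager throws TokenMgrError.
            if (best == null) {
                throw new IllegalStateException("Lexical error at offset " + pos);
            }
            if (best.kind != Kind.SKIP) {
                tokens.add(best.kind + "(" + input.substring(pos, bestEnd) + ")");
            }
            pos = bestEnd;
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The "foo" + ' bar' part surfaces as ILLEGAL tokens instead of an exception.
        System.out.println(tokenize("keyword1 \"foo\" + ' bar';"));
    }
}
```

With the question's input, the sketch emits KEYWORD1(keyword1), a run of one-character ILLEGAL tokens for "foo" + ' bar', and finally SEMI(;), so a parser would fail at the first ILLEGAL token with a position it can report. A well-formed input such as keyword1 'a'+'b'; still yields KEYWORD1, STRING, and SEMI, because the longer STRING match beats the one-character ILLEGAL match.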
