antlr html pcdata

Question

I'm trying to write a very simple HTML parser with ANTLR, and I'm facing a problem: the ~ rule, which should match everything up to a specified character, is not working.

My lexer grammar:

lexer grammar HtmlParserLexer;

HTML: OHTML PCDATA CHTML;

PCDATA :(~'<') ; //match all until <

OHTML: '<html>';

CHTML: '</html>';

I'm trying to match:

<html>foo bar</html>

Error from the Eclipse ANTLR plugin interpreter:

MismatchedTokenException: line 1:7 mismatched input UNKNOW expecting '<'

This suggests that my grammar ignores the PCDATA rule, and I don't know why. Thanks in advance for your help.

Answer 1

The rule PCDATA :(~'<') ; matches a single character other than '<'. You'll need to repeat it once or more: PCDATA :(~'<')+ ; (notice the +).
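The difference between matching one character and matching one or more can be illustrated outside ANTLR. The following is only a rough analogy using Python's re module, not ANTLR's lexer: the character class [^<] stands in for (~'<'), and the sample text is the part of the input that follows <html>.

```python
import re

# The input that remains after the opening <html> tag has been consumed.
text = "foo bar</html>"

# Analogue of (~'<') : exactly one character that is not '<'.
single = re.match(r"[^<]", text)

# Analogue of (~'<')+ : one or more characters that are not '<'.
repeated = re.match(r"[^<]+", text)

print(single.group())    # f
print(repeated.group())  # foo bar
```

With the single-character form, only "f" is consumed, so the next thing expected ('<') does not line up with the remaining input. That is consistent with the mismatch the interpreter reports.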

You may also want to allow <html></html> (nothing between <html> and </html>). In that case, you shouldn't change PCDATA :(~'<')+ ; into PCDATA :(~'<')* ; but do this instead:

HTML: OHTML PCDATA? CHTML;

PCDATA : (~'<')+ ;

because you shouldn't create lexer rules that could potentially match an empty string.
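To see why an empty-matching rule is troublesome, here is a small sketch, again using Python regular expressions as a stand-in for the lexer rules (an analogy under that assumption, not ANTLR behaviour itself). A pattern that can match nothing succeeds at any position without consuming input, whereas making the whole PCDATA part optional handles the empty case cleanly.

```python
import re

star = re.compile(r"[^<]*")  # analogue of (~'<')* : may match the empty string
plus = re.compile(r"[^<]+")  # analogue of (~'<')+ : needs at least one character

empty_doc = "<html></html>"

# At position 0 the next character is '<'.
m_star = star.match(empty_doc)
m_plus = plus.match(empty_doc)

# The star form "succeeds" but consumes nothing, so a tokenizer relying on it
# would make no progress; the plus form simply does not match here.
star_len = len(m_star.group())
plus_matched = m_plus is not None

# Analogue of HTML: OHTML PCDATA? CHTML; the PCDATA part is optional as a whole.
html = re.compile(r"<html>(?:[^<]+)?</html>")
accepts_empty = html.fullmatch(empty_doc) is not None
accepts_text = html.fullmatch("<html>foo bar</html>") is not None

print(star_len, plus_matched, accepts_empty, accepts_text)
```

Keeping the rule non-empty and marking its use as optional puts the "may be absent" decision in the rule that references PCDATA rather than in PCDATA itself.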

Answer 1: 3 (accepted), 2011-12-17 21:11:21
