antlr html pcdata
I'm trying to write a very simple HTML parser with ANTLR, and I'm facing a problem: the ~ rule, which should match everything up to a specified character, is not working.
My lexer grammar:
lexer grammar HtmlParserLexer;
HTML: OHTML PCDATA CHTML;
PCDATA :(~'<') ; //match all until <
OHTML: '<html>';
CHTML: '</html>';
I'm trying to match:
<html>foo bar</html>
Error from the Eclipse ANTLR plugin interpreter:
MismatchedTokenException: line 1:7 mismatched input UNKNOW expecting '<'
This means that my grammar ignores the PCDATA rule, and I don't know why. Thanks in advance for your help.
The rule
PCDATA :(~'<') ;
matches a single character other than '<'. You'll need to repeat it one or more times:
PCDATA :(~'<')+ ;
(notice the +).
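The difference between the two rules can be seen in a rough analogy using Python regular expressions. This is only a sketch: the patterns `[^<]` and `[^<]+` are stand-ins assumed to behave like `(~'<')` and `(~'<')+`, and the variable names are made up for illustration. It is not ANTLR code and does not reproduce the ANTLR lexer's full behaviour.

```python
import re

# The input that follows "<html>" in the question's example.
text = "foo bar</html>"

# Regex stand-ins for the two lexer rules (an analogy, not ANTLR itself).
single = re.compile(r"[^<]")     # roughly like PCDATA : (~'<') ;
repeated = re.compile(r"[^<]+")  # roughly like PCDATA : (~'<')+ ;

# Without the +, only one character is consumed.
one_char = single.match(text).group()

# With the +, the whole run up to (but not including) '<' is consumed.
run = repeated.match(text).group()

print(repr(one_char))  # 'f'
print(repr(run))       # 'foo bar'
```

With the single-character version, everything after the first character is left unmatched, which is consistent with the mismatch reported right after `<html>` in the question.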
You may also want to allow <html></html> (nothing between <html> and </html>). In that case, you shouldn't change
PCDATA :(~'<')+ ;
into
PCDATA :(~'<')* ;
but do this instead:
HTML: OHTML PCDATA? CHTML;
PCDATA : (~'<')+ ;
because you shouldn't create lexer rules that could potentially match an empty string.
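To illustrate why the optional marker belongs on the rule that uses PCDATA rather than inside PCDATA itself, here is a small sketch, again using Python regular expressions as an assumed stand-in for the lexer rules (the names `star`, `plus`, and `html` are hypothetical). A pattern that can match the empty string "succeeds" without consuming any input, so a tokenizer relying on it could produce a token at the same position again and again without advancing.

```python
import re

star = re.compile(r"[^<]*")  # roughly like PCDATA : (~'<')* ; (can match nothing)
plus = re.compile(r"[^<]+")  # roughly like PCDATA : (~'<')+ ; (needs at least one char)

# Input positioned right after "<html>" in "<html></html>".
rest = "</html>"

# The * version matches, but with length 0: no input is consumed.
star_len = len(star.match(rest).group())

# The + version simply does not match here, which is unambiguous.
plus_matched = plus.match(rest) is not None

# Making the text optional in the composing rule instead,
# roughly like HTML : OHTML PCDATA? CHTML ;
html = re.compile(r"<html>(?:[^<]+)?</html>")
empty_ok = html.fullmatch("<html></html>") is not None
text_ok = html.fullmatch("<html>foo bar</html>") is not None

print(star_len, plus_matched, empty_ok, text_ok)  # 0 False True True
```

Both the empty and the non-empty document are accepted, while the text rule itself never produces a zero-length match.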