
antlr html pcdata


I'm trying to write a very simple HTML parser with ANTLR, and I'm facing a problem: the ~ rule, which should match everything up to a specified character, is not working.

My lexer grammar:

lexer grammar HtmlParserLexer;

HTML: OHTML PCDATA CHTML;

PCDATA :(~'<') ; //match all until <

OHTML: '<html>';

CHTML: '</html>';

I'm trying to match:

<html>foo bar</html>

Error from the Eclipse ANTLR plugin's interpreter:

MismatchedTokenException: line 1:7 mismatched input UNKNOW expecting '<'

This means that my grammar ignores the PCDATA rule, and I don't know why. Thanks in advance for your help.

The rule PCDATA : (~'<') ; matches a single character other than '<'. You'll need to repeat it one or more times: PCDATA : (~'<')+ ; (notice the +).
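To see why the single-character rule produces the error at line 1:7, here is a minimal Python sketch that hand-simulates the HTML rule. It is not ANTLR and uses no ANTLR API; the function name match_html and the repeat flag are hypothetical, and it assumes the reported column is 0-based (as ANTLR 3's charPositionInLine is). With the single-character PCDATA, OHTML consumes columns 0 to 5, PCDATA consumes only the f at column 6, and CHTML then expects '<' at column 7 but finds o.

```python
# Hypothetical hand-written matcher mimicking the lexer rule
#   HTML : OHTML PCDATA CHTML ;
# This is NOT ANTLR; it only reproduces how the rule consumes characters.

OPEN = "<html>"
CLOSE = "</html>"


def match_html(text: str, repeat: bool) -> tuple[bool, int]:
    """Return (ok, pos). On failure, pos is the 0-based column of the mismatch.

    repeat=False models PCDATA : (~'<') ;   (exactly one character)
    repeat=True  models PCDATA : (~'<')+ ;  (one or more characters)
    """
    pos = 0
    # OHTML: the literal '<html>'
    if not text.startswith(OPEN, pos):
        return False, pos
    pos += len(OPEN)

    # PCDATA: at least one character that is not '<'
    if pos >= len(text) or text[pos] == "<":
        return False, pos
    pos += 1
    if repeat:
        # keep consuming until the next '<' (or end of input)
        while pos < len(text) and text[pos] != "<":
            pos += 1

    # CHTML: the literal '</html>'; report the first column that differs
    for i, expected in enumerate(CLOSE):
        if pos + i >= len(text) or text[pos + i] != expected:
            return False, pos + i
    pos += len(CLOSE)
    return pos == len(text), pos


print(match_html("<html>foo bar</html>", repeat=False))  # (False, 7)
print(match_html("<html>foo bar</html>", repeat=True))   # (True, 20)
print(match_html("<html></html>", repeat=True))          # (False, 6)
```

The last call shows that even the + version rejects <html></html>, because PCDATA must consume at least one character.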

You may also want to allow <html></html> (nothing between <html> and </html>). In that case, don't change PCDATA : (~'<')+ ; into PCDATA : (~'<')* ;. Do this instead:

HTML: OHTML PCDATA? CHTML;

PCDATA : (~'<')+ ;

This is because you shouldn't create lexer rules that can match an empty string.
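The difference between PCDATA? in the outer rule and (~'<')* in PCDATA itself can be illustrated with Python regular expressions as an analogy. Here [^<] plays the role of ~'<'; the mapping from ANTLR rules to regex patterns is an assumption made for illustration, not something ANTLR generates. The outer pattern accepts both the non-empty and the empty document, while only the * variant of PCDATA on its own matches the empty string.

```python
import re

# Python regex analogues of the lexer rules (an analogy, not ANTLR syntax):
#   PCDATA : (~'<')+ ;              ->  [^<]+
#   HTML   : OHTML PCDATA? CHTML ;  ->  <html>(?:[^<]+)?</html>
PCDATA_PLUS = re.compile(r"[^<]+")
PCDATA_STAR = re.compile(r"[^<]*")  # the variant the answer advises against
HTML = re.compile(r"<html>(?:[^<]+)?</html>")

# The optional PCDATA lets the outer rule accept both documents.
for doc in ("<html>foo bar</html>", "<html></html>"):
    print(doc, bool(HTML.fullmatch(doc)))  # True for both

# Only the '*' variant matches the empty string, i.e. it can produce
# a zero-length match, which a lexer rule should not be able to do.
print(bool(PCDATA_PLUS.fullmatch("")))  # False
print(bool(PCDATA_STAR.fullmatch("")))  # True
```

Keeping the optionality in the outer rule means every PCDATA token is at least one character long; a rule that can match zero characters could let the lexer produce a token without consuming any input.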
