
Can I define the XML syntax using an operator precedence grammar?

Let's focus on the following productions from the XML grammar, under these assumptions:

  • any identifier with an uppercase initial character is a terminal (Misc, CharData, Reference, CDSect, PI, Comment)
  • any identifier with a lowercase initial character is a nonterminal (document, prolog, element)
[1]  document      ::=      prolog element Misc*
[39] element       ::=      STag content ETag
[43] content       ::=      CharData? ((element | Reference | CDSect | PI | Comment) CharData?)*

I want to rewrite this as an operator precedence grammar, but I could not complete the rule for content. How can I define it?

document : prolog element
         | prolog element misc
         ;
misc     : misc Misc
         | Misc
         ;
element  : STag ETag
         | STag content ETag
         ;

That grammar is not an operator grammar, so attempting to write an operator precedence parser for it is bound to fail. If you really want to pursue that project, you'll need to rewrite the grammar.

Recall that there are two essential features of an operator grammar:

  • Every production includes at least one operator (terminal).
  • No production includes two consecutive non-terminals.
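As a rough illustration, the two conditions can be checked mechanically. The sketch below encodes the grammar from the question as a Python dictionary (the encoding, the name `violations`, and the `NONTERMINALS` set are assumptions made for this example, following the question's lowercase/uppercase convention) and reports every production that breaks either rule.

```python
# Hypothetical encoding of the grammar from the question.
# Lowercase names are nonterminals, capitalized names are terminals.
GRAMMAR = {
    "document": [["prolog", "element"], ["prolog", "element", "misc"]],
    "misc": [["misc", "Misc"], ["Misc"]],
    "element": [["STag", "ETag"], ["STag", "content", "ETag"]],
}
NONTERMINALS = set(GRAMMAR) | {"prolog", "content"}


def violations(grammar, nonterminals):
    """Return (lhs, rhs, reason) for each production that breaks a rule."""
    found = []
    for lhs, alternatives in grammar.items():
        for rhs in alternatives:
            # Rule 1: at least one terminal on the right-hand side.
            if not any(sym not in nonterminals for sym in rhs):
                found.append((lhs, rhs, "no terminal"))
            # Rule 2: no two consecutive nonterminals.
            if any(a in nonterminals and b in nonterminals
                   for a, b in zip(rhs, rhs[1:])):
                found.append((lhs, rhs, "adjacent nonterminals"))
    return found


for lhs, rhs, reason in violations(GRAMMAR, NONTERMINALS):
    print(f"{lhs}: {' '.join(rhs)}  ({reason})")
```

With this encoding, only the two document alternatives are flagged, and each of them breaks both rules.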

The first rule prohibits empty and unit productions. Those can be mechanically eliminated at the cost of bloating the grammar.
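To get a feel for the bloat, consider the optional symbols in the XML productions, such as CharData?, each of which hides an empty production. A minimal sketch of the expansion, assuming optional symbols are written with a trailing `?` (the function name `expand_optionals` is made up for this example):

```python
from itertools import product


def expand_optionals(rhs):
    """Expand symbols ending in '?' into every present/absent combination."""
    # Each optional symbol contributes two choices: present or absent.
    choices = [[[sym[:-1]], []] if sym.endswith("?") else [[sym]]
               for sym in rhs]
    return [sum(combo, []) for combo in product(*choices)]


for alternative in expand_optionals(["CharData?", "Reference", "CharData?"]):
    print(" ".join(alternative))
```

A right-hand side with n optional symbols turns into 2^n alternatives. If every symbol is optional, one of those alternatives is empty, and it has to be removed by expanding the uses of that nonterminal in the same way.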

The second rule prohibits right-hand sides like document: prolog element. But more critically, it won't let you use element as a non-terminal at all, because the language itself permits juxtaposed elements. Removing it should be possible: every element in fact starts and ends with a terminal, so you should be able to eliminate element from the grammar by macro-replacing each of its uses with its definitions. But it's also going to be tedious. (Also, I'm not convinced that making STag and ETag terminals really reflects the syntax; a start tag is a syntactically complicated object which must somehow be parsed.)
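The macro-replacement could look roughly like the following sketch. The starting grammar is a simplified, left-recursive version of content that keeps only CharData and element (an assumption made to keep the example short; the real rule also has Reference, CDSect, PI and Comment), and `inline` is a hypothetical helper name.

```python
from itertools import product

# Simplified, hypothetical grammar: content is a sequence of CharData and elements.
GRAMMAR = {
    "element": [["STag", "ETag"], ["STag", "content", "ETag"]],
    "content": [["element"], ["CharData"],
                ["content", "element"], ["content", "CharData"]],
}


def inline(grammar, name):
    """Replace every use of `name` with each of its definitions, then drop it.

    Assumes `name` does not appear in its own right-hand sides.
    """
    definitions = grammar[name]
    result = {}
    for lhs, alternatives in grammar.items():
        if lhs == name:
            continue
        expanded = []
        for rhs in alternatives:
            # A use of `name` offers one slot per definition; other symbols stay.
            slots = [definitions if sym == name else [[sym]] for sym in rhs]
            expanded.extend(sum(combo, []) for combo in product(*slots))
        result[lhs] = expanded
    return result


for rhs in inline(GRAMMAR, "element")["content"]:
    print("content :", " ".join(rhs))
```

After inlining, content is the only nonterminal left and it never appears twice in a row, so the result has operator form. That alone does not show that its precedence relations are conflict-free; that still has to be checked separately.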

Once you've done all that, you'll need to cope with the essential context-sensitivity of XML, which results from the requirement that each end tag carry the same name as its start tag. Most people simply redefine that as "semantic", in order to be able to use a context-free grammar, but the check is still required for a correct parse.
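When the agreement is treated as a "semantic" check, it is usually done with a stack of open tag names alongside the parse. A minimal sketch, assuming the lexer delivers tokens as (kind, name) pairs (that token format and the function name `tags_agree` are assumptions for this example):

```python
def tags_agree(tokens):
    """Check that every ETag names the most recently opened STag.

    tokens: iterable of (kind, name) pairs; kinds other than
    "STag" and "ETag" are ignored.
    """
    open_names = []
    for kind, name in tokens:
        if kind == "STag":
            open_names.append(name)
        elif kind == "ETag":
            # An end tag with nothing open, or with the wrong name, is an error.
            if not open_names or open_names.pop() != name:
                return False
    # Anything still open at the end was never closed.
    return not open_names


well_formed = [("STag", "a"), ("CharData", None), ("STag", "b"),
               ("ETag", "b"), ("ETag", "a")]
mismatched = [("STag", "a"), ("ETag", "b")]
print(tags_agree(well_formed), tags_agree(mismatched))
```

The context-free grammar only guarantees that start and end tags nest; the stack is what enforces that the names match.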

