Difference between DOM tree parsing and syntax tree parsing?

Original post 2012-05-15 19:22:48 7 2 parsing/ dom/ schema/ context-free-grammar/ concrete-syntax-tree

After parsing an HTML or XML file, we get a DOM tree.

After parsing C, C++, or JavaScript, we get a syntax tree.

Note that the syntax tree is constructed according to the context-free grammar that specifies what a valid C/C++/JS program is.

But it seems the DOM tree is just a pure hierarchical structure determined only by the HTML/XML file itself. Is that true? Is that why schema validation is done after parsing? What is the fundamental difference between these two kinds of parse trees?

2 answers

Thanks to Ira Baxter and Guy Coder for their interest.

I did some more research and compared the two cases. My impression is as follows:

XML parsing can be either "validating" or "non-validating". In the latter case, the parser does not check the document against its Document Type Definition (DTD). It only checks that the document is well-formed and produces the hierarchy of elements in the XML file, so it is lighter than validating parsing.
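To make the non-validating case concrete, here is a minimal sketch using Python's standard `xml.dom.minidom`, which does not validate against a DTD. The sample document and the helper names `element_outline` and `is_well_formed` are made up for illustration. The parser accepts any well-formed input and hands back only the element hierarchy:

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def element_outline(node, depth=0):
    """Return the element hierarchy below `node` as indented lines."""
    lines = []
    for child in node.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            lines.append("  " * depth + child.tagName)
            lines.extend(element_outline(child, depth + 1))
    return lines


def is_well_formed(text):
    """True if the text parses; no DTD or schema is consulted."""
    try:
        minidom.parseString(text)
        return True
    except ExpatError:
        return False


# Hypothetical document: nothing declares which elements are allowed.
doc = minidom.parseString("<A><B>text</B><C/></A>")
outline = element_outline(doc)
print("\n".join(outline))

# Nesting the other way round is just as acceptable to this parser,
# but crossed tags are rejected because they are not well-formed.
print(is_well_formed("<B><A/></B>"))
print(is_well_formed("<A><B></A></B>"))
```

Whether `B` may contain `A` is a question this kind of parser never asks; that is left to a later validation step.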

Parsing C/C++/Java generates a syntax tree based on the language's context-free grammar. So, informally, it is more like validating parsing.
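As a rough illustration of that point, the sketch below uses Python's standard `ast` module as a stand-in for a C/C++/Java parser (an assumption made only to keep the example runnable; the helper name `parses` is hypothetical). The parser either builds a syntax tree whose node types come from the grammar, or it rejects the input outright:

```python
import ast


def parses(source):
    """True if `source` conforms to the language grammar."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


# A valid statement yields a tree whose node types are fixed by the grammar.
tree = ast.parse("x = 1 + 2")
statement = tree.body[0]
print(type(statement).__name__)        # node type of the statement
print(type(statement.value).__name__)  # node type of the right-hand side

# Input that violates the grammar never produces a tree at all.
print(parses("x = 1 + 2"))
print(parses("x = = 1"))
```

There is no separate "non-validating" mode here: conforming to the grammar and getting a tree are the same step.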

PS: I am not an expert, so comments are welcome if you find my understanding is incorrect.

Like any other language, XML is described by a grammar. XML's grammar is rather simple (start-tags, end-tags, correct nesting), so the syntax tree may seem simple as well (just a hierarchy of elements). An XML schema is another grammar, one that describes an XML file's content.

So basically there are two parsers invoked one after the other. The first verifies that every start-tag has a matching end-tag and that the nesting is correct.

The second parser verifies that the XML file's content is structured according to the schema (grammar), for example that an element named "B" can only be contained within an element named "A".

This shouldn't be compared to parsing a programming language like C, because you cannot change a programming language's syntax. If-statements can only appear inside function bodies, not outside, and you cannot change that. In XML, however, you can specify that "B" elements may only appear within "A" elements, or that "A" elements may only appear within "B" elements, all by specifying the grammar of your XML file's content in the form of a schema.
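The two passes described in this answer can be sketched as follows. This is not a real schema language: `ALLOWED_PARENTS` is a hypothetical stand-in for a schema that maps each restricted child tag to the parents it may appear in, and `check_well_formed` and `check_structure` are illustrative names. Only Python's standard `xml.etree.ElementTree` is used:

```python
import xml.etree.ElementTree as ET

# Hypothetical schema stand-in: child tag -> set of allowed parent tags.
ALLOWED_PARENTS = {"B": {"A"}}


def check_well_formed(text):
    """Pass 1: return the root element, or None if the XML is not well-formed."""
    try:
        return ET.fromstring(text)
    except ET.ParseError:
        return None


def check_structure(root, allowed_parents):
    """Pass 2: list every violation of the parent/child rules."""
    errors = []
    if root.tag in allowed_parents:
        errors.append(f"<{root.tag}> may not be the root element")
    for parent in root.iter():      # root and all descendants
        for child in parent:        # direct children only
            allowed = allowed_parents.get(child.tag)
            if allowed is not None and parent.tag not in allowed:
                errors.append(f"<{child.tag}> not allowed inside <{parent.tag}>")
    return errors


# Pass 1 rejects broken nesting before any rule is consulted.
print(check_well_formed("<A><B></A>"))

# Pass 2 applies the user-defined rules to a well-formed tree.
good = check_structure(check_well_formed("<A><B/></A>"), ALLOWED_PARENTS)
bad = check_structure(check_well_formed("<C><B/></C>"), ALLOWED_PARENTS)
print(good, bad)

# Swapping the rules is only a data change; the parser code is untouched.
swapped = check_structure(check_well_formed("<B><A/></B>"), {"A": {"B"}})
print(swapped)
```

The design point is that the first pass is fixed by XML itself, while the second pass is driven by data the document author supplies, which is what makes it different from the grammar of a language like C.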