Unclosed/erroneous HTML tags extending past their parent

Question

I'm running into some interesting behavior when HTML tags aren't closed. Sometimes the browser inserts extra opening and closing tags to compensate, and other times it just inserts a closing tag. This is best explained through examples:

With the <sup> tag:

 first text node <div> This is a parent div <sup>superscript tag starts IN parent</div> text OUTSIDE node of parent

With the <s> tag:

 first text node <div> This is a parent div <s>strikethrough tag starts IN parent</div> text OUTSIDE node of parent

As you can see in the first example, the browser automatically closes the <sup> tag before its parent closes. In the second example, however, the browser seems to close the <s> tag before the end of its parent and then inserts another opening <s> tag after the parent.

I've looked through the <s> and <sup> specs, but I can't find anything specific about how browsers interpret and deal with unclosed tags, or at least nothing that explains this behavior.

I want to know this because of a live Markdown parser I'm using: users may not have finished their tags when it parses their source.

I'd like to know how the browser deals with these cases so that I can code for them. At the moment the browser closes different tags in different ways (as you can see from my examples).

Does anyone know why the browser does this? Or at least know of a list of elements that behave the same way?

Answer 1

Thanks to @Ankith Amtange I found the explanation of what happens. I'll write it out here for future readers.

The <s> tag extends past its parent because it is a formatting element. The <sup> tag is automatically closed because the browser expected a closing </sup> tag before the end of the parent element; <sup> is an ordinary element, so nothing re-opens it once the parent has closed.

The HTML parser treats elements on its stack differently depending on which of the following categories they fall into (source):

Special elements

The following elements have varying levels of special parsing rules: HTML's address, applet, area, article, aside, base, basefont, bgsound, blockquote, body, br, button, caption, center, col, colgroup, dd, details, dir, div, dl, dt, embed, fieldset, figcaption, figure, footer, form, frame, frameset, h1, h2, h3, h4, h5, h6, head, header, hgroup, hr, html, iframe, img, input, isindex, li, link, listing, main, marquee, meta, nav, noembed, noframes, noscript, object, ol, p, param, plaintext, pre, script, section, select, source, style, summary, table, tbody, td, template, textarea, tfoot, th, thead, title, tr, track, ul, wbr, and xmp; MathML's mi, mo, mn, ms, mtext, and annotation-xml; and SVG's foreignObject, desc, and title.

Formatting elements

The following HTML elements are those that end up in the list of active formatting elements: a, b, big, code, em, font, i, nobr, s, small, strike, strong, tt, and u.

Ordinary elements

All other elements found while parsing an HTML document.
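A minimal Python sketch of the three categories as a simple lookup. `SPECIAL`, `FORMATTING` and `category` are hypothetical names of my own, only the HTML names from the lists above are included (the MathML and SVG special elements are left out), and this only classifies tags; it does not reproduce what the parser does with them.

```python
# Hypothetical lookup for the three element categories quoted above.
# HTML names only; the MathML/SVG special elements are deliberately omitted.
SPECIAL = frozenset("""
    address applet area article aside base basefont bgsound blockquote body
    br button caption center col colgroup dd details dir div dl dt embed
    fieldset figcaption figure footer form frame frameset h1 h2 h3 h4 h5 h6
    head header hgroup hr html iframe img input isindex li link listing main
    marquee meta nav noembed noframes noscript object ol p param plaintext pre
    script section select source style summary table tbody td template
    textarea tfoot th thead title tr track ul wbr xmp
""".split())

FORMATTING = frozenset(
    "a b big code em font i nobr s small strike strong tt u".split()
)


def category(tag: str) -> str:
    """Return 'special', 'formatting', or 'ordinary' for an HTML tag name."""
    name = tag.lower()  # HTML tag names are case-insensitive
    if name in SPECIAL:
        return "special"
    if name in FORMATTING:
        return "formatting"
    return "ordinary"
```

With these sets, `category("s")` gives `"formatting"` and `category("sup")` gives `"ordinary"`, which lines up with the difference between the two examples in the question.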

Explanation (from linked spec):

The most-often discussed example of erroneous markup is as follows:

<p>1<b>2<i>3</b>4</i>5</p>

The parsing of this markup is straightforward up to the "3". At this point, the DOM looks like this:

─html
 ├──head
 └──body
    └──p
       ├──"1"
       └──b
          ├──"2"
          └──i
             └──"3"

Here, the stack of open elements has five elements on it: html, body, p, b, and i. The list of active formatting elements just has two: b and i. The insertion mode is "in body".

Upon receiving the end tag token with the tag name "b", the "adoption agency algorithm" is invoked. This is a simple case, in that the formatting element is the b element, and there is no furthest block. Thus, the stack of open elements ends up with just three elements: html, body, and p, while the list of active formatting elements has just one: i. The DOM tree is unmodified at this point.
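The simple case of the adoption agency algorithm described above can be sketched as two list operations, starting from the state in the previous paragraph. This is a hypothetical helper, not a browser API: elements are represented as tag-name strings, `SPECIAL` is an abbreviated subset, and the scope checks and the furthest-block branch of the real algorithm are not implemented.

```python
# Hypothetical sketch of the adoption agency algorithm's "simple case":
# the formatting element is on the stack and no special element sits above it.
# Elements are tag-name strings here; a real parser works with element nodes.

SPECIAL = {"address", "blockquote", "div", "li", "p", "table"}  # abbreviated subset


def adoption_agency_simple(stack, active, tag):
    """Handle the end tag for formatting element `tag` (simple case only)."""
    if tag not in active or tag not in stack:
        raise ValueError(f"{tag!r} is not an open formatting element")
    # The formatting element is the most recent occurrence of `tag`.
    fe = len(stack) - 1 - stack[::-1].index(tag)
    # A special element above it would be a "furthest block" (not handled here).
    if any(name in SPECIAL for name in stack[fe + 1:]):
        raise NotImplementedError("furthest block present; full algorithm needed")
    # Pop from the current node up to and including the formatting element ...
    del stack[fe:]
    # ... and remove its entry from the list of active formatting elements.
    last = len(active) - 1 - active[::-1].index(tag)
    del active[last]


if __name__ == "__main__":
    stack = ["html", "body", "p", "b", "i"]  # state after "3" in the spec example
    active = ["b", "i"]
    adoption_agency_simple(stack, active, "b")
    print(stack, active)  # ['html', 'body', 'p'] ['i']
```

Note that the i element is popped off the stack but stays in the active list; that leftover entry is what the next step uses to re-create it.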

The next token is a character ("4"), which triggers the reconstruction of the active formatting elements, in this case just the i element. A new i element is thus created for the "4" Text node. After the end tag token for the "i" is also received, and the "5" Text node is inserted, the DOM looks as follows:
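The reconstruction step can be sketched in Python too. This assumes a hypothetical `Element` class and ignores list markers (which the spec uses for tables, templates and similar); it rewinds past the active entries that are no longer on the stack of open elements, then re-creates each of them at the current node.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False: compare elements by identity, like DOM nodes
class Element:
    tag: str
    children: list = field(default_factory=list)  # Element objects and str text


def reconstruct_active_formatting(stack, active):
    """Re-open active formatting elements that are no longer on the stack.

    Simplified sketch: list markers (tables, templates, ...) are not modelled.
    """
    i = len(active)
    # Rewind: step back past every entry that is not on the stack of open elements.
    while i > 0 and active[i - 1] not in stack:
        i -= 1
    # Advance/create: re-create each of those entries, oldest first.
    for j in range(i, len(active)):
        clone = Element(active[j].tag)
        stack[-1].children.append(clone)  # insert at the current node
        stack.append(clone)               # the clone becomes the current node
        active[j] = clone                 # and replaces the stale entry


if __name__ == "__main__":
    # State from the spec example right after </b>: stack is html, body, p.
    i_old = Element("i", ["3"])
    p = Element("p", ["1", Element("b", ["2", i_old])])
    stack = [Element("html"), Element("body", [p]), p]
    active = [i_old]
    reconstruct_active_formatting(stack, active)
    stack[-1].children.append("4")  # the "4" text goes into the new <i>
    print([c if isinstance(c, str) else c.tag for c in p.children])  # ['1', 'b', 'i']
```

If the last active entry is still open, the rewind loop stops immediately and nothing is created, which is why ordinary text inside an open <b> does not spawn extra elements.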

─html
 ├──head
 └──body
    └──p
       ├──"1"
       ├──b
       │  ├──"2"
       │  └──i
       │     └──"3"
       ├──i
       │  └──"4"
       └──"5"
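Putting the pieces together, here is a toy tree builder in Python that reproduces both the spec example above and the two examples from the question. It uses the standard library's `html.parser.HTMLParser` only as a tokenizer; `ToyTreeBuilder`, `serialize` and `parse_fragment` are hypothetical names, and the tree-building rules are my simplified approximation of the "in body" insertion mode, not the full algorithm: the special-element set is abbreviated, non-formatting end tags are handled like the spec's "any other end tag" rule, and void elements, scope checks, list markers and the furthest-block case are not implemented.

```python
from html.parser import HTMLParser

# Formatting elements, as listed in the answer above.
FORMATTING = frozenset("a b big code em font i nobr s small strike strong tt u".split())
# Assumption: an abbreviated subset of the special elements, enough for these examples.
SPECIAL = frozenset(
    "address article aside blockquote body div dl footer form h1 h2 h3 h4 h5 h6 "
    "header html li main nav ol p pre section table ul".split()
)


class Element:
    def __init__(self, tag):
        self.tag = tag
        self.children = []  # Element objects and str text nodes


class ToyTreeBuilder(HTMLParser):
    """html.parser tokenizes; this class builds a tree with simplified 'in body' rules.

    Not handled: void elements, scope checks, list markers, the furthest-block case.
    """

    def __init__(self):
        super().__init__()
        self.body = Element("body")
        self.stack = [self.body]  # stack of open elements
        self.active = []          # list of active formatting elements

    def _insert(self, el):
        self.stack[-1].children.append(el)
        self.stack.append(el)

    def _reconstruct(self):
        # Rewind past entries that are no longer open, then re-create them in order.
        i = len(self.active)
        while i > 0 and self.active[i - 1] not in self.stack:
            i -= 1
        for j in range(i, len(self.active)):
            clone = Element(self.active[j].tag)
            self._insert(clone)
            self.active[j] = clone

    def handle_starttag(self, tag, attrs):
        if tag not in SPECIAL:
            self._reconstruct()
        el = Element(tag)
        self._insert(el)
        if tag in FORMATTING:
            self.active.append(el)

    def handle_data(self, data):
        self._reconstruct()
        kids = self.stack[-1].children
        if kids and isinstance(kids[-1], str):
            kids[-1] += data  # append to an adjacent text node
        else:
            kids.append(data)

    def handle_endtag(self, tag):
        matches = [el for el in self.active if el.tag == tag]
        if tag in FORMATTING and matches:
            self._adoption_agency_simple(matches[-1])
        else:
            self._close_other(tag)

    def _adoption_agency_simple(self, fe):
        if fe not in self.stack:
            self.active.remove(fe)  # already closed: just forget it
            return
        idx = self.stack.index(fe)
        if any(node.tag in SPECIAL for node in self.stack[idx + 1:]):
            raise NotImplementedError("furthest block found; full algorithm needed")
        del self.stack[idx:]  # pop up to and including the formatting element
        self.active.remove(fe)

    def _close_other(self, tag):
        # Approximates the spec's "any other end tag": walk down from the current node.
        for k in range(len(self.stack) - 1, 0, -1):
            node = self.stack[k]
            if node.tag == tag:
                del self.stack[k:]  # also pops open elements above it, e.g. <s> or <sup>
                return
            if node.tag in SPECIAL:
                return  # parse error: ignore the end tag


def serialize(node):
    if isinstance(node, str):
        return node
    inner = "".join(serialize(child) for child in node.children)
    return f"<{node.tag}>{inner}</{node.tag}>"


def parse_fragment(markup):
    builder = ToyTreeBuilder()
    builder.feed(markup)
    builder.close()
    return "".join(serialize(child) for child in builder.body.children)
```

`parse_fragment("<p>1<b>2<i>3</b>4</i>5</p>")` returns `<p>1<b>2<i>3</i></b><i>4</i>5</p>`, matching the final DOM above. For the question's markup, the <s> version comes back with a second <s> after the `</div>` (because s stays in the active list after the div closes), while the <sup> version does not (sup is popped with the div and never re-created).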

Answer 1: score 7, accepted, 2016-09-30 02:57:21
