
How to ignore unclosed tags in XML or HTML?

I'm writing a parser in Haskell for a website, using the Text.XML and Text.XML.Cursor modules (from the xml-conduit package).

The page contains unclosed tags, and I get this error:

Main.hs: Error parsing XML file dat.html: 29:1-29:8: Expected end element for: Name {nameLocalName = "br", nameNamespace = Nothing, namePrefix = Nothing}, but received: EventEndElement (Name {nameLocalName = "body", nameNamespace = Nothing, namePrefix = Nothing})

What should I do? How can I ignore such tags?

A text object with unclosed tags is not well-formed and is therefore not XML.

So forget about feeding it directly to XML libraries, parsers, or tools. By definition and design, they cannot help you with input that is not well-formed.

You have two options. Either:

  1. Repair the text so that it is well-formed by closing the unclosed tags. You can do this manually or try a tool such as HTML Tidy, or
  2. Define a new data format that allows unclosed tags, and write a parser for it from the ground up.
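
As a rough illustration of option 1, here is a minimal sketch of a pre-processing pass that rewrites bare void tags such as `<br>` into the self-closing form `<br/>` before the text reaches an XML parser. The names `voidElements`, `closeVoidTags`, and `endsWithSlash` are hypothetical helpers, not part of Text.XML. The sketch assumes that no `>` character appears inside attribute values, comments, or scripts, and it only repairs void elements, not other kinds of unclosed tags. A real page may well need a proper repair tool instead.

```haskell
import Data.Char (isAlphaNum, toLower)

-- Hypothetical helper list: HTML void elements, which never take a closing tag.
voidElements :: [String]
voidElements =
  [ "area", "base", "br", "col", "embed", "hr", "img"
  , "input", "link", "meta", "source", "track", "wbr" ]

-- Rewrite tags like "<br>" into the self-closing form "<br/>".
-- Assumption: '>' never occurs inside attribute values, comments, or scripts.
closeVoidTags :: String -> String
closeVoidTags [] = []
closeVoidTags ('<' : rest) =
  let (tag, after) = break (== '>') rest
      name = map toLower (takeWhile isAlphaNum tag)
  in case after of
       ('>' : more)
         | name `elem` voidElements && not (endsWithSlash tag) ->
             '<' : tag ++ "/>" ++ closeVoidTags more
         | otherwise ->
             '<' : tag ++ ">" ++ closeVoidTags more
       _ -> '<' : tag  -- no closing '>' found; leave the tail unchanged
closeVoidTags (c : cs) = c : closeVoidTags cs

-- True when the tag text already ends with '/', i.e. it is self-closing.
endsWithSlash :: String -> Bool
endsWithSlash s = not (null s) && last s == '/'

main :: IO ()
main = putStrLn (closeVoidTags "<body>line one<br>line two<br/></body>")
```

The repaired string could then be passed to the XML parser in place of the raw file contents. Closing tags and tags that are already self-closing pass through unchanged, so running the pass twice gives the same result as running it once.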
