Extract HTML from XML

I want to extract an HTML page from an XML file. Any ideas?

 <?xml ....>
      <first>
      </first>

         <second>
         </second>
      <xhtml>
          <html>
              .....some html code here
          </html>
      </xhtml>

I want to extract the HTML page exactly as it appears above.

Because XML and HTML markup are so similar, an XML parser may have trouble with HTML embedded in the document. I would suggest encoding the HTML when you save it in the XML file, so the parser does not trip over it. Then, when you read the data back from the XML, you just need to decode it before use.

<?xml ....?>
<first></first>
<second></second>
<markup>
    &lt;html&gt;
        code here
    &lt;/html&gt;
</markup>

When you decode the markup section, it will look like this:

<html>
    code here
</html>
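
As a rough illustration of that encode/decode round trip, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. The `root` wrapper element and the sample page are assumptions made for the example; only the `first`, `second`, and `markup` element names come from the snippet above.

```python
import xml.etree.ElementTree as ET

# Hypothetical HTML page to store; not taken from the original question.
page = "<html><body><p>code here</p></body></html>"

# Build the XML document. The <root> wrapper is an assumption, added
# because an XML document needs exactly one top-level element.
root = ET.Element("root")
ET.SubElement(root, "first")
ET.SubElement(root, "second")
markup = ET.SubElement(root, "markup")
markup.text = page  # stored as plain text, not as child elements

# Serializing escapes <, > and & in text, so the HTML is encoded here.
xml_text = ET.tostring(root, encoding="unicode")

# Parsing reverses the escaping, so .text is the original HTML again.
decoded = ET.fromstring(xml_text).find("markup").text
print(decoded)
```

With this approach the XML parser never sees the HTML as markup, so the page does not need to be well-formed XML.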

You might find this of some use:

http://www.w3schools.com/xml/xml_parser.asp

You can extract the HTML from the XML using JavaScript. You can then create an element on your HTML page in JavaScript and dump the HTML in there. The only issue is that the XML data you're receiving seems to contain an html tag.

If you want to add the content to an existing page, you would have to strip the html and body tags.
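
The answer above describes doing this in JavaScript; the sketch below shows the same idea in Python instead, using `xml.etree.ElementTree`. It assumes the embedded HTML is well-formed XML and that the document has a single top-level element (the `root` wrapper and the sample page content are hypothetical, added for the example).

```python
import xml.etree.ElementTree as ET

# Sample input; <root>, <head> and <body> contents are assumptions.
xml_text = """<root>
  <first></first>
  <second></second>
  <xhtml>
    <html>
      <head><title>Demo</title></head>
      <body><p>Hello</p><p>World</p></body>
    </html>
  </xhtml>
</root>"""

root = ET.fromstring(xml_text)
html_el = root.find("xhtml/html")

# The whole page, serialized with HTML rules; strip() drops the
# whitespace that followed </html> in the source document.
full_page = ET.tostring(html_el, encoding="unicode", method="html").strip()

# Only the body content, i.e. with the html and body tags stripped,
# ready to be placed inside an existing page.
body = html_el.find("body")
fragment = (body.text or "") + "".join(
    ET.tostring(child, encoding="unicode", method="html") for child in body
)

print(full_page)
print(fragment)
```

If the embedded HTML is not well-formed XML (unclosed tags, for example), `ET.fromstring` will raise a parse error and an HTML-tolerant parser would be needed instead.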

If you use Python, extraction can be very easy.

from simplified_scrapy.simplified_doc import SimplifiedDoc 
html='''
 <?xml >
    <first>
    </first>
        <second>
        </second>
    <xhtml>
        <html>
            .....some html code here
        </html>
    </xhtml>
'''
# Parse the document, then pull out what sits under <xhtml>
doc = SimplifiedDoc(html)
html = doc.xhtml.html
print(html)

First you need to install simplified_scrapy using pip.

pip install simplified_scrapy
