Extract HTML from XML

Question

I want to extract an HTML page from an XML file. Any ideas, please?

 <?xml ....>
      <first>
      </first>

         <second>
         </second>
      <xhtml>
          <html>
              .....some html code here
          </html>
      </xhtml>

I want to extract the HTML page as it is from the above.

Answer 1

Because XML and HTML markup are so similar, an XML parser may have trouble with HTML embedded directly in the document, especially if the HTML is not well-formed XML. I would suggest encoding the HTML when you save it in the XML file, so that the parser treats it as plain text. Then, when you read the data back from the XML, you just need to decode it for use.

<?xml ....?>
<first></first>
<second></second>
<markup>
    &lt;html&gt;
        code here
    &lt;/html&gt;
</markup>

When you decode the markup section, it will look like this:

<html>
    code here
</html>
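The encode/decode round trip described above can be sketched in Python with the standard library's xml.etree.ElementTree. This is only an illustration under assumptions: the wrapping element name root and the sample page are made up for the example, and the markup element name is taken from the answer's sample. ElementTree escapes text content when it serializes and decodes it again when it parses, so no manual entity handling is needed here.

```python
import xml.etree.ElementTree as ET

# Sample HTML page to store (illustrative data only).
html_page = "<html>\n    code here\n</html>"

# Build the XML document. "root" is a hypothetical wrapper element;
# "first", "second" and "markup" mirror the answer's sample.
root = ET.Element("root")
ET.SubElement(root, "first")
ET.SubElement(root, "second")
markup = ET.SubElement(root, "markup")
markup.text = html_page  # stored as text, not as nested elements

# Serializing escapes < and > in the text, giving &lt;html&gt; ... &lt;/html&gt;
xml_text = ET.tostring(root, encoding="unicode")

# Parsing decodes the entities again, so .text is the original HTML.
recovered = ET.fromstring(xml_text).find("markup").text
print(recovered)
```

Because the HTML travels as escaped text, it does not have to be well-formed XML for this to work.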

Answer 2

You might find this of some use:

http://www.w3schools.com/xml/xml_parser.asp

You can extract the HTML from the XML using JavaScript. You can then create an element on your HTML page in JavaScript and put the extracted HTML in there. The only issue is that the XML data you're receiving appears to contain its own html tag.

If you want to add the content to an existing page, then you would have to strip the html and body tags.
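The answer describes doing this in JavaScript; the same idea of dropping the html and body wrappers can be sketched in Python with the standard library. This is a rough sketch that assumes the embedded page is well-formed XHTML without namespaces and that the XML has a single wrapping element (called root here, which is not in the original sample). The sample page content is also made up for illustration.

```python
import xml.etree.ElementTree as ET

# Illustrative input; "root" is a hypothetical wrapper element.
xml_text = """<root>
  <first></first>
  <second></second>
  <xhtml>
    <html>
      <head><title>Demo</title></head>
      <body><h1>Hello</h1><p>Some text.</p></body>
    </html>
  </xhtml>
</root>"""

root = ET.fromstring(xml_text)
body = root.find("./xhtml/html/body")

# Serialize only what is inside <body>, which drops the <html>,
# <head> and <body> wrappers and leaves a fragment that could be
# placed inside an existing page.
fragment = (body.text or "") + "".join(
    ET.tostring(child, encoding="unicode") for child in body
)
print(fragment)
```

If the embedded HTML is not well-formed XML, ET.fromstring will raise a ParseError, and an HTML-tolerant parser would be needed instead.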

Answer 3

If you use Python, extraction can be very easy.

from simplified_scrapy.simplified_doc import SimplifiedDoc
# Sample input copied from the question
html='''
 <?xml >
    <first>
    </first>
        <second>
        </second>
    <xhtml>
        <html>
            .....some html code here
        </html>
    </xhtml>
'''
doc = SimplifiedDoc(html)
# Select the <html> element nested inside <xhtml>
html = doc.xhtml.html
print(html)

Before running this, you need to install simplified_scrapy using pip.

pip install simplified_scrapy
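If installing a third-party package is not an option, a rough standard-library alternative is to slice the page out of the raw text without parsing it at all. This is only a sketch, not a parser: it assumes there is exactly one html element in the file and that the string </html> does not also appear inside a comment or script. The sample input and its XML declaration are made up for illustration.

```python
import re

# Illustrative input, loosely following the question's layout.
xml_text = '''<?xml version="1.0"?>
<first>
</first>
<second>
</second>
<xhtml>
    <html>
        <p>some html code here</p>
    </html>
</xhtml>
'''

# Non-greedy match from the opening <html ...> tag to the first </html>.
match = re.search(r"<html\b.*?</html\s*>", xml_text,
                  flags=re.DOTALL | re.IGNORECASE)
html_page = match.group(0) if match else None
print(html_page)
```

Since nothing is parsed, the page comes back exactly as written in the file, even when the HTML is not well-formed XML.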


3 answers

solution1
0 2013-04-15 12:04:42

solution2
0 2013-04-15 12:22:48

solution3
0 2019-12-12 01:00:54
