
Error in escaping XML for a KML file

Some time ago I asked a question trying to figure out why modifying a KML file increased the file size.

After poking around, I found that the issue has to do with XML escaping. Essentially, the "<", ">", and "&" characters were being replaced with "&lt;", "&gt;", and "&amp;".

It's not a big deal for smaller files, but the extra characters make a big difference in larger files.
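To put a rough number on the extra characters: "&lt;" and "&gt;" are each 3 characters longer than "<" and ">", and "&amp;" is 4 characters longer than "&". Below is a minimal sketch that uses the standard library's xml.sax.saxutils.escape, which performs the same three replacements. escape_overhead is only a hypothetical helper name chosen for illustration.

```python
from xml.sax.saxutils import escape


def escape_overhead(text: str) -> int:
    """Extra characters added when text is stored as entity-escaped XML text."""
    # "<" -> "&lt;" and ">" -> "&gt;" each add 3 characters; "&" -> "&amp;" adds 4
    return 3 * (text.count("<") + text.count(">")) + 4 * text.count("&")


html = '<img src="fedland_leg_pop_2.jpg" alt="headerimg" width="550" height="77"><br>'
escaped = escape(html)  # escapes &, < and > (quotes are left alone by default)
print(escaped)
print(len(escaped) - len(html), escape_overhead(html))
```

For comparison, wrapping the same snippet in <![CDATA[ ... ]]> costs a fixed 12 characters no matter how much markup it contains. For this short snippet the two forms happen to cost the same, but the escaped form keeps growing with every tag in a longer HTML description.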

I copied some code from this site to help solve the problem:

from lxml import etree
from pykml.factory import KML_ElementMaker as KML
from pykml import parser

def unescape(s):
    s = s.replace("&lt;", "<")
    s = s.replace("&gt;", ">")
    # Ampersands must be last; otherwise "&amp;lt;" would become "<" instead of "&lt;"
    s = s.replace("&amp;", "&")
    return s

with open("myplaces.kml", "rb") as f:
    doc = parser.parse(f).getroot()

# Append one new folder per name in GEList (a list of strings defined elsewhere in my script)
a = doc.Document.Folder[0].Folder[1]
for q in GEList:
    a.append(KML.Folder(KML.name(q)))

# encoding="unicode" returns a str; without it Python 3 returns bytes and str.replace() fails
finished = etree.tostring(doc, pretty_print=True, encoding="unicode")
finished = unescape(finished)

with open("myplaces.kml", "w", encoding="utf-8") as f:
    f.write(finished)

Now, however, I'm running into another error. I compared the file before and after replacing the <, >, and & characters:

Before:  <description><![CDATA[<img src="fedland_leg_pop_2.jpg" alt="headerimg" width="550" height="77"><br>  
After:  <description><img src="fedland_leg_pop_2.jpg" alt="headerimg" width="550" height="77"><br>

Now the "<![CDATA[" wrapper seems to be getting thrown out, and I can't figure out why.
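Here is a minimal reproduction of what seems to be going on. It uses the standard library's xml.etree.ElementTree instead of lxml/pykml so it runs without extra packages. ElementTree always discards CDATA wrappers, and lxml's default parser (strip_cdata=True) does the same; I'm assuming that default parser is what pykml's parser.parse ends up using. The CDATA content survives only as plain text, gets entity-escaped on output, and the unescape() step then writes raw HTML that is no longer well-formed XML.

```python
import xml.etree.ElementTree as ET


def unescape(s):
    # Same replacements as the question's code; "&amp;" last
    s = s.replace("&lt;", "<")
    s = s.replace("&gt;", ">")
    s = s.replace("&amp;", "&")
    return s


src = '<description><![CDATA[<img src="fedland_leg_pop_2.jpg" alt="headerimg"><br>]]></description>'

elem = ET.fromstring(src)
# After parsing, the CDATA wrapper is gone; only its text content is kept
print(repr(elem.text))

# Serializing escapes the markup characters in that text
escaped = ET.tostring(elem, encoding="unicode")
print(escaped)

# Unescaping turns the text back into raw tags: <img> and <br> are never closed
broken = unescape(escaped)
print(broken)
try:
    ET.fromstring(broken)
except ET.ParseError as err:
    print("no longer well-formed XML:", err)
```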

I had the same issue, but then I found this ( https://developers.google.com/kml/documentation/kml_tut#descriptive_html ):

Using the CDATA Element

If you want to write standard HTML inside a tag, you can put it inside a CDATA tag. If you don't, the angle brackets need to be written as entity references to prevent Google Earth from parsing the HTML incorrectly (for example, the symbol > is written as &gt; and the symbol < is written as &lt;). This is a standard feature of XML and is not unique to Google Earth.
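In lxml, the choice between the two forms is made when you set an element's text: wrapping the string in etree.CDATA(...) makes lxml write a CDATA section, while a plain string is written with entity references. A minimal sketch follows; description() is a hypothetical helper, and the element is left un-namespaced for brevity (in a real KML file it would sit in the http://www.opengis.net/kml/2.2 namespace).

```python
from lxml import etree

HTML = "<h1>CDATA Tags are useful!</h1>"


def description(html, use_cdata):
    """Serialize a <description> element holding an HTML snippet."""
    elem = etree.Element("description")
    # etree.CDATA makes lxml write <![CDATA[...]]>;
    # a plain string is written with &lt; / &gt; entity references instead
    elem.text = etree.CDATA(html) if use_cdata else html
    return etree.tostring(elem, encoding="unicode")


with_cdata = description(HTML, use_cdata=True)
without_cdata = description(HTML, use_cdata=False)
print(with_cdata)
print(without_cdata)
```

Both strings parse back to the same text, so a KML reader should see the same HTML either way; the CDATA form is just shorter and easier to read.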

Consider the difference between HTML markup with CDATA tags and without CDATA. First, here's the <description> with CDATA tags:

<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark>
      <name>CDATA example</name>
      <description>
        <![CDATA[
          <h1>CDATA Tags are useful!</h1>
          <p><font color="red">Text is <i>more readable</i> and 
          <b>easier to write</b> when you can avoid using entity 
          references.</font></p>
        ]]>
      </description>
      <Point>
        <coordinates>102.595626,14.996729</coordinates>
      </Point>
    </Placemark>
  </Document>
</kml>

And here's the <description> without CDATA tags, so that special characters must use entity references:

<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark>
      <name>Entity references example</name>
      <description>
            &lt;h1&gt;Entity references are hard to type!&lt;/h1&gt;
            &lt;p&gt;&lt;font color="green"&gt;Text is 
          &lt;i&gt;more readable&lt;/i&gt; 
          and &lt;b&gt;easier to write&lt;/b&gt; 
          when you can avoid using entity references.&lt;/font&gt;&lt;/p&gt;
      </description>
      <Point>
        <coordinates>102.594411,14.998518</coordinates>
      </Point>
    </Placemark>
  </Document>
</kml>
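Applied to the original script, the fix is probably not to unescape the output at all, but to stop the parser from discarding the CDATA sections in the first place. Below is a minimal sketch with plain lxml.etree, relying on the XMLParser(strip_cdata=False) option. SAMPLE and add_folders are hypothetical stand-ins for myplaces.kml and the GEList loop.

```python
from lxml import etree

KML_NS = "http://www.opengis.net/kml/2.2"

# Hypothetical stand-in for myplaces.kml
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark>
      <name>CDATA example</name>
      <description><![CDATA[<img src="fedland_leg_pop_2.jpg" alt="headerimg"><br>]]></description>
    </Placemark>
    <Folder><name>Existing</name></Folder>
  </Document>
</kml>"""


def add_folders(kml_bytes, folder_names):
    # strip_cdata=False keeps <![CDATA[...]]> sections as CDATA instead of
    # turning them into plain text that is entity-escaped on output
    parser = etree.XMLParser(strip_cdata=False)
    root = etree.fromstring(kml_bytes, parser)
    ns = {"kml": KML_NS}
    target = root.find("kml:Document/kml:Folder", ns)
    for name in folder_names:
        folder = etree.SubElement(target, f"{{{KML_NS}}}Folder")
        etree.SubElement(folder, f"{{{KML_NS}}}name").text = name
    # No unescape step: the CDATA section is written back unchanged
    return etree.tostring(root, xml_declaration=True, encoding="UTF-8")


out = add_folders(SAMPLE, ["New folder"])
print(out.decode("utf-8"))
```

I haven't checked this against pykml itself. lxml's objectify.makeparser() accepts the same keyword arguments as etree.XMLParser, so building the tree with objectify.parse(f, objectify.makeparser(strip_cdata=False)) instead of parser.parse(f) may keep the doc.Document.Folder[...] attribute access working; treat that as an assumption.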
