I am trying to find and replace text inside a CDATA section of an XML file, similar to the objective of a related question (Find and Replace CDATA Attribute Values in XML - Python). I want to replace the string "Building in Éclépens, Switzerland" with "New Building" inside the CDATA section, but I cannot seem to reference the first string correctly. Ideally, I would like to find and replace this string by indexing rather than by hard-coding the old string as a variable. The CDATA expression itself is correct and supports annotations, but I cannot even reference this string in a simple print statement. Below are the XML, the script I am using, and the desired output XML containing the new string:
The XML ("foo_bar_CDATA.xml"):
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2" xmlns:gx="http://www.google.com/kml/ext/2.2">
<Overlay>
<description>
<![CDATA[
<html>
<head>
<body>
<div id="view">
<div class="item">
<p><span style="font-weight:italic">Dataset:</span>
Building in Éclépens, Switzerland
</p>
</div>
</div>
</body>
</head>
</html>
]]>
</description>
</Overlay></kml>
The script ("foo_bar_CDATA.xml"):
import lxml.etree as ET
xml = ET.parse("C:\\Users\\mdl518\\Desktop\\bar_foo_CDATA.xml")
tree=xml.getroot()
cd = ET.fromstring(tree.xpath('//*[local-name()="description"]')[0].text) # get CDATA out of the XML
print(cd[0][0][0][0][0][0].text) # prints "Dataset:" text contained within the 'span' element
val_1 = 'New Building' # new string to be included in the XML
# Find and replace the CDATA string with "val_1"
for elem in tree.getiterator():
    if elem.text:
        elem.text=elem.text.replace('Building in Éclépens, Switzerland ',val_1)
output = ET.tostring(tree,
                     encoding="UTF-8",
                     method="xml",
                     xml_declaration=True,
                     pretty_print=True)
print(output.decode("utf-8"))
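The print statement shows "Dataset:" instead of the building name because in lxml, text that follows a closing tag is stored in the preceding element's tail, not in the parent's text. The following is a minimal sketch, assuming lxml is installed and with the CDATA content inlined as a string, showing that the same cd[0][0][0][0][0][0] index reaches the span element, that its tail holds "Building in Éclépens, Switzerland", and that the tail can be replaced by position instead of by matching a hard-coded old string.

```python
import lxml.etree as ET

# The HTML held inside the question's CDATA section, inlined so the sketch runs on its own.
cdata_text = """
<html>
<head>
<body>
<div id="view">
<div class="item">
<p><span style="font-weight:italic">Dataset:</span>
Building in Éclépens, Switzerland
</p>
</div>
</div>
</body>
</head>
</html>
"""

cd = ET.fromstring(cdata_text)

# Same element the question reaches: html > head > body > div > div > p > span.
span = cd[0][0][0][0][0][0]
print(span.text)          # Dataset:
print(span.tail.strip())  # Building in Éclépens, Switzerland

# Replace whatever the tail currently holds (no hard-coded old string),
# keeping the surrounding line breaks intact.
old_name = span.tail.strip()
span.tail = span.tail.replace(old_name, "New Building")
print(ET.tostring(cd, encoding="unicode"))
```

The modified fragment still has to be written back into the description element, which is where the CDATA wrapper matters.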
The Desired Output XML:
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2" xmlns:gx="http://www.google.com/kml/ext/2.2">
<Overlay>
<description>
<![CDATA[
<html>
<head>
<body>
<div id="view">
<div class="item">
<p><span style="font-weight:italic">Dataset:</span>
New Building
</p>
</div>
</div>
</body>
</head>
</html>
]]>
</description>
</Overlay></kml>
When I run the script above, the string of interest is not changed, and the open/close tags are not preserved (they show up as &lt; and &gt;) in the printed XML. I suspect the correct solution only requires a couple of minor tweaks; any assistance is much appreciated!
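The &lt;/&gt; escaping has a specific cause: lxml's default parser (strip_cdata=True) turns CDATA sections into ordinary text, and ordinary text that contains markup is escaped when serialized. This sketch, assuming lxml and a tiny made-up description element, contrasts the default parser with ET.XMLParser(strip_cdata=False) and shows that assigning a plain string to .text drops the CDATA wrapper either way.

```python
import lxml.etree as ET

doc = b"<description><![CDATA[<p>Dataset:</p>]]></description>"

# Default parser (strip_cdata=True): the CDATA section becomes plain text,
# so "<" is written back out as &lt;.
default_out = ET.tostring(ET.fromstring(doc))
print(default_out)

# With strip_cdata=False the CDATA node survives an unmodified round trip.
kept = ET.fromstring(doc, ET.XMLParser(strip_cdata=False))
kept_out = ET.tostring(kept)
print(kept_out)

# Assigning an ordinary str to .text replaces the CDATA node with plain text,
# so the markup is escaped again on output.
kept.text = kept.text.replace("Dataset:", "New Building")
edited_out = ET.tostring(kept)
print(edited_out)
```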
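You have elem.text=elem.text.replace('Building in Éclépens, Switzerland ',val_1), with a trailing space after "Switzerland". In the XML the name is followed by a line break, not a space, so that pattern never matches.
Remove the space instead: elem.text=elem.text.replace('Building in Éclépens, Switzerland',val_1)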
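A quick way to see why the trailing space matters, using only str.replace on a short made-up excerpt of the description text: the name is followed by a newline, so the pattern ending in a space finds nothing and replace returns the string unchanged.

```python
# Excerpt of the description text (made up for illustration): the name ends with "\n".
text = '<span style="font-weight:italic">Dataset:</span>\nBuilding in Éclépens, Switzerland\n</p>'

with_space = text.replace("Building in Éclépens, Switzerland ", "New Building")
without_space = text.replace("Building in Éclépens, Switzerland", "New Building")

print(with_space == text)               # True: nothing matched
print("New Building" in without_space)  # True
```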
import lxml.etree as ET
xml = ET.parse("/home/cam/out.xml")
tree=xml.getroot()
cd = ET.fromstring(tree.xpath('//*[local-name()="description"]')[0].text) # get CDATA out of the XML
#print(cd[0][0][0][0][0][0].text) # prints "Dataset:" text contained within the 'span' element
val_1 = 'New Building' # new string to be included in the XML
# Find and replace the CDATA string with "val_1"
for elem in tree.iter():
    if "description" in elem.tag:
        elem.text=elem.text.replace('Building in Éclépens, Switzerland',val_1)
        elem.text = '![CDATA[' + elem.text + ']]'
root_str = ET.tostring(tree)
root_str = str(root_str.decode('utf-8').replace('&lt;', '<').replace('&gt;', '>').replace('\\n', ''))
print(root_str)
Output:
<kml xmlns="http://www.opengis.net/kml/2.2" xmlns:gx="http://www.google.com/kml/ext/2.2">
<Overlay>
<description>![CDATA[
<html>
<head>
<body>
<div id="view">
<div class="item">
<p><span style="font-weight:italic">Dataset:</span>
New Building
</p>
</div>
</div>
</body>
</head>
</html>
]]</description>
</Overlay></kml>
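The output above is not a true CDATA section: the wrapper appears as ![CDATA[ ... ]] without the surrounding < and >, and replacing escaped brackets across the whole serialized document would also unescape any legitimately escaped text elsewhere. A possible alternative, sketched here under the assumption that every description holds a well-formed HTML fragment like the one in the question, is to parse the fragment, change the span's tail by position, and assign the result back wrapped in lxml's ET.CDATA. The helper name replace_dataset_name is hypothetical.

```python
import lxml.etree as ET

# Copy of the question's KML; in practice you would use ET.parse(path).getroot().
KML = """<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2" xmlns:gx="http://www.google.com/kml/ext/2.2">
<Overlay>
<description>
<![CDATA[
<html>
<head>
<body>
<div id="view">
<div class="item">
<p><span style="font-weight:italic">Dataset:</span>
Building in Éclépens, Switzerland
</p>
</div>
</div>
</body>
</head>
</html>
]]>
</description>
</Overlay></kml>"""


def replace_dataset_name(root, new_name):
    """Hypothetical helper: swap the text after each <p><span> in every description."""
    for desc in root.xpath('//*[local-name()="description"]'):
        if not desc.text or not desc.text.strip():
            continue
        fragment = ET.fromstring(desc.text)  # parse the HTML stored in the CDATA
        for span in fragment.xpath("//p/span"):
            if span.tail and span.tail.strip():
                # Replace the current tail content by position, not by a hard-coded name.
                span.tail = span.tail.replace(span.tail.strip(), new_name)
        # ET.CDATA makes lxml write a real <![CDATA[...]]> section instead of &lt;/&gt;.
        desc.text = ET.CDATA("\n" + ET.tostring(fragment, encoding="unicode") + "\n")
    return root


# Bytes are required here because the string carries an XML encoding declaration.
root = replace_dataset_name(ET.fromstring(KML.encode("utf-8")), "New Building")
output = ET.tostring(root, encoding="UTF-8", xml_declaration=True, pretty_print=True)
print(output.decode("utf-8"))
```

Because the tail is replaced by whatever it currently contains, the old building name never has to be hard-coded. Note that ET.CDATA raises ValueError if the content itself contains "]]>".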