
Find and Replace CDATA Text String in XML - Python

I am trying to find and replace a text string inside a CDATA section of an XML file, similar to the objective posed in a related question (Find and Replace CDATA Attribute Values in XML - Python). Specifically, I want to replace the string "Building in Éclépens, Switzerland" with the new string "New Building", but I cannot seem to reference the original string correctly. Ideally, I want to find and replace this string via indexing rather than by hard-coding the old string. The CDATA expression itself is correct and supports annotations, but I cannot reference the string inside it even with a simple print statement. Below are the XML, the script I am using, and the desired output XML containing the new string:

The XML ("foo_bar_CDATA.xml"):

<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2" xmlns:gx="http://www.google.com/kml/ext/2.2">
<Overlay>
    <description>
    <![CDATA[
    <html>
    <head>
        <body>
            <div id="view">
                <div class="item">
                    <p><span style="font-weight:italic">Dataset:</span>
                        Building in Éclépens, Switzerland
                    </p>
                </div>
            </div>
        </body>
    </head>
    </html>
    ]]>
    </description>   
</Overlay></kml>

The script:

import lxml.etree as ET
xml = ET.parse("C:\\Users\\mdl518\\Desktop\\bar_foo_CDATA.xml")
tree=xml.getroot()

cd = ET.fromstring(tree.xpath('//*[local-name()="description"]')[0].text) # get CDATA out of the XML
print(cd[0][0][0][0][0][0].text) # prints "Dataset:" text contained within the 'span' element
val_1 = 'New Building'  # new string to be included in the XML  

# Find and replace the CDATA string with "val_1"
for elem in tree.getiterator():
    if elem.text:
        elem.text=elem.text.replace('Building in Éclépens, Switzerland ',val_1)
    
    output = ET.tostring(tree, 
                 encoding="UTF-8",
                 method="xml", 
                 xml_declaration=True, 
                 pretty_print=True)

    print(output.decode("utf-8"))

The Desired Output XML:

<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2" xmlns:gx="http://www.google.com/kml/ext/2.2">
<Overlay>
    <description>
    <![CDATA[
    <html>
    <head>
        <body>
            <div id="view">
                <div class="item">
                    <p><span style="font-weight:italic">Dataset:</span>
                        New Building
                    </p>
                </div>
            </div>
        </body>
    </head>
    </html>
    ]]>
    </description>   
</Overlay></kml>

When I run the script above, the string of interest is not replaced, and the opening/closing angle brackets are not preserved (they show up as &lt; and &gt;) in the printed XML. I feel the correct solution may only require a couple of minor tweaks; any assistance is most appreciated!

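Your script has elem.text=elem.text.replace('Building in Éclépens, Switzerland ',val_1), with a trailing space after "Switzerland" in the search string.

Use elem.text=elem.text.replace('Building in Éclépens, Switzerland',val_1) instead, with that trailing space removed. In the XML the name is followed by a line break, not a space, so the original search string never matches.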
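To make the trailing-space issue concrete, here is a minimal sketch using only plain Python strings. The `snippet` variable is a hypothetical, shortened stand-in for the text inside the CDATA block, not the full document.

```python
# Shortened stand-in for the CDATA text: the name is followed by a newline,
# not by a space.
snippet = "Dataset:</span>\n    Building in Éclépens, Switzerland\n</p>"

# Search string WITH a trailing space: no match, so the text comes back unchanged.
with_space = snippet.replace("Building in Éclépens, Switzerland ", "New Building")

# Search string WITHOUT the trailing space: the name is replaced.
without_space = snippet.replace("Building in Éclépens, Switzerland", "New Building")

print(with_space == snippet)            # True
print("New Building" in without_space)  # True
```

Note that str.replace does not raise an error when nothing matches, which is why the original script ran without complaint but left the text untouched.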

import lxml.etree as ET

xml = ET.parse("/home/cam/out.xml")
tree = xml.getroot()

val_1 = 'New Building'  # new string to be included in the XML

# Find and replace the CDATA string with "val_1"
for elem in tree.iter():
    # Tags are namespace-qualified, so match on the substring "description"
    if "description" in elem.tag:
        elem.text = elem.text.replace('Building in Éclépens, Switzerland', val_1)
        # Re-add CDATA markers by hand (the parser drops them on load)
        elem.text = '![CDATA[' + elem.text + ']]'

# Serialization escapes the markup as &lt; and &gt;, so undo that on the string.
# Note: this also unescapes any other &lt; / &gt; in the document.
root_str = ET.tostring(tree).decode('utf-8')
root_str = root_str.replace('&lt;', '<').replace('&gt;', '>')
print(root_str)

Output:

<kml xmlns="http://www.opengis.net/kml/2.2" xmlns:gx="http://www.google.com/kml/ext/2.2">
<Overlay>
    <description>![CDATA[
    
    <html>
    <head>
        <body>
            <div id="view">
                <div class="item">
                    <p><span style="font-weight:italic">Dataset:</span>
                        New Building
                    </p>
                </div>
            </div>
        </body>
    </head>
    </html>
    
    ]]</description>   
</Overlay></kml>
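The output above contains `![CDATA[ ... ]]` as literal text rather than a real CDATA section (`<![CDATA[ ... ]]>`), and the old string is still matched by a hard-coded value. A possible alternative is sketched below. It assumes the CDATA content is well-formed enough to parse as XML (as in the question), and the names `KML` and `replace_dataset_label` are hypothetical, introduced only for this illustration. The target string is not the `.text` of any element: it follows the closing `</span>` tag, so lxml exposes it as the `.tail` of the `span` element. Wrapping the re-serialized markup in lxml's `CDATA` class makes the serializer write a real CDATA section instead of escaping the angle brackets.

```python
import lxml.etree as ET

# Trimmed copy of the XML from the question, inlined so the sketch is self-contained.
KML = """<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
<Overlay>
    <description>
    <![CDATA[
    <html>
    <head>
        <body>
            <div id="view">
                <div class="item">
                    <p><span style="font-weight:italic">Dataset:</span>
                        Building in Éclépens, Switzerland
                    </p>
                </div>
            </div>
        </body>
    </head>
    </html>
    ]]>
    </description>
</Overlay></kml>
""".encode("utf-8")


def replace_dataset_label(kml_bytes, new_label):
    """Replace the text after the 'Dataset:' span and keep the CDATA wrapper."""
    root = ET.fromstring(kml_bytes)
    desc = root.xpath('//*[local-name()="description"]')[0]

    # The parser returns the CDATA content as plain text; parse it as its own tree.
    inner = ET.fromstring(desc.text)

    # Same index chain as in the question: html > head > body > div > div > p > span
    span = inner[0][0][0][0][0][0]

    # The string to replace is the span's tail. Swap only the non-whitespace
    # part so the original indentation is kept, without hard-coding the old value.
    old_label = span.tail.strip()
    span.tail = span.tail.replace(old_label, new_label)

    # Serialize the inner markup and wrap it in CDATA so it is not escaped.
    desc.text = ET.CDATA(ET.tostring(inner, encoding="unicode"))

    return ET.tostring(root, encoding="UTF-8", xml_declaration=True).decode("utf-8")


result = replace_dataset_label(KML, "New Building")
print(result)
```

With this approach no string replacement on the serialized document is needed, so legitimate `&lt;` or `&gt;` entities elsewhere in the file are left alone. The whitespace that originally surrounded the CDATA section inside `<description>` is not preserved by this sketch.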


 