
Append new elements to XML

I have a base XML file to which I would like to add new elements, but it fails and I cannot understand why.

My base XML:

<?xml version="1.0" encoding="utf-8"?>
<vehicleDefinitions>
    <vehicleType id="bus">
        <capacity>
            <seats persons="3"/>
            <standingRoom persons="9"/>
        </capacity>
        <length meter="12.3"/>
        <width meter="2.5"/>
        <accessTime secondsPerPerson="0.5"/>
        <egressTime secondsPerPerson="0.5"/>
        <doorOperation mode="serial"/>
        <passengerCarEquivalents pce="0.28"/>
    </vehicleType>
</vehicleDefinitions>

My code:

from lxml import etree

schedule = etree.parse('schedule_mapped.xml') #I use this file to get data from it
vehicles = etree.parse('vehicles.xml') #I'm reading my base XML
vehicles_root = vehicles.getroot() #Getting its root
for transitLine in schedule.findall('transitLine'):
    tstype = transitLine.find('transitRoute').find('transportMode').text
    for transitRoute in transitLine.findall('transitRoute'):
        for departure in transitRoute.find('departures').findall('departure'):
            tsname = departure.get('vehicleRefId')
            vehicle = etree.SubElement(vehicles_root, 'vehicle') #I want to add a child to my root element
            vehicle.attrib['id'] = tsname
            vehicle.attrib['type'] = tstype

The structure of my output XML is correct. I mean that children are added:

[Image: current structure]

But after writing the XML to a file

with open(ts.replace('schedule', 'vehicles'), 'wb') as f:
    f.write(etree.tostring(vehicles, pretty_print=True, encoding='utf8'))

I got this

I discovered that the problem might be caused by unreadable characters in the base XML, but I do not know how to cope with this.

Consider also XSLT, the special-purpose language designed to transform XML files, which can retrieve nodes from a different XML file using the document() function. Additionally, you have better control of the output, including indentation, line breaks, and headers. Python's lxml can run XSLT 1.0 scripts. Doing so, you avoid any nested looping at the application layer.

XSLT (save as a .xsl file, to be used in Python below)

Notice the reference to the other .xml file. Both XML files are assumed to be in the same directory.

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes" encoding="UTF-8"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="vehicleDefinitions">
    <xsl:copy>
        <xsl:copy-of select="vehicleType"/>
        <xsl:for-each select="document('schedule_mapped.xml')/descendant::departure">
          <vehicle id="{@vehicleRefId}" 
                   type="{../preceding-sibling::transportMode}"/>
        </xsl:for-each>
    </xsl:copy>
  </xsl:template>
    
</xsl:stylesheet>

Python

from lxml import etree

doc = etree.parse('vehicles.xml')  # the base XML from the question
xsl = etree.parse('script.xsl')

transformer = etree.XSLT(xsl)
result = transformer(doc)

with open('Output.xml', 'wb') as f:
    f.write(result)
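
For anyone who wants to try the transform without preparing files first, here is a minimal self-contained sketch. It assumes a shortened schedule layout inferred from the question's code (transitLine / transitRoute / transportMode / departures / departure); the sample ids, the temporary directory, and the helper name run_transform are made up for illustration. The stylesheet is parsed from a file on purpose, so that the relative path inside document() is resolved against the stylesheet's own location.

```python
import tempfile
from pathlib import Path

from lxml import etree

# Hypothetical, shortened sample data; the real files will be larger.
SCHEDULE = """<transitSchedule>
  <transitLine id="line_1">
    <transitRoute id="route_1">
      <transportMode>bus</transportMode>
      <departures>
        <departure id="d1" vehicleRefId="veh_1"/>
        <departure id="d2" vehicleRefId="veh_2"/>
      </departures>
    </transitRoute>
  </transitLine>
</transitSchedule>
"""

VEHICLES = """<vehicleDefinitions>
    <vehicleType id="bus">
        <length meter="12.3"/>
    </vehicleType>
</vehicleDefinitions>
"""

STYLESHEET = """<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes" encoding="UTF-8"/>
  <xsl:strip-space elements="*"/>
  <xsl:template match="vehicleDefinitions">
    <xsl:copy>
      <xsl:copy-of select="vehicleType"/>
      <xsl:for-each select="document('schedule_mapped.xml')/descendant::departure">
        <vehicle id="{@vehicleRefId}" type="{../preceding-sibling::transportMode}"/>
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
"""


def run_transform():
    """Write the three sample files to a temp directory and apply the XSLT."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        (tmp / "schedule_mapped.xml").write_text(SCHEDULE, encoding="utf-8")
        (tmp / "vehicles.xml").write_text(VEHICLES, encoding="utf-8")
        (tmp / "script.xsl").write_text(STYLESHEET, encoding="utf-8")

        transformer = etree.XSLT(etree.parse(str(tmp / "script.xsl")))
        result = transformer(etree.parse(str(tmp / "vehicles.xml")))
        # Serialize before the temporary directory is removed.
        return str(result)


output = run_transform()
print(output)
```

The printed document should contain the original vehicleType plus one vehicle element per departure, each carrying the transport mode of its own route.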

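So, I finally found a solution. We can just parse the XML with blank text removed. The serializer does not re-indent an element that already contains whitespace-only text between its children, so the newly appended elements ended up crammed onto one line. Without that whitespace, "pretty print" works correctly.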

from lxml import etree

def getClean(filename):
    # Drop whitespace-only text nodes so pretty_print can re-indent the tree
    parser = etree.XMLParser(remove_blank_text=True)
    cleanTree = etree.parse(filename, parser)
    return cleanTree
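
To see the difference side by side, here is a small sketch that parses the same document with and without remove_blank_text, appends two children, and pretty-prints both results. The shortened base XML, the vehicle ids, and the helper name append_and_serialize are made up for illustration; only XMLParser(remove_blank_text=...), SubElement, and tostring(pretty_print=True) come from lxml.

```python
from lxml import etree

# Hypothetical, shortened stand-in for the base vehicles file.
BASE_XML = b"""<?xml version="1.0" encoding="utf-8"?>
<vehicleDefinitions>
    <vehicleType id="bus">
        <length meter="12.3"/>
    </vehicleType>
</vehicleDefinitions>
"""


def append_and_serialize(xml_bytes, remove_blank_text):
    """Parse, append two <vehicle> children, and pretty-print to a string."""
    parser = etree.XMLParser(remove_blank_text=remove_blank_text)
    root = etree.fromstring(xml_bytes, parser)
    for vehicle_id in ("veh_1", "veh_2"):  # made-up vehicle ids
        etree.SubElement(root, "vehicle", id=vehicle_id, type="bus")
    return etree.tostring(root, pretty_print=True, encoding="unicode")


raw = append_and_serialize(BASE_XML, remove_blank_text=False)
clean = append_and_serialize(BASE_XML, remove_blank_text=True)

print(raw)    # existing indentation kept, new children not indented
print(clean)  # whole tree re-indented by the serializer
```

In the first result the two new vehicle elements sit on a single line right before the closing root tag, while in the second each one gets its own indented line.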
