I have a base XML file to which I would like to add new elements, but it fails and I cannot understand why.
My base XML:
<?xml version="1.0" encoding="utf-8"?>
<vehicleDefinitions>
  <vehicleType id="bus">
    <capacity>
      <seats persons="3"/>
      <standingRoom persons="9"/>
    </capacity>
    <length meter="12.3"/>
    <width meter="2.5"/>
    <accessTime secondsPerPerson="0.5"/>
    <egressTime secondsPerPerson="0.5"/>
    <doorOperation mode="serial"/>
    <passengerCarEquivalents pce="0.28"/>
  </vehicleType>
</vehicleDefinitions>
My code:
from lxml import etree

schedule = etree.parse('schedule_mapped.xml')  # I use this file to get data from it
vehicles = etree.parse('vehicles.xml')  # I'm reading my base XML
vehicles_root = vehicles.getroot()  # Getting its root

for transitLine in schedule.findall('transitLine'):
    tstype = transitLine.find('transitRoute').find('transportMode').text
    for transitRoute in transitLine.findall('transitRoute'):
        for departure in transitRoute.find('departures').findall('departure'):
            tsname = departure.get('vehicleRefId')
            vehicle = etree.SubElement(vehicles_root, 'vehicle')  # I want to add a child to my root element
            vehicle.attrib['id'] = tsname
            vehicle.attrib['type'] = tstype
The structure of my output XML is correct. I mean that children are added:
But after writing XML to file
with open(ts.replace('schedule', 'vehicles'), 'wb') as f:
    f.write(etree.tostring(vehicles, pretty_print=True, encoding='utf8'))
I discovered that the problem might be caused by unreadable characters in the base XML, but I do not know how to cope with this.
Consider also XSLT, the special-purpose language designed to transform XML files, which can retrieve nodes from a different XML file using the document() function. Additionally, you have better control of the output, including indentation, line breaks, headers, etc. Python's lxml can run XSLT 1.0 scripts. Doing so, you avoid any nested looping at the application layer.
XSLT (save as a .xsl file, to be used in Python below)
Notice the reference to the other .xml file. Both XML files are assumed to be in the same directory.
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes" encoding="UTF-8"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="vehicleDefinitions">
    <xsl:copy>
      <xsl:copy-of select="vehicleType"/>
      <xsl:for-each select="document('schedule_mapped.xml')/descendant::departure">
        <vehicle id="{@vehicleRefId}"
                 type="{../preceding-sibling::transportMode}"/>
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
Python
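from lxml import etree

# Load the base XML (named vehicles.xml in the question) and the XSLT script
doc = etree.parse('vehicles.xml')
xsl = etree.parse('script.xsl')

# Build the transformer and run it on the base XML
transformer = etree.XSLT(xsl)
result = transformer(doc)

# Write the transformed tree to a file
with open('Output.xml', 'wb') as f:
    f.write(result)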
So, finally I found a solution. We can just parse the XML with blank (whitespace-only) text removed. This allows "pretty print" to work correctly.
def getClean(filename):
    parser = etree.XMLParser(remove_blank_text=True)
    cleanTree = etree.parse(filename, parser)
    return cleanTree
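To see the effect in isolation, here is a minimal sketch that parses the same small document twice, once with the default parser and once with remove_blank_text=True, and appends two vehicle elements in each case. The shortened base XML, the helper names parse_clean and add_vehicle, and the ids veh_1 and veh_2 are assumptions made up for illustration; they are not from the original files.

```python
from io import BytesIO
from lxml import etree

# Shortened stand-in for the base XML from the question (illustrative only).
BASE_XML = b"""<?xml version="1.0" encoding="utf-8"?>
<vehicleDefinitions>
  <vehicleType id="bus">
    <length meter="12.3"/>
  </vehicleType>
</vehicleDefinitions>
"""


def parse_clean(source):
    """Parse XML while dropping whitespace-only text nodes."""
    parser = etree.XMLParser(remove_blank_text=True)
    return etree.parse(source, parser)


def add_vehicle(tree, vehicle_id, vehicle_type):
    """Append a <vehicle> child to the root (hypothetical helper)."""
    vehicle = etree.SubElement(tree.getroot(), 'vehicle')
    vehicle.set('id', vehicle_id)
    vehicle.set('type', vehicle_type)
    return vehicle


def serialize(tree):
    """Pretty-print the tree and return it as a str."""
    return etree.tostring(tree, pretty_print=True, encoding='utf-8').decode('utf-8')


# Default parser: the whitespace between elements is kept as text nodes.
raw_tree = etree.parse(BytesIO(BASE_XML))
add_vehicle(raw_tree, 'veh_1', 'bus')
add_vehicle(raw_tree, 'veh_2', 'bus')
raw_output = serialize(raw_tree)

# Cleaning parser: whitespace-only text is dropped, so the serializer
# is free to re-indent the whole tree.
clean_tree = parse_clean(BytesIO(BASE_XML))
add_vehicle(clean_tree, 'veh_1', 'bus')
add_vehicle(clean_tree, 'veh_2', 'bus')
clean_output = serialize(clean_tree)

print(raw_output)
print(clean_output)
```

With the default parser, the existing whitespace text nodes stop the serializer from re-indenting the root's children, so the appended elements end up run together on one line. With the cleaning parser, each vehicle element should appear on its own indented line.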