
XML Namespace handling in Python 3 with ElementTree

I receive input XML with incorrect namespace declarations. I tried to fix them with ElementTree, without success.

Example input (the ns0: prefix could just as well be ns:, p:, n:, etc.):

<ns0:Invoice xmlns:ns0="http://invoices.com/docs/xsd/invoices/v1.2" version="FPR12">

  <InvoiceHeader>
    <DataH>data header</DataH>
  </InvoiceHeader>

  <InvoiceBody>
    <DataB>data body</DataB>
  </InvoiceBody>

</ns0:Invoice>

Required output (the root namespace must be declared without a prefix, and some inner tags must carry xmlns=""):

<Invoice xmlns="http://invoices.com/docs/xsd/invoices/v1.2" version="FPR12">

  <InvoiceHeader xmlns="">
    <DataH>data header</DataH>
  </InvoiceHeader>

  <InvoiceBody xmlns="">
    <DataB>data body</DataB>
  </InvoiceBody>

</Invoice>

I tried to change the root namespace as shown below, but the resulting file is unchanged:

import xml.etree.ElementTree as ET

tree = ET.parse('./cache/test.xml')
root = tree.getroot()

root.tag = '{http://invoices.com/docs/xsd/invoices/v1.2}Invoice'
xml = ET.tostring(root, encoding="unicode")
with open('./cache/output.xml', 'wt') as f:
    f.write(xml)

When I instead set

root.tag = 'Invoice'

the root tag is written without any namespace at all.

Please let me know whether I am making a mistake, whether I should switch to another library, or whether I should fall back to a string replace with a regex.

Thanks in advance
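On the first attempt in the question: the file most likely stays unchanged because the parser has already expanded ns0:Invoice into the same {uri}Invoice tag, so the assignment is a no-op, and the prefix is only chosen again when the tree is serialized. A minimal sketch of that behaviour, assuming the sample input is held in a string (SAMPLE is a name made up for this example):

```python
import xml.etree.ElementTree as ET

# SAMPLE is a hypothetical inline copy of the example input from the question.
SAMPLE = (
    '<ns0:Invoice xmlns:ns0="http://invoices.com/docs/xsd/invoices/v1.2" version="FPR12">'
    '<InvoiceHeader><DataH>data header</DataH></InvoiceHeader>'
    '<InvoiceBody><DataB>data body</DataB></InvoiceBody>'
    '</ns0:Invoice>'
)

root = ET.fromstring(SAMPLE)

# The parser stores the namespace URI inside the tag; the prefix is not kept.
parsed_tag = root.tag

# Assigning the same qualified name therefore changes nothing.
root.tag = '{http://invoices.com/docs/xsd/invoices/v1.2}Invoice'
same_tag = (root.tag == parsed_tag)

# The children carry no namespace, because the input never gave them one.
child_tags = [child.tag for child in root]

# On output, ElementTree picks a prefix for the URI again.
serialized = ET.tostring(root, encoding="unicode")

print(parsed_tag, same_tag, child_tags)
print(serialized)
```

Setting root.tag = 'Invoice' drops the URI from the tag altogether, which matches the namespace-free output described above.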

I don't know if this will be useful to anyone, but I managed to fix the namespaces using lxml and the following code.

from lxml import etree
from copy import deepcopy

tree = etree.parse('./cache/test.xml')

# create a new root whose namespace is declared without a prefix
NSMAP = {None: "http://invoices.com/docs/xsd/invoices/v1.2"}
root = etree.Element("{http://invoices.com/docs/xsd/invoices/v1.2}Invoice", nsmap=NSMAP)

# copy attributes from original root
for attr, value in tree.getroot().items():
    root.set(attr, value)

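# deep copy of children (adding an empty default namespace to some tags)
for child in tree.getroot():  # iterate directly; getchildren() is deprecated
    child = deepcopy(child)  # copy first so the parsed tree is left untouched
    if child.tag in ('InvoiceHeader', 'InvoiceBody'):
        child.set("xmlns", "")  # workaround: written out as xmlns=""
    root.append(child)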

xml = etree.tostring(root, pretty_print=True)
with open('./cache/output.xml', 'wb') as f:
    f.write(xml)
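For comparison, here is a standard-library-only sketch of the same idea; it is not the approach used above. It assumes that calling ET.register_namespace('', uri) is acceptable (it changes a process-wide prefix table), and it reuses the same xmlns="" attribute workaround. SAMPLE and NS are names made up for this example.

```python
import xml.etree.ElementTree as ET

# NS and SAMPLE are hypothetical names introduced for this sketch.
NS = "http://invoices.com/docs/xsd/invoices/v1.2"
SAMPLE = (
    '<ns0:Invoice xmlns:ns0="http://invoices.com/docs/xsd/invoices/v1.2" version="FPR12">'
    '<InvoiceHeader><DataH>data header</DataH></InvoiceHeader>'
    '<InvoiceBody><DataB>data body</DataB></InvoiceBody>'
    '</ns0:Invoice>'
)

# Map the URI to the empty prefix so it is serialized as the default
# namespace. Note that this setting is global to the running process.
ET.register_namespace('', NS)

root = ET.fromstring(SAMPLE)

# Same workaround as in the lxml code: a plain attribute named "xmlns"
# is written out as xmlns="" on the un-namespaced children.
for child in root:
    if child.tag in ('InvoiceHeader', 'InvoiceBody'):
        child.set('xmlns', '')

output = ET.tostring(root, encoding="unicode")
print(output)

# Re-parse the result to see which namespace each element ends up in.
check = ET.fromstring(output)
root_tag = check.tag
child_tags = [c.tag for c in check]
print(root_tag, child_tags)
```

Re-parsing the output is a cheap way to confirm that the root is still in the invoice namespace while the two children remain in no namespace.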
