Python: ignoring namespaces in xml.etree.ElementTree?

Question

How can I tell ElementTree to ignore namespaces in an XML file?

For example, I would prefer to query modelVersion (as in statement 1) rather than {http://maven.apache.org/POM/4.0.0}modelVersion (as in statement 2).

pom="""
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
         http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
</project>
"""

from xml.etree import ElementTree
ElementTree.register_namespace("", "http://maven.apache.org/POM/4.0.0")
root = ElementTree.fromstring(pom)

print(1, root.findall('modelVersion'))
print(2, root.findall('{http://maven.apache.org/POM/4.0.0}modelVersion'))

1 []
2 [<Element '{http://maven.apache.org/POM/4.0.0}modelVersion' at 0x1006bff10>]

Answer 1

There doesn't appear to be a straightforward way to do this, so I'd simply wrap the find calls, e.g.:

from xml.etree import ElementTree as ET

POM = """
<project xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
         xmlns="http://maven.apache.org/POM/4.0.0">
    <modelVersion>4.0.0</modelVersion>
</project>
"""

NSPS = {'foo' : "http://maven.apache.org/POM/4.0.0"}

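# Wrap findall so callers can pass the bare tag name;
# the 'foo' prefix is arbitrary and only has to match a key in NSPS.
def findall(node, tag):
    return node.findall('foo:' + tag, NSPS)

root = ET.fromstring(POM)
# encoding='unicode' makes tostring return str instead of bytes
print([ET.tostring(el, encoding='unicode') for el in findall(root, 'modelVersion')])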

output:

['<ns0:modelVersion xmlns:ns0="http://maven.apache.org/POM/4.0.0">4.0.0</ns0:modelVersion>\n']
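
If you are on Python 3.8 or later, the wrapper can be even simpler, because find/findall accept a `{*}` wildcard that matches a tag in any namespace (or in none). A minimal sketch, where `findall_any_ns` is a hypothetical helper name of my own:

```python
from xml.etree import ElementTree as ET

POM = """
<project xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xmlns="http://maven.apache.org/POM/4.0.0">
    <modelVersion>4.0.0</modelVersion>
</project>
"""

def findall_any_ns(node, tag):
    # Hypothetical helper: '{*}tag' matches 'tag' regardless of namespace.
    # Requires Python 3.8+; older versions do not treat '{*}' as a wildcard.
    return node.findall('{*}' + tag)

root = ET.fromstring(POM)
versions = [el.text for el in findall_any_ns(root, 'modelVersion')]
print(versions)
```

This avoids having to know the namespace URI at all, at the cost of also matching same-named elements from other namespaces.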

Answer 2

Here's what I'm presently doing, which makes me incredibly confident that there's a better way.

$ cat pom.xml |
   tr '\n' ' ' |
   sed 's/<project [^>]*>/<project>/' |
   myprogram |
   sed 's/<project>/<project xmlns="http:\/\/maven.apache.org\/POM\/4.0.0" xmlns:xsi="http:\/\/www.w3.org\/2001\/XMLSchema-instance" xsi:schemaLocation="http:\/\/maven.apache.org\/POM\/4.0.0 http:\/\/maven.apache.org\/maven-v4_0_0.xsd">/'

Answer 3

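Rather than ignoring the namespaces, another approach is to strip them from the tree, so there is nothing left to "ignore" because they no longer exist. See nonagon's answer to this question (and my extension of it to cover namespaces on attributes): Python ElementTree module: How to ignore the namespace of XML files to locate matching element when using the method "find", "findall"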
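
For illustration, here is a rough sketch of that idea. It is not necessarily the code from the linked answer; `strip_namespaces` is a name I made up, and it assumes no two attributes on one element share the same local name once their namespaces are removed.

```python
from xml.etree import ElementTree as ET

def strip_namespaces(root):
    """Remove the '{uri}' part from every tag and attribute name, in place."""
    for el in root.iter():
        # Comments and processing instructions have non-string tags; skip those.
        if isinstance(el.tag, str) and el.tag.startswith('{'):
            el.tag = el.tag.split('}', 1)[1]
        # Iterate over a copy of the keys because the dict is modified below.
        for name in list(el.attrib):
            if name.startswith('{'):
                el.attrib[name.split('}', 1)[1]] = el.attrib.pop(name)
    return root

pom = """<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
         http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
</project>"""

root = strip_namespaces(ET.fromstring(pom))
found = root.findall('modelVersion')
print(root.tag, [el.text for el in found])
```

Note that the namespace information is gone after this, so serializing the tree again will produce a document without the xmlns declarations.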

Answer 4

Here's the equivalent solution without using the shell. Basic idea:

1. translate <project junk...> to <project>
2. perform "clean" processing without worrying about the namespace
3. translate <project> back to <project junk...>

with the new code:

pom="""
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
<modelVersion>4.0.0</modelVersion>
</project>
"""
short_project="""<project>"""
long_project="""<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">"""

import re
from xml.etree import ElementTree

# eliminate namespace specs
pom = re.sub(r'<project [^>]*>', short_project, pom)

root = ElementTree.fromstring(pom)
ElementTree.dump(root)
print(1, root.findall('modelVersion'))
print(2, root.findall('{http://maven.apache.org/POM/4.0.0}modelVersion'))
mv = root.findall('modelVersion')

# restore the namespace specs
# (encoding='unicode' makes tostring return str, so the text replacement works)
pom = ElementTree.tostring(root, encoding='unicode')
pom = pom.replace(short_project, long_project)
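
The same strip / process / restore round trip can probably be done on the parsed tree instead of on the raw text, which avoids the regex. This is only a sketch under my own assumptions: `NS` and `PREFIX` are names I introduced, the sample document is cut down to the default namespace, and note that `register_namespace` changes module-wide state.

```python
from xml.etree import ElementTree as ET

NS = "http://maven.apache.org/POM/4.0.0"
PREFIX = "{" + NS + "}"

pom = """<project xmlns="http://maven.apache.org/POM/4.0.0">
<modelVersion>4.0.0</modelVersion>
</project>"""

root = ET.fromstring(pom)

# Step 1: drop the POM namespace from every tag.
for el in root.iter():
    if el.tag.startswith(PREFIX):
        el.tag = el.tag[len(PREFIX):]

# Step 2: "clean" processing with bare tag names.
found = root.findall('modelVersion')

# Step 3: put the namespace back on every unqualified tag.
for el in root.iter():
    if not el.tag.startswith('{'):
        el.tag = PREFIX + el.tag

# Serialize the POM namespace as the default one (xmlns="...") rather than ns0.
ET.register_namespace('', NS)
restored = ET.tostring(root, encoding='unicode')
print(restored)
```

Step 3 qualifies every unqualified element, including any that were added during processing, which is usually what you want for a POM but is a design choice rather than a given.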


4 answers

solution1
0 2015-12-04 07:56:29

solution2
0 2015-12-04 07:57:30

solution3
0 2015-12-04 08:41:23

solution4
0 2015-12-04 16:28:57
