How to get an attribute of a namespaced element

Question

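I'm parsing an XML document that I receive from a vendor every day, and it uses namespaces heavily. I've reduced the problem to a minimal subset:

I need to parse some elements, all of which are children of an element with a specific attribute.
I'm able to use lxml.etree.Element.findall(TAG, root.nsmap) to find the candidate nodes whose attributes I need to check.

I'm then trying to check an attribute of each of those Elements by the name I know it uses: concretely, ss:Name. If the value of that attribute is the expected value, I'll dig deeper into that Element (and go on to do other things).

How can I do this?

The XML I'm parsing is roughly: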

<FOO xmlns="SOME_REALLY_LONG_STRING"
 some gorp declaring a bunch of namespaces one of which is 
 xmlns:ss="THE_VERY_SAME_REALLY_LONG_STRING_AS_ROOT"
>
    <child_of_foo>
        ....
    </child_of_foo>
    ...
    <SomethingIWant ss:Name="bar" OTHER_ATTRIBS_I_DONT_CARE_ABOUT>
        ....
        <MoreThingsToLookAtLater>
            ....
        </MoreThingsToLookAtLater>
        ....
    </SomethingIWant>
    ...
</FOO>

I find the first SomethingIWant element (ultimately I want all of them, which is why I use findall):

import lxml
from lxml import etree

tree = etree.parse(myfilename)
root = tree.getroot()
# I want just the first one for now
my_sheet = root.findall('ss:SomethingIWant', root.nsmap)[0]

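Now I want to get the ss:Name attribute from this element so I can check it, but I'm not sure how.

I know that my_sheet.attrib will show me the raw URI followed by the attribute name, but I don't want that. I need to check whether a specific namespaced attribute has a specific value. (If it doesn't, I can skip this element entirely in further processing.)

I tried using lxml.etree.ElementTree.attrib.get(), but I don't seem to get anything useful from it.

Any ideas?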

Answer 1

I'm pretty sure this is a horribly un-Pythonic, non-ideal way to do it; it seems like there must be a better way... but I found that I can do this:

SS_REAL = "{%s}" % root.nsmap.get('ss')

Then I can do this: my_sheet.get(SS_REAL + "Name")

It gets me what I want, but this can't possibly be the right way to do it.
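
For context, the `{namespace-uri}localname` string built above is Clark notation, which ElementTree-style APIs use for qualified names, so looking the attribute up this way is ordinary usage rather than a hack. Below is a minimal sketch of the same idea wrapped in a helper. The URIs and the helper name `qualified` are made up for illustration, and the import falls back to the standard library, which accepts the same notation.

```python
try:
    from lxml import etree  # the library used in the question
except ImportError:
    # assumption: the stdlib module is an acceptable stand-in for this sketch
    import xml.etree.ElementTree as etree

# Hypothetical document; the URIs are placeholders, not the vendor's.
XML = """<FOO xmlns="urn:example:default" xmlns:ss="urn:example:ss">
    <SomethingIWant ss:Name="bar"/>
    <SomethingIWant ss:Name="baz"/>
</FOO>"""

# Explicit prefix map; "d" is an arbitrary prefix chosen for the default namespace.
NS = {"d": "urn:example:default", "ss": "urn:example:ss"}


def qualified(prefix, local, nsmap=NS):
    """Return the Clark-notation name '{uri}local' for a prefixed name."""
    return "{%s}%s" % (nsmap[prefix], local)


root = etree.fromstring(XML)
# Read the namespaced attribute from every candidate element.
names = [el.get(qualified("ss", "Name"))
         for el in root.findall("d:SomethingIWant", NS)]
print(names)  # ['bar', 'baz']
```

Keeping an explicit prefix map instead of relying on `root.nsmap` means the lookup does not depend on which prefixes the vendor happens to declare.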

Answer 2

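One advantage of lxml over the standard Python XML parser is that lxml fully supports the XPath 1.0 specification through its xpath() method, so I use xpath() most of the time. A working example for your case: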

from lxml import etree

xml = """<FOO xmlns="SOME_REALLY_LONG_STRING"
 xmlns:ss="THE_VERY_SAME_REALLY_LONG_STRING_AS_ROOT"
>
    <child_of_foo>
        ....
    </child_of_foo>
    ...
    <SomethingIWant ss:Name="bar">
        ....
    </SomethingIWant>
    ...
</FOO>"""

root = etree.fromstring(xml)
ns = {'ss': 'THE_VERY_SAME_REALLY_LONG_STRING_AS_ROOT'}

# i want just the first one for now
result = root.xpath('//@ss:Name', namespaces=ns)[0]
print(result)

Output:

bar

Update:

A modified example showing how to get a namespaced attribute from the current element:

ns = {'ss': 'THE_VERY_SAME_REALLY_LONG_STRING_AS_ROOT', 'd': 'SOME_REALLY_LONG_STRING'}

element = root.xpath('//d:SomethingIWant', namespaces=ns)[0]
print(etree.tostring(element))

attribute = element.xpath('@ss:Name', namespaces=ns)[0]
print(attribute)

Output:

<SomethingIWant xmlns="SOME_REALLY_LONG_STRING" xmlns:ss="THE_VERY_SAME_REALLY_LONG_STRING_AS_ROOT" ss:Name="bar">
        ....
    </SomethingIWant>
    ...

bar
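
Since the question is about skipping elements whose ss:Name is not the expected value, the check can also be folded into the path expression as a predicate. The sketch below assumes made-up namespace URIs and uses `findall`, whose limited path syntax accepts this kind of attribute predicate. The same expression is valid XPath 1.0, so with lxml it can also be passed to `xpath()` with `namespaces=`.

```python
try:
    from lxml import etree
except ImportError:
    # assumption: the stdlib module is an acceptable stand-in for this sketch
    import xml.etree.ElementTree as etree

# Hypothetical document; the URIs are placeholders.
XML = """<FOO xmlns="urn:example:default" xmlns:ss="urn:example:ss">
    <SomethingIWant ss:Name="bar"><MoreThingsToLookAtLater/></SomethingIWant>
    <SomethingIWant ss:Name="other"/>
</FOO>"""

NS = {"d": "urn:example:default", "ss": "urn:example:ss"}

root = etree.fromstring(XML)

# Select only the elements whose ss:Name attribute equals "bar".
matches = root.findall(".//d:SomethingIWant[@ss:Name='bar']", NS)

# Drill into each match; child tags come back in Clark notation.
child_tags = [[child.tag for child in el] for el in matches]
print(len(matches), child_tags)
```

Elements with a different ss:Name never appear in `matches`, so no separate attribute check is needed before processing them.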

Answer 3

My solution:

https://pastebin.com/F5HAw6zQ

#!/usr/bin/python
# -*- coding: utf-8 -*-

from sys import argv
import xml.etree.ElementTree as ET

NS = 'x' # default namespace key # (any string is OK)

class XMLParser(object):
    def __init__(self):
        self.ns = {}     # namespace dict
        self.root = None # XML's root element

    # extracts the namespace (usually from the root element)
    def get_namespace(self, tag):
        return tag.split('}')[0][1:]

    # loads the XML file (here: from string)
    def load_xml(self, xmlstring):
        root = ET.fromstring(xmlstring)
        self.root = root
        self.ns[NS] = self.get_namespace(root.tag)
        return True

    # transforms XPath without namespaces to XPath with namespace
    # AND detects if last element is an attribute
    def ns_xpath(self, xpath):
        tags = xpath.split('/')
        if tags[-1].startswith('@'):
            attrib = tags.pop()[1:]
        else:
            attrib = None
        nsxpath = '/'.join(['%s:%s' % (NS, tag) for tag in tags])
        return nsxpath, attrib

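    # `find` and `findall` method in one place honoring attributes in XPath
    def xfind(self, xpath, e=None, findall=False):
        # compare against None: an Element without children is falsy
        if e is None:
            e = self.root
        if not findall:
            f = e.find
        else:
            f = e.findall
        nsxpath, attrib = self.ns_xpath(xpath)
        e = f(nsxpath, self.ns)
        if attrib:
            # findall returns a list, so read the attribute from each match
            if findall:
                return [x.get(attrib) for x in e]
            return e.get(attrib)
        return e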

def main(xmlstring):
    p = XMLParser()
    p.load_xml(xmlstring)
    xpaths = {
        'Element a:': 'a',
        'Element b:': 'a/b',
        'Attribute c:': 'a/b/@c'
        }
    for key, xpath in xpaths.items():
        print(key, xpath, p.xfind(xpath))

if __name__ == "__main__":
    xmlstring = """<root xmlns="http://www.example.com">
        <a>
            <b c="Hello, world!">
            </b>
        </a>
    </root>"""
    main(xmlstring)

Result:

Element a: a <Element '{http://www.example.com}a' at 0x2bbcb30>
Element b: a/b <Element '{http://www.example.com}b' at 0x2bbcb70>
Attribute c: a/b/@c Hello, world!

3 answers

Answer 1: score 4, 2015-06-26 01:55:13
Answer 2: score 4, accepted, 2015-06-26 02:27:36
Answer 3: score -1, 2018-01-09 14:27:52
