python lxml findall with multiple namespaces

I'm trying to parse an XML document with multiple namespaces using lxml, and I can't get the findall() method to return anything.

My XML:

<MeasurementRecords xmlns="http://www.company.com/common/rsp/2012/07"
                    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"         
                    xsi:schemaLocation="http://www.company.com/common/rsp/2012/07 RSP_EWS_V1.6.xsd">
    <HistoryRecords>
        <ValueItemId>100_0000100004_3788_Resource-0.customId_WSx Data Precip Type</ValueItemId>
        <List>
            <HistoryRecord>
                <Value>60</Value>
                <State>Valid</State>
                <TimeStamp>2016-04-20T12:40:00Z</TimeStamp>
            </HistoryRecord>
        </List>
    </HistoryRecords>
</MeasurementRecords>

My code:

from lxml import etree
from pprint import pprint

RSPxmlFile = '/home/user/Desktop/100_0000100004_3788_20160420144011263_records.xml'

with open (RSPxmlFile, 'rt') as f:
    tree = etree.parse(f)

root = tree.getroot()

for node in tree.findall('MeasurementRecords', root.nsmap):
    print(node)
    print("parameter = ", node.text)

Gives:

ValueError: empty namespace prefix is not supported in ElementPath

Some experiments I've tried after reading this:

>>> root.nsmap
{'xsi': 'http://www.w3.org/2001/XMLSchema-instance', None: 'http://www.company.com/common/rsp/2012/07'}

>>> nsmap = root.nsmap
>>> nsmap['foo'] = nsmap[None]
>>> nsmap.pop(None)
'http://www.company.com/common/rsp/2012/07'
>>> nsmap
{'xsi': 'http://www.w3.org/2001/XMLSchema-instance', 'foo': 'http://www.company.com/common/rsp/2012/07'}
>>> tree.xpath("//MeasurementRecords", namespaces=nsmap)
[]
>>> tree.xpath('/foo:MeasurementRecords', namespaces=nsmap)
[<Element {http://www.company.com/common/rsp/2012/07}MeasurementRecords at 0x6ffffda5290>]
>>> tree.xpath('/foo:MeasurementRecords/HistoryRecords', namespaces=nsmap)
[]

But that didn't seem to help.

So, more experiments:

>>> tree.findall('//{http://www.company.com/common/rsp/2012/07}MeasurementRecords')
[]
>>> print(root)
<Element {http://www.company.com/common/rsp/2012/07}MeasurementRecords at 0x6ffffda5290>
>>> print(tree)
<lxml.etree._ElementTree object at 0x6ffffda5368>
>>> for node in tree.iter():
...     print(node)
...
<Element {http://www.company.com/common/rsp/2012/07}MeasurementRecords at 0x6ffffda5290>
<Element {http://www.company.com/common/rsp/2012/07}HistoryRecords at 0x6ffffda5cf8>
<Element {http://www.company.com/common/rsp/2012/07}ValueItemId at 0x6ffffda5f38>
...etc...
>>> tree.findall("//HistoryRecords", namespaces=nsmap)
[]
>>> tree.findall("//foo:MeasurementRecords/HistoryRecords", namespaces=nsmap)
[]

I'm stumped. I have no idea what's wrong.

If you start with this:

>>> tree = etree.parse(open('data.xml'))
>>> root = tree.getroot()
>>> 

This will fail to find any elements...

>>> root.findall('{http://www.company.com/common/rsp/2012/07}MeasurementRecords')
[]

...but that's because root is itself a MeasurementRecords element; it does not contain any MeasurementRecords elements. On the other hand, the following works just fine:

>>> root.findall('{http://www.company.com/common/rsp/2012/07}HistoryRecords')
[<Element {http://www.company.com/common/rsp/2012/07}HistoryRecords at 0x7fccd0332ef0>]
>>> 
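To make the difference concrete, here is a minimal sketch that parses a trimmed copy of the question's XML from a string rather than from a file. The names XML_DATA and NS and the placeholder value "item-1" are made up for this example; only the namespace URI and the element names come from the question.

```python
from lxml import etree

# Hypothetical stand-in for the file in the question, trimmed to one record.
XML_DATA = b"""<MeasurementRecords xmlns="http://www.company.com/common/rsp/2012/07">
    <HistoryRecords>
        <ValueItemId>item-1</ValueItemId>
        <List>
            <HistoryRecord>
                <Value>60</Value>
                <State>Valid</State>
                <TimeStamp>2016-04-20T12:40:00Z</TimeStamp>
            </HistoryRecord>
        </List>
    </HistoryRecords>
</MeasurementRecords>"""

NS = "http://www.company.com/common/rsp/2012/07"

root = etree.fromstring(XML_DATA)

# findall() only looks below the element it is called on, so asking the
# root for elements carrying its own tag matches nothing.
same_tag = root.findall("{%s}MeasurementRecords" % NS)

# Direct children are matched by their fully qualified name.
children = root.findall("{%s}HistoryRecords" % NS)

# A leading ".//" searches all descendants, not just direct children.
values = root.findall(".//{%s}Value" % NS)

print(len(same_tag), len(children), [v.text for v in values])
# prints: 0 1 ['60']
```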

Using the xpath method, you could do something like this:

>>> nsmap={'a': 'http://www.company.com/common/rsp/2012/07',
... 'b': 'http://www.w3.org/2001/XMLSchema-instance'}
>>> root.xpath('//a:HistoryRecords', namespaces=nsmap)
[<Element {http://www.company.com/common/rsp/2012/07}HistoryRecords at 0x7fccd0332ef0>]
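If you would rather not type the namespace URIs by hand, one possible approach is to derive the prefix map from root.nsmap and give the default namespace (stored under the key None) a name of your own. The sketch below assumes the prefix "a" is not already used in the document; XML_DATA is again a hypothetical trimmed copy of the question's XML. It also shows why the question's '/foo:MeasurementRecords/HistoryRecords' attempt returned an empty list: every step of the path needs the prefix, not just the first one.

```python
from lxml import etree

# Hypothetical trimmed copy of the question's XML.
XML_DATA = b"""<MeasurementRecords xmlns="http://www.company.com/common/rsp/2012/07"
                    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <HistoryRecords>
        <ValueItemId>item-1</ValueItemId>
        <List>
            <HistoryRecord>
                <Value>60</Value>
            </HistoryRecord>
        </List>
    </HistoryRecords>
</MeasurementRecords>"""

root = etree.fromstring(XML_DATA)

# root.nsmap stores the default namespace under the key None, which
# xpath() cannot use as a prefix. Rename it to "a" (an arbitrary choice).
nsmap = {(prefix or "a"): uri for prefix, uri in root.nsmap.items()}

# Every step in the default namespace needs the prefix ...
with_prefix = root.xpath("/a:MeasurementRecords/a:HistoryRecords", namespaces=nsmap)

# ... an unprefixed step only matches elements in no namespace at all.
without_prefix = root.xpath("/a:MeasurementRecords/HistoryRecords", namespaces=nsmap)

# text() returns the element content directly.
values = root.xpath("//a:HistoryRecord/a:Value/text()", namespaces=nsmap)

print(len(with_prefix), len(without_prefix), values)
# prints: 1 0 ['60']
```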

So:

  • The findall and find methods require {...namespace...}ElementName syntax.
  • The xpath method requires namespace prefixes (ns:ElementName), which it looks up in the provided namespaces map. The prefix doesn't have to match the prefix used in the original document, but the namespace URL must match.

So this works:

>>> root.find('{http://www.company.com/common/rsp/2012/07}HistoryRecords/{http://www.company.com/common/rsp/2012/07}ValueItemId')
<Element {http://www.company.com/common/rsp/2012/07}ValueItemId at 0x7fccd0332a70>
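Repeating the full URI in every step gets long quickly. As a side note, find and findall in lxml also take a namespaces argument (the question's own call passed one), so a prefixed path is an alternative to the {...} form. The sketch below shows both spellings side by side; the helper qname, the prefix "a", and the trimmed XML_DATA are assumptions made for this example, not part of the original answer.

```python
from lxml import etree

NS = "http://www.company.com/common/rsp/2012/07"

# Hypothetical trimmed copy of the question's XML.
XML_DATA = b"""<MeasurementRecords xmlns="http://www.company.com/common/rsp/2012/07">
    <HistoryRecords>
        <ValueItemId>item-1</ValueItemId>
    </HistoryRecords>
</MeasurementRecords>"""

root = etree.fromstring(XML_DATA)


def qname(tag):
    """Return '{namespace}tag' for a tag in the document's default namespace."""
    return etree.QName(NS, tag).text


# Fully qualified form, built from the helper instead of typed out by hand.
by_clark = root.find(qname("HistoryRecords") + "/" + qname("ValueItemId"))

# Prefixed form: the prefix is resolved through the namespaces mapping.
by_prefix = root.find("a:HistoryRecords/a:ValueItemId", namespaces={"a": NS})

print(by_clark.tag == by_prefix.tag, by_clark.text)
# prints: True item-1
```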

Or this works:

>>> root.xpath('/a:MeasurementRecords/a:HistoryRecords/a:ValueItemId',namespaces=nsmap)
[<Element {http://www.company.com/common/rsp/2012/07}ValueItemId at 0x7fccd0330830>]
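Once the lookups succeed, the record values can be collected in the same way. This is only a sketch of one way to do it, under the assumption that each HistoryRecord contains only simple child elements such as Value, State and TimeStamp; NSMAP and the trimmed XML_DATA are names invented for the example.

```python
from lxml import etree

NSMAP = {"a": "http://www.company.com/common/rsp/2012/07"}

# Hypothetical trimmed copy of the question's XML.
XML_DATA = b"""<MeasurementRecords xmlns="http://www.company.com/common/rsp/2012/07">
    <HistoryRecords>
        <ValueItemId>item-1</ValueItemId>
        <List>
            <HistoryRecord>
                <Value>60</Value>
                <State>Valid</State>
                <TimeStamp>2016-04-20T12:40:00Z</TimeStamp>
            </HistoryRecord>
        </List>
    </HistoryRecords>
</MeasurementRecords>"""

root = etree.fromstring(XML_DATA)

# For each HistoryRecord, map the local name of every child element
# (the tag without its namespace) to the child's text.
records = [
    {etree.QName(child).localname: child.text for child in record}
    for record in root.iterfind(".//a:HistoryRecord", namespaces=NSMAP)
]

print(records)
# prints: [{'Value': '60', 'State': 'Valid', 'TimeStamp': '2016-04-20T12:40:00Z'}]
```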
