
Using lxml to parse XML with multiple namespaces

I'm pulling XML from a SOAP API that looks like this:

<SOAP-ENV:Envelope xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/" xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:ae="urn:sbmappservices72" xmlns:c14n="http://www.w3.org/2001/10/xml-exc-c14n#" xmlns:diag="urn:SerenaDiagnostics" xmlns:ds="http://www.w3.org/2000/09/xmldsig#" xmlns:wsse="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-secext-1.0.xsd" xmlns:wsu="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-utility-1.0.xsd" xmlns:xenc="http://www.w3.org/2001/04/xmlenc#" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<SOAP-ENV:Header/>
<SOAP-ENV:Body>
    <ae:GetItemsByQueryResponse>
      <ae:return>
        <ae:item>
          <ae:id xsi:type="ae:ItemIdentifier">
            <ae:displayName/>
            <ae:id>10</ae:id>
            <ae:uuid>a9b91034-8f4d-4043-b9b6-517ba4ed3a33</ae:uuid>
            <ae:tableId>1541</ae:tableId>
            <ae:tableIdItemId>1541:10</ae:tableIdItemId>
            <ae:issueId/>
          </ae:id>

I can't for the life of me use findall to pull something like tableId. Most of the tutorials on parsing with lxml don't cover namespaces, but the one at lxml.de does, and I've been trying to follow it.

According to their tutorial you should create a dictionary of the namespaces, which I've done like so:

r = tree.xpath('/e:SOAP-ENV/s:ae', 
        namespaces={'e': 'http://schemas.xmlsoap.org/soap/envelope/',
                    's': 'urn:sbmappservices72'})

But that doesn't appear to be working: when I try to get the len of r, it comes back as 0:

print('length: ' + str(len(r)))  # <---- always equals 0

Since the URI for the second namespace is a "urn:", I tried using a real URL to the WSDL as well, but that gives me the same result.

Is there something obvious that I'm missing? I just need to be able to pull values like the one for tableIdItemId.

Any help would be greatly appreciated.

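Your XPath doesn't correspond to the XML structure. In a step such as e:Envelope, the part before the colon is a prefix from your namespaces dictionary and the part after it is the element's local name, so /e:SOAP-ENV/s:ae looks for elements named SOAP-ENV and ae, which don't exist in the document. Try this instead: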

r = tree.xpath('/e:Envelope/e:Body/s:GetItemsByQueryResponse/s:return/s:item/s:id/s:tableId', 
        namespaces={'e': 'http://schemas.xmlsoap.org/soap/envelope/',
                    's': 'urn:sbmappservices72'})
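To check the corrected expression end to end, here is a minimal sketch. It assumes a trimmed-down copy of the response from the question (the `SAMPLE` string and the `NS` and `table_ids` names are made up for illustration) and compares the original path with the corrected one:

```python
from lxml import etree

# Trimmed-down stand-in for the SOAP response in the question;
# only the namespaces actually used here are declared.
SAMPLE = b"""<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:ae="urn:sbmappservices72" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <SOAP-ENV:Header/>
  <SOAP-ENV:Body>
    <ae:GetItemsByQueryResponse>
      <ae:return>
        <ae:item>
          <ae:id xsi:type="ae:ItemIdentifier">
            <ae:displayName/>
            <ae:id>10</ae:id>
            <ae:tableId>1541</ae:tableId>
            <ae:tableIdItemId>1541:10</ae:tableIdItemId>
          </ae:id>
        </ae:item>
      </ae:return>
    </ae:GetItemsByQueryResponse>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>"""

# The prefixes 'e' and 's' are arbitrary; only the URIs must match the document.
NS = {'e': 'http://schemas.xmlsoap.org/soap/envelope/',
      's': 'urn:sbmappservices72'}

tree = etree.fromstring(SAMPLE)

# The path from the question: 'SOAP-ENV' and 'ae' are prefixes in the
# document, not element names, so nothing matches.
broken = tree.xpath('/e:SOAP-ENV/s:ae', namespaces=NS)

# The corrected path names each element on the way down to tableId.
r = tree.xpath(
    '/e:Envelope/e:Body/s:GetItemsByQueryResponse/s:return/s:item/s:id/s:tableId',
    namespaces=NS)
table_ids = [el.text for el in r]

print(len(broken), table_ids)
```

Note that the prefixes in the dictionary do not have to match the ones used in the document (SOAP-ENV, ae); lxml compares the namespace URIs, which is why swapping the urn: for a WSDL URL could not help.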

For small XML documents, you may want to use // instead of / to simplify the expression, for example:

r = tree.xpath('/e:Envelope/e:Body//s:tableId', 
        namespaces={'e': 'http://schemas.xmlsoap.org/soap/envelope/',
                    's': 'urn:sbmappservices72'})

/e:Body//s:tableId will find tableId no matter how deeply it is nested within Body. Note, however, that // is generally slower than /, especially on large XML documents, because it has to search every descendant.
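Since the question mentions findall, the same descendant search can also be sketched with findall and findtext. These take a relative path (starting with .//) and accept either a prefix dictionary or the {uri}name form. The `SAMPLE` document and variable names below are illustrative assumptions, and the fallback import is only there so the sketch also runs where lxml is not installed:

```python
try:
    from lxml import etree
except ImportError:
    # Assumption for convenience: the stdlib ElementTree accepts the same
    # findall/findtext calls used below.
    import xml.etree.ElementTree as etree

# Reduced stand-in for the response in the question.
SAMPLE = b"""<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:ae="urn:sbmappservices72">
  <SOAP-ENV:Body>
    <ae:GetItemsByQueryResponse>
      <ae:return>
        <ae:item>
          <ae:id>
            <ae:id>10</ae:id>
            <ae:tableId>1541</ae:tableId>
            <ae:tableIdItemId>1541:10</ae:tableIdItemId>
          </ae:id>
        </ae:item>
      </ae:return>
    </ae:GetItemsByQueryResponse>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>"""

NS = {'s': 'urn:sbmappservices72'}

root = etree.fromstring(SAMPLE)

# Relative descendant search with a prefix mapping.
item_ids = [el.text for el in root.findall('.//s:tableIdItemId', namespaces=NS)]

# The same kind of lookup with the namespace URI written inline ({uri}name).
table_id = root.findtext('.//{urn:sbmappservices72}tableId')

print(item_ids, table_id)
```

The {uri}name form avoids the dictionary entirely, which can be handy for a one-off lookup; the prefix mapping reads better when several namespaces appear in one path.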


 