Extract value from element when second namespace is used in lxml

I am able to extract values from elements (using lxml in Python 2.7) when one namespace is used. However, I can't figure out how to extract values when a second namespace is used. I want to extract the value within //cc-cpl:MainClosedCaption/Id but I keep getting lxml.etree.XPathEvalError: Invalid expression errors. To be specific, the value I'm trying to extract from my sample XML is urn:uuid:6ca58b51-9116-4131-8652-feaed20dca0d

Here's a snippet of the XML (from a Digital Cinema Package):

<?xml version="1.0" encoding="UTF-8"?>
<CompositionPlaylist xmlns="http://www.digicine.com/PROTO-ASDCP-CPL-20040511#">
    <Reel>
      <Id>urn:uuid:58cf368f-ed30-40d8-9258-dd7572035b69</Id>
        <MainPicture>
          <Id>urn:uuid:afe91f7a-6451-4b9f-be2e-345f9a28da6d</Id>
        </MainPicture>
        <cc-cpl:MainClosedCaption xmlns:cc-cpl="http://www.digicine.com/PROTO-ASDCP-CC-CPL-20070926#">
          <Id>urn:uuid:6ca58b51-9116-4131-8652-feaed20dca0d</Id>
        </cc-cpl:MainClosedCaption>
    </Reel>
</CompositionPlaylist>

Here is an example of code that works:

from lxml import etree
cpl_parse = etree.parse('filename.xml')
pkl_namespace = cpl_parse.xpath('namespace-uri(.)') 
xmluuid =  cpl_parse.xpath('//ns:MainPicture/ns:Id',namespaces={'ns': pkl_namespace})
for i in xmluuid:
    print i.text

When I try to specify the following XPath instead: //ns:MainClosedCaption/ns:Id - I end up with errors.

When I specify the namespace with: pkl_namespace = 'http://www.digicine.com/PROTO-ASDCP-CC-CPL-20070926#"'

I receive an lxml.etree.XPathEvalError: Invalid expression error.

I know this is a stupid attempt, but the following produced the same error: '//ns:cc-cpl:MainClosed Caption/ns:cc-cpl:Id'

I tried to include the two namespaces in a dictionary as in this answer: https://stackoverflow.com/a/36227869/2188572, and while I don't get any errors, I end up with no values extracted. Here's my dictionary:

namespaces = {
    'ns': 'http://www.digicine.com/PROTO-ASDCP-CPL-20040511#',
    'ns2': 'http://www.digicine.com/PROTO-ASDCP-CC-CPL-20070926#',
}

and my command:

xmluuid =  cpl_parse.xpath('//ns:AssetList/ns2:MainClosedCaption/ns2:Id',namespaces=namespaces)

I found this question, "Extracting nested namespace from a xml using lxml", which deals with exactly the same kind of XML that I'm working on, but the request there was to get the namespace URL, not the actual values of elements.

Edit: Using the method from the previous answer to extract the namespace, I tried the following, but got the same errors:

from lxml import etree
import sys
filename = sys.argv[1]

cpl_parse = etree.parse(filename)
pkl_namespace = etree.QName(cpl_parse.find('.//{*}MainClosedCaption')).namespace
print pkl_namespace
xmluuid =  cpl_parse.xpath('//ns:cc-cpl:MainClosedCaption/ns:cc-cpl:Id',namespaces={'ns': pkl_namespace})
for i in xmluuid:
    print i.text

and here is the full traceback:

Traceback (most recent call last):
  File "sub.py", line 8, in <module>
    xmluuid =  cpl_parse.xpath('//ns:cc-cpl:MainClosedCaption/ns:cc-cpl:Id',namespaces={'ns': pkl_namespace})
  File "lxml.etree.pyx", line 2115, in lxml.etree._ElementTree.xpath (src/lxml/lxml.etree.c:57654)
  File "xpath.pxi", line 370, in lxml.etree.XPathDocumentEvaluator.__call__ (src/lxml/lxml.etree.c:146564)
  File "xpath.pxi", line 238, in lxml.etree._XPathEvaluatorBase._handle_result (src/lxml/lxml.etree.c:144962)
  File "xpath.pxi", line 224, in lxml.etree._XPathEvaluatorBase._raise_eval_error (src/lxml/lxml.etree.c:144817)
lxml.etree.XPathEvalError: Invalid expression

The Id element in MainClosedCaption belongs to the 2004 namespace. Only an attribute xmlns="..." can change the default namespace; attributes of the form xmlns:something="..." only declare a prefix, and that namespace applies only to elements that explicitly use the prefix.
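One way to check this claim is to ask lxml which namespace each element actually ends up in. The following is a minimal sketch, assuming a trimmed copy of the sample XML from the question held in an inline constant (the name XML and the variable names are illustrative, not from the original post):

```python
from lxml import etree

# Trimmed copy of the sample XML from the question (assumption: inline
# bytes instead of reading 'filename.xml' from disk).
XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<CompositionPlaylist xmlns="http://www.digicine.com/PROTO-ASDCP-CPL-20040511#">
  <Reel>
    <cc-cpl:MainClosedCaption xmlns:cc-cpl="http://www.digicine.com/PROTO-ASDCP-CC-CPL-20070926#">
      <Id>urn:uuid:6ca58b51-9116-4131-8652-feaed20dca0d</Id>
    </cc-cpl:MainClosedCaption>
  </Reel>
</CompositionPlaylist>
"""

root = etree.fromstring(XML)

# '{*}' matches the local name in any namespace, so no prefix map is needed.
caption = root.find('.//{*}MainClosedCaption')
caption_id = caption.find('{*}Id')

# QName splits an element's tag into namespace and local name.
caption_ns = etree.QName(caption).namespace
id_ns = etree.QName(caption_id).namespace

print(caption_ns)  # namespace of the prefixed parent element
print(id_ns)       # namespace of the unprefixed child element
```

The parent reports the 2007 closed-caption namespace, while the unprefixed Id child still reports the 2004 default namespace inherited from CompositionPlaylist.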

Try this:

from lxml import etree
cpl_parse = etree.parse('filename.xml')
xmluuid = cpl_parse.xpath('//proto2007:MainClosedCaption/proto2004:Id', namespaces={
    'proto2004': 'http://www.digicine.com/PROTO-ASDCP-CPL-20040511#',
    'proto2007': 'http://www.digicine.com/PROTO-ASDCP-CC-CPL-20070926#',
})
for i in xmluuid:
    print(i.text)
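For a version that runs without a file on disk, here is a self-contained sketch of the same idea. It assumes a trimmed inline copy of the sample XML, and the prefixes cpl and cc are arbitrary names chosen for illustration; only the namespace URIs have to match the document. It also runs the same query with the child placed in the wrong namespace, which is probably why the dictionary attempt in the question returned nothing without raising an error.

```python
from lxml import etree

# Trimmed copy of the sample XML from the question (assumption: inline
# bytes instead of reading 'filename.xml' from disk).
XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<CompositionPlaylist xmlns="http://www.digicine.com/PROTO-ASDCP-CPL-20040511#">
  <Reel>
    <MainPicture>
      <Id>urn:uuid:afe91f7a-6451-4b9f-be2e-345f9a28da6d</Id>
    </MainPicture>
    <cc-cpl:MainClosedCaption xmlns:cc-cpl="http://www.digicine.com/PROTO-ASDCP-CC-CPL-20070926#">
      <Id>urn:uuid:6ca58b51-9116-4131-8652-feaed20dca0d</Id>
    </cc-cpl:MainClosedCaption>
  </Reel>
</CompositionPlaylist>
"""

# Prefix names are local to this mapping; they need not match the
# prefixes used inside the XML document.
NAMESPACES = {
    'cpl': 'http://www.digicine.com/PROTO-ASDCP-CPL-20040511#',
    'cc': 'http://www.digicine.com/PROTO-ASDCP-CC-CPL-20070926#',
}

root = etree.fromstring(XML)

# Parent is in the 2007 namespace, child Id is in the 2004 default namespace.
caption_ids = root.xpath('//cc:MainClosedCaption/cpl:Id/text()',
                         namespaces=NAMESPACES)

# Same query with the child in the wrong namespace: valid XPath, no match.
wrong_ns_ids = root.xpath('//cc:MainClosedCaption/cc:Id/text()',
                          namespaces=NAMESPACES)

print(caption_ids)
print(wrong_ns_ids)
```

Each step of an XPath expression takes at most one prefix, so the two namespaces are expressed as two separate prefixes in the mapping rather than chained in a single step.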
