Python: Get specific node values and attributes using lxml + objectify + findall or fromstring

I extracted a portion of an XML feed from NVD; here is the snippet:

<?xml version='1.0' encoding='UTF-8'?>
<nvd xmlns="http://nvd.nist.gov/feeds/cve/1.2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://nvd.nist.gov/feeds/cve/1.2 http://nvd.nist.gov/schema/nvdcve.xsd" pub_date="2014-07-01" nvd_xml_version="1.2">
   <entry CVSS_base_score="6.4" CVSS_exploit_subscore="10.0" CVSS_impact_subscore="4.9" CVSS_score="6.4" CVSS_vector="(AV:N/AC:L/Au:N/C:P/I:P/A:N)" CVSS_version="2.0" modified="2014-06-30" name="CVE-2011-1381" published="2014-06-27" seq="2011-1381" severity="Medium" type="CVE">
      <desc>
        <descript source="cve">Unspecified vulnerability in IBM OpenPages GRC Platform 6.1.0.1 before IF4 allows remote attackers to bypass intended access restrictions via unknown vectors.</descript>
      </desc>
   </entry>
   <entry CVSS_base_score="3.5" CVSS_exploit_subscore="6.8" CVSS_impact_subscore="2.9" CVSS_score="3.5" CVSS_vector="(AV:N/AC:M/Au:S/C:P/I:N/A:N)" CVSS_version="2.0" modified="2014-06-30" name="CVE-2014-4669" published="2014-06-28" seq="2014-4669" severity="Low" type="CVE">
      <desc>
        <descript source="cve">HP Enterprise Maps 1.00 allows remote authenticated users to read arbitrary files via a WSDL document containing an XML external entity declaration in conjunction with an entity reference within a GetQuote operation, related to an XML External Entity (XXE) issue.</descript>
      </desc>
   </entry>
</nvd>

As the title says, I just want to get the value and the attributes of the 'descript' node in the snippet above. I tried using the findall method, but it returns an empty list:

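from lxml import etree
# fromstring already returns the root element; read bytes so the encoding declaration is accepted
root = etree.fromstring(open("c:/temp/CVE/sample.xml", "rb").read())
root.findall('entry')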

This returns:

[]

When I print the tag of the root, here is what it returns:

'{http://nvd.nist.gov/feeds/cve/1.2}nvd'

I also tried printing the tags of the root's immediate children and of their children:

for e in root.iterchildren():
    print("Immediate parent : %s" % e.tag)
    for c in e:  # iterating an element yields its direct children
        print("\t\tchildren : %s" % c.tag)

Here is what it returns:

Immediate parent : {http://nvd.nist.gov/feeds/cve/1.2}entry
    children : {http://nvd.nist.gov/feeds/cve/1.2}desc
Immediate parent : {http://nvd.nist.gov/feeds/cve/1.2}entry
    children : {http://nvd.nist.gov/feeds/cve/1.2}desc

Again, all I want is the attribute and the text value of the 'descript' node. Any ideas are greatly appreciated. Thanks in advance!

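Every element in this document belongs to the default namespace http://nvd.nist.gov/feeds/cve/1.2, so an unqualified name such as 'entry' matches nothing. You need to bind a prefix to that namespace and use it in the xpath expression: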

from lxml import etree
# fromstring already returns the root element; read bytes so the encoding declaration is accepted
root = etree.fromstring(open("c:/temp/CVE/sample.xml", "rb").read())
ns = {'ns': 'http://nvd.nist.gov/feeds/cve/1.2'}
for descript in root.xpath('//ns:entry/ns:desc/ns:descript', namespaces=ns):
    print(descript.text)
    print(descript.attrib.get('source'))

Prints:

Unspecified vulnerability in IBM OpenPages GRC Platform 6.1.0.1 before IF4 allows remote attackers to bypass intended access restrictions via unknown vectors.
cve
HP Enterprise Maps 1.00 allows remote authenticated users to read arbitrary files via a WSDL document containing an XML external entity declaration in conjunction with an entity reference within a GetQuote operation, related to an XML External Entity (XXE) issue.
cve
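
Since the question asked about findall specifically, here is a minimal sketch of the same lookup without xpath. It assumes a trimmed, made-up stand-in for sample.xml (the SAMPLE bytes and the two placeholder descriptions are not from the real feed), and it falls back to the standard library's ElementTree if lxml is not installed, since findall accepts the same arguments there. It shows the two usual ways to qualify a tag name: a prefix map, or the "{namespace}tag" form that .tag printed in the question.

```python
try:
    from lxml import etree  # the library used in this thread
except ImportError:
    # Assumption: stdlib fallback; its findall takes the same arguments.
    import xml.etree.ElementTree as etree

NS = "http://nvd.nist.gov/feeds/cve/1.2"

# Hypothetical, trimmed stand-in for sample.xml with the same structure.
SAMPLE = b"""<?xml version='1.0' encoding='UTF-8'?>
<nvd xmlns="http://nvd.nist.gov/feeds/cve/1.2">
  <entry name="CVE-2011-1381">
    <desc><descript source="cve">First description.</descript></desc>
  </entry>
  <entry name="CVE-2014-4669">
    <desc><descript source="cve">Second description.</descript></desc>
  </entry>
</nvd>"""

root = etree.fromstring(SAMPLE)

# An unqualified name matches nothing: every tag lives in the namespace.
unqualified = root.findall("entry")

# Option 1: a prefix map, the same idea as the xpath call.
by_prefix = root.findall("ns:entry/ns:desc/ns:descript", namespaces={"ns": NS})

# Option 2: the "{namespace}tag" form, which is what .tag prints.
by_clark = root.findall("{%s}entry/{%s}desc/{%s}descript" % (NS, NS, NS))

# Collect (text, source attribute) pairs for each descript element.
results = [(d.text, d.get("source")) for d in by_prefix]
for text, source in results:
    print(text, source)
```

The prefix ("ns" here) is arbitrary; only the namespace URI it maps to has to match the document.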

Also see this relevant thread:
