
Xpath select attribute of current node?

I use Python with lxml to process XML. After querying/filtering to get to the node I want, I have a problem: how do I get its attribute's value with XPath? Here is an example of my input.

>print(etree.tostring(node, pretty_print=True ))
<rdf:li xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"  rdf:resource="urn:miriam:obo.chebi:CHEBI%3A37671"/>

The value I want is in resource=... . Currently I just use the lxml API to get the value. I wonder if it is possible to do this in pure XPath? Thanks.

EDIT: I forgot to say that this is not a root node, so I can't use // here. There are some 2000-3000 other such nodes in the XML file. My first attempts used ".@attrib" and "self::*@", but neither seems to work.

EDIT2: I will try my best to explain (this is my first time dealing with an XML problem using XPath, and English is not one of my strongest subjects). Here is my input snippet: http://pastebin.com/kZmVdbQQ (the full file is available from http://www.comp-sys-bio.org/yeastnet/; I am using version 4).

In my code, I try to get the speciesType nodes that have a chebi resource link (<rdf:li rdf:resource="urn:miriam:obo.chebi:...."/>), and then I try to get the value of the rdf:resource attribute of rdf:li. I am pretty sure it would be easy to get the attribute of a child node if I started from a parent node such as speciesType, but I wonder how to do it if I start from rdf:li itself. From my understanding, "//" in XPath looks for nodes everywhere in the document, not just under the current node.

Below is my code:

import lxml.etree as etree

tree = etree.parse("yeast_4.02.xml")
root = tree.getroot()
ns = {"sbml": "http://www.sbml.org/sbml/level2/version4", 
      "rdf":"http://www.w3.org/1999/02/22-rdf-syntax-ns#",
      "body":"http://www.w3.org/1999/xhtml",
      "re": "http://exslt.org/regular-expressions"
      }
#good enough for now
maybemeta = root.xpath("//sbml:speciesType[descendant::rdf:li[starts-with(@rdf:resource, 'urn:miriam:obo.chebi') and not(starts-with(@rdf:resource, 'urn:miriam:uniprot'))]]", namespaces = ns)

def extract_name_and_chebi(node):
    name = node.attrib['name']
    chebies = node.xpath("./sbml:annotation//rdf:li[starts-with(@rdf:resource, 'urn:miriam:obo.chebi') and not(starts-with(@rdf:resource, 'urn:miriam:uniprot'))]", namespaces=ns) #get all rdf:li node with chebi resource
    assert len(chebies) == 1
    #my current solution to get rdf:resource value from rdf:li node
    rdfNS = "{" + ns.get('rdf') + "}"
    chebi = chebies[0].attrib[rdfNS + 'resource'] 
    #do protein later
    return (name, chebi)

metaWithChebi = map(extract_name_and_chebi, maybemeta)

# the with block closes the file, so the output is flushed to disk
with open("metabolites.txt", "w") as fo:
    for name, chebi in metaWithChebi:
        fo.write("{0}\t{1}\n".format(name, chebi))

Prefix the attribute name with @ in the XPath query:

>>> from lxml import etree
>>> xml = b"""\
... <?xml version="1.0" encoding="utf-8"?>
... <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
...     <rdf:li rdf:resource="urn:miriam:obo.chebi:CHEBI%3A37671"/>
... </rdf:RDF>
... """
>>> tree = etree.fromstring(xml)
>>> ns = {'rdf': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'}
>>> tree.xpath('//rdf:li/@rdf:resource', namespaces=ns)
['urn:miriam:obo.chebi:CHEBI%3A37671']

EDIT

Here's a revised version of the script in the question:

import lxml.etree as etree

ns = {
    'sbml': 'http://www.sbml.org/sbml/level2/version4',
    'rdf':'http://www.w3.org/1999/02/22-rdf-syntax-ns#',
    'body':'http://www.w3.org/1999/xhtml',
    're': 'http://exslt.org/regular-expressions',
    }

def extract_name_and_chebi(node):
    chebies = node.xpath("""
        .//rdf:li[
        starts-with(@rdf:resource, 'urn:miriam:obo.chebi')
        ]/@rdf:resource
        """, namespaces=ns)
    return node.attrib['name'], chebies[0]

with open('yeast_4.02.xml', 'rb') as xml:  # binary mode: the parser handles the encoding
    tree = etree.parse(xml)

    maybemeta = tree.xpath("""
        //sbml:speciesType[descendant::rdf:li[
        starts-with(@rdf:resource, 'urn:miriam:obo.chebi')]]
        """, namespaces = ns)

    with open('metabolites.txt', 'w') as output:
        for node in maybemeta:
            output.write('%s\t%s\n' % extract_name_and_chebi(node))
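The revised script calls `.//rdf:li` inside `extract_name_and_chebi` rather than `//rdf:li`. A minimal sketch of why the leading dot matters, assuming a small made-up document (the `root` and `item` elements and the CHEBI values are placeholders, not taken from yeast_4.02.xml):

```python
from lxml import etree

# Hypothetical two-item sample, not taken from yeast_4.02.xml.
XML = b"""\
<root xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <item name="a"><rdf:li rdf:resource="urn:miriam:obo.chebi:CHEBI%3A1"/></item>
  <item name="b"><rdf:li rdf:resource="urn:miriam:obo.chebi:CHEBI%3A2"/></item>
</root>
"""
ns = {'rdf': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'}
root = etree.fromstring(XML)
first_item = root[0]

# '//' starts at the document root, even when called on a child element,
# so it finds the rdf:li elements of both items.
absolute = first_item.xpath('//rdf:li/@rdf:resource', namespaces=ns)

# './/' starts at the context node, so only its own descendants match.
relative = first_item.xpath('.//rdf:li/@rdf:resource', namespaces=ns)

print(absolute)
print(relative)
```

So `//` is usable from a non-root node, but it ignores that node as a starting point; `.//` (or a plain relative path) is what keeps the search local.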

To select the rdf:resource attribute of the current node, use this XPath expression:

@rdf:resource

For this to work correctly, you must register the association of the prefix "rdf" with the corresponding namespace.

If you don't know how to register the rdf namespace, it is still possible to select the attribute -- with this XPath expression:

@*[name()='rdf:resource']
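With lxml, the two expressions above might be tried as follows. This is only a sketch: "registering" the prefix is assumed here to mean passing a `namespaces` mapping to `xpath()`, and the third query (using `local-name()` and `namespace-uri()`) is an extra variant not mentioned above, included because `name()` depends on the prefix the document happens to use.

```python
from lxml import etree

RDF_NS = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'

# The rdf:li node from the question, parsed on its own for illustration.
node = etree.fromstring(
    b'<rdf:li xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
    b'rdf:resource="urn:miriam:obo.chebi:CHEBI%3A37671"/>'
)

# Prefix registered through the namespaces mapping.
with_prefix = node.xpath('@rdf:resource', namespaces={'rdf': RDF_NS})

# Nothing registered: compare against the qualified name as written
# in the document (this breaks if the document uses another prefix).
by_name = node.xpath("@*[name()='rdf:resource']")

# Nothing registered, and independent of the document's prefix.
by_uri = node.xpath(
    "@*[local-name()='resource' and namespace-uri()='%s']" % RDF_NS
)

print(with_prefix)
print(by_name)
print(by_uri)
```

All three queries return a one-element list holding the attribute value as a string.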

Well, I got it. The XPath expression I need here is "./@rdf:resource", not ".@rdf:resource". But why? I thought "./" indicated a child of the current node.
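On the "why": in XPath 1.0, "/" only separates location steps, and each step has its own axis. "." abbreviates self::node(), and "@" abbreviates the attribute:: axis, so "./@rdf:resource" means "the current node, then its rdf:resource attribute". The child axis is just the default when a step names no axis, which is why "./foo" selects children. ".@rdf:resource" puts two steps together with no separator, so it is not a valid expression. A small sketch to check this with lxml (the variable names are illustrative only):

```python
from lxml import etree

ns = {'rdf': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'}
node = etree.fromstring(
    b'<rdf:li xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
    b'rdf:resource="urn:miriam:obo.chebi:CHEBI%3A37671"/>'
)

# Abbreviated form, its unabbreviated equivalent, and the bare attribute step.
abbreviated = node.xpath('./@rdf:resource', namespaces=ns)
expanded = node.xpath('self::node()/attribute::rdf:resource', namespaces=ns)
bare = node.xpath('@rdf:resource', namespaces=ns)

# '.@rdf:resource' has no '/' between its two steps, so it is rejected.
try:
    node.xpath('.@rdf:resource', namespaces=ns)
    invalid_rejected = False
except etree.XPathError:
    invalid_rejected = True

print(abbreviated == expanded == bare)
print(invalid_rejected)
```

Since a relative path is already evaluated from the context node, the leading "./" is optional here; "@rdf:resource" alone selects the same attribute.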


 