
Retrieve XML parent and child attributes using Python and lxml

I'm trying to process an XML file using XPath in Python with lxml.

I can pull out the values at a particular level of the tree using this code:

from lxml import etree

file_name = input('Enter the file name, including .xml extension: ')  # User inputs file name
print('Parsing ' + file_name)

parser = etree.XMLParser()
tree = etree.parse(file_name, parser)

# Select every <programme> element once and reuse the result
programmes = tree.xpath('/dataimport/programmelist/programme')
print(len(programmes))

with open(file_name + '.log', 'w', encoding='utf-8') as f:
    for programme in programmes:
        progid = programme.get("id")
        print(progid)

It returns a list of values as expected. I also want to return an attribute value from a child element (where one exists), but I can't work out how. I can only get the child values as a separate list, and I need to keep each one linked to its parent.

Note: I will be writing the values out to a log file, but since I haven't been successful in getting everything out that I want, I haven't added the 'write out' code yet.

This is the structure of the XML:

<dataimport dtdversion="1.1">
   <programmelist>
      <programme id="eid-273168">
         <imageref idref="img-1844575"/>

How can I get Python to return the id + idref?

The previous examples I have worked with had namespaces, but this file doesn't.

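The xpath() method returns a list of matching elements, and each element has its own xpath() method. You can therefore call xpath() again on each programme, with a relative path, to get the idref values that belong to it. The result is a list, which is empty when the programme has no imageref child: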

for r in tree.xpath('/dataimport/programmelist/programme'):
    progid = r.get("id")
    # Relative path: only searches inside the current <programme>
    ref_list = r.xpath('imageref/@idref')
    print(progid, ref_list)
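
If you would rather have a single value (or None) per programme instead of a list, here is a minimal sketch of the same idea using find() in place of a second xpath() call. It is an alternative, not the only way to do it. The inline sample XML, the second programme id eid-000002, and the helper name collect_pairs are all made up for illustration. The fallback import is only there so the sketch also runs where lxml is not installed, since find() and findall() behave the same way in the standard library for this case.

```python
try:
    from lxml import etree  # the library used in the question
except ImportError:
    # Standard-library fallback; find()/findall() work the same way here
    import xml.etree.ElementTree as etree

# Hypothetical sample: the first programme comes from the question,
# the second (no <imageref> child) is invented to show the missing case.
SAMPLE_XML = """\
<dataimport dtdversion="1.1">
   <programmelist>
      <programme id="eid-273168">
         <imageref idref="img-1844575"/>
      </programme>
      <programme id="eid-000002"/>
   </programmelist>
</dataimport>
"""


def collect_pairs(root):
    """Return (id, idref) tuples; idref is None when there is no <imageref>."""
    pairs = []
    # Paths are relative to the root <dataimport> element
    for programme in root.findall('programmelist/programme'):
        imageref = programme.find('imageref')  # None if the child is absent
        idref = imageref.get('idref') if imageref is not None else None
        pairs.append((programme.get('id'), idref))
    return pairs


root = etree.fromstring(SAMPLE_XML)
pairs = collect_pairs(root)
for progid, idref in pairs:
    print(progid, idref)
```

Because each tuple is built inside the loop over programmes, the id and idref stay linked, and writing them to the log file is then a matter of calling f.write() on each pair. Note that find() only returns the first imageref; if a programme can have several, the xpath('imageref/@idref') list above is the better fit.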
