
Get all values of a particular attribute from XML using Python

I need to get all values of a particular attribute. The tag that carries the attribute may vary, and the attribute may appear at any level of the XML tree (root level, child level, and so on). Consider the following XML:

<?xml version="1.0" encoding="utf-8"?>
    <college name ="xyz"/>
    <university>
        <college name = "abc1" id = "a"/>
        <college name = "abc2" id = "b"/>
        <sub-univ>
            <sub-univ-col name = "sfd"/>
        </sub-univ>
    </university>
    <school name = "asdf"/>

How do I get the value of the "name" attribute from all the XML tags? The XML file I have has many more levels than the example above. Is there any way to get the values without parsing each level separately?

This is straightforward in any parser that supports XPath. For example, with lxml:

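import lxml.etree

# that_xml_you_gave holds the XML from the question; lxml needs a single
# root element, which the snippet as posted does not have
doc = lxml.etree.fromstring(that_xml_you_gave)

# '//@name' selects every name attribute, at any depth
doc.xpath('//@name')
Out[208]: ['xyz', 'abc1', 'abc2', 'sfd', 'asdf']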
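
If lxml is not installed, the standard library's xml.etree.ElementTree can do a similar walk. The following is only a minimal sketch, not part of the original answer: it assumes the sample is wrapped in a hypothetical <root> element (a well-formed XML document needs exactly one root), and the variable names xml_text, names and pairs are illustrative.

```python
import xml.etree.ElementTree as ET

# The sample from the question, wrapped in a hypothetical <root> element
# (an assumption: the snippet as posted has several top-level elements).
xml_text = """<root>
    <college name="xyz"/>
    <university>
        <college name="abc1" id="a"/>
        <college name="abc2" id="b"/>
        <sub-univ>
            <sub-univ-col name="sfd"/>
        </sub-univ>
    </university>
    <school name="asdf"/>
</root>"""

root = ET.fromstring(xml_text)

# iter() visits the root and every descendant in document order,
# so no per-level handling is needed.
names = [elem.attrib["name"] for elem in root.iter() if "name" in elem.attrib]
print(names)  # ['xyz', 'abc1', 'abc2', 'sfd', 'asdf']

# Keep the tag next to each value, since the tag name can differ.
pairs = [(elem.tag, elem.attrib["name"]) for elem in root.iter() if "name" in elem.attrib]
print(pairs[0])  # ('college', 'xyz')
```

Checking "name" in elem.attrib skips elements such as <university> that do not carry the attribute, so the list contains no None entries.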

If you use Beautiful Soup, this becomes pretty easy:

from bs4 import BeautifulSoup

xml = '''
<?xml version="1.0" encoding="utf-8"?>
    <college name ="xyz"/>
    <university>
        <college name = "abc1" id = "a"/>
        <college name = "abc2" id = "b"/>
        <sub-univ>
            <sub-univ-col name = "sfd"/>
        </sub-univ>
    </university>
    <school name = "asdf"/>
'''

# name the parser explicitly; the built-in html.parser adds no wrapper tags
soup = BeautifulSoup(xml, 'html.parser')
names = [tag.get('name') for tag in soup.find_all()]
print(names)

Result:

['xyz', None, 'abc1', 'abc2', None, 'sfd', 'asdf']

Note that we use tag.get(...) because some of the tags don't have a name attribute; for those tags it returns None instead of raising a KeyError. Alternatively, you could keep only the tags that have the attribute:

names = [tag['name'] for tag in soup.find_all() if tag.has_attr('name')]

With result:

['xyz', 'abc1', 'abc2', 'sfd', 'asdf']


 