
How to get all the tags in an XML file using Python?

I have been searching the Python docs for a way to get the tag names from an XML file, without much success. Given the XML file below, I would like to get the country tags and all of their associated child tags. Does anyone know how this is done?

<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>

Consider using ElementTree's iterparse() to build nested lists of tag and text pairs. Conditional logic groups each country's items together and leaves out elements with no text, and the values are cleaned of the line breaks and stray whitespace that iterparse() picks up:

import xml.etree.ElementTree as et

path = 'countries.xml'  # assumed file name; point this at your own XML file

def clean(value):
    # Drop line breaks and surrounding whitespace, keeping inner spaces
    # so that 'Costa Rica' is not collapsed to 'CostaRica'
    return str(value).replace('\n', '').strip()

data = []
# By default iterparse() yields an 'end' event once an element is fully parsed
for ev, el in et.iterparse(path):
    if el.tag == 'country':
        inner = []
        # Attributes of <country> itself, e.g. country-name
        for name, value in el.items():
            inner.append([el.tag + '-' + name, clean(value)])
        for child in el:
            # Skip elements with no text, such as <neighbor/>
            if child.text is not None:
                inner.append([child.tag, clean(child.text)])
            # Attributes of each child, e.g. neighbor-direction
            for name, value in child.items():
                inner.append([child.tag + '-' + name, clean(value)])
        data.append(inner)

print(data)
# (output wrapped here for readability)
# [[['country-name', 'Liechtenstein'], ['rank', '1'], ['year', '2008'], ['gdppc', '141100'],
#   ['neighbor-name', 'Austria'], ['neighbor-direction', 'E'],
#   ['neighbor-name', 'Switzerland'], ['neighbor-direction', 'W']],
#  [['country-name', 'Singapore'], ['rank', '4'], ['year', '2011'], ['gdppc', '59900'],
#   ['neighbor-name', 'Malaysia'], ['neighbor-direction', 'N']],
#  [['country-name', 'Panama'], ['rank', '68'], ['year', '2011'], ['gdppc', '13600'],
#   ['neighbor-name', 'Costa Rica'], ['neighbor-direction', 'W'],
#   ['neighbor-name', 'Colombia'], ['neighbor-direction', 'E']]]

Take a look at Python's built-in XML functionality: traverse the document recursively and collect all the tags into a set.
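The recursive traversal suggested above could look roughly like the following. This is only a sketch: the helper name collect_tags and the trimmed, inlined copy of the question's XML are assumptions made so the example runs on its own, and the answer itself does not prescribe a particular implementation.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's XML, inlined so the sketch is self-contained.
XML = """<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <neighbor name="Malaysia" direction="N"/>
    </country>
</data>"""


def collect_tags(element, seen=None):
    """Recursively add the tag of `element` and of every descendant to a set."""
    if seen is None:
        seen = set()
    seen.add(element.tag)
    for child in element:  # iterating an Element yields its direct children
        collect_tags(child, seen)
    return seen


root = ET.fromstring(XML)
all_tags = collect_tags(root)
print(sorted(all_tags))
# ['country', 'data', 'gdppc', 'neighbor', 'rank', 'year']
```

If explicit recursion is not needed, Element.iter() walks the same tree, so {el.tag for el in root.iter()} should produce the same set.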


 