Generating nested lists from XML doc

Working in Python, my goal is to parse an XML doc I made and build a nested list of lists, so that I can access the entries later and parse the feeds. The XML doc resembles the following snippet:

<?xml version="1.0"?>
<sources>
    <!--Source List by Institution-->
    <sourceList source="cbc">
        <f>http://rss.cbc.ca/lineup/topstories.xml</f>
    </sourceList>
    <sourceList source="bbc">
        <f>http://feeds.bbci.co.uk/news/rss.xml</f>
        <f>http://feeds.bbci.co.uk/news/world/rss.xml</f>
        <f>http://feeds.bbci.co.uk/news/uk/rss.xml</f>
    </sourceList>
    <sourceList source="reuters">
        <f>http://feeds.reuters.com/reuters/topNews</f>
        <f>http://feeds.reuters.com/news/artsculture</f>
    </sourceList>
</sources>

I would like something like nested lists, where the innermost list holds the content between the <f></f> tags and the list above it is named after the source, e.g. source="reuters" would be reuters. Retrieving the info from the XML doc isn't a problem; I'm doing it with ElementTree, looping over the nodes and reading attributes with node.get('source') etc. The problem is that I'm having trouble generating lists with the desired names and the different lengths required by the different sources. I have tried appending, but I am unsure how to append to a list whose name was retrieved from the document. Would a dictionary be better? What would be the best practice in this situation, and how might I make this work? If any more info is required, just post a comment and I'll be sure to add it.

From your description, a dictionary with the source names as keys and the feed lists as values might do the trick.

Here is one way to construct such a beast:

from lxml import etree
from pprint import pprint

news_sources = {
    source.attrib['source'] : [feed.text for feed in source.xpath('./f')]
    for source in etree.parse('x.xml').xpath('/sources/sourceList')}

pprint(news_sources)
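
Since the goal is to access the feeds later, it may help to see the shape this produces. The sketch below writes out by hand the dictionary that the sample document from the question should yield, then shows a lookup by source name and a walk over every feed. The names bbc_feeds and pairs are made up for illustration.

```python
# Shape of the dictionary for the sample document in the question
news_sources = {
    "cbc": ["http://rss.cbc.ca/lineup/topstories.xml"],
    "bbc": [
        "http://feeds.bbci.co.uk/news/rss.xml",
        "http://feeds.bbci.co.uk/news/world/rss.xml",
        "http://feeds.bbci.co.uk/news/uk/rss.xml",
    ],
    "reuters": [
        "http://feeds.reuters.com/reuters/topNews",
        "http://feeds.reuters.com/news/artsculture",
    ],
}

# Feeds of a single source, looked up by name
bbc_feeds = news_sources["bbc"]

# Flatten into (source, url) pairs, e.g. to hand each URL to a feed parser
pairs = [(name, url) for name, feeds in news_sources.items() for url in feeds]
for name, url in pairs:
    print(name, url)
```

Each source keeps its own list, so the differing number of feeds per source needs no special handling.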

Another sample, without lxml or xpath:

import xml.etree.ElementTree as ET
from pprint import pprint

news_sources = {
    source.attrib['source'] : [feed.text for feed in source]
    for source in ET.parse('x.xml').getroot()}

pprint(news_sources)
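
The standard-library version above takes every child element it finds. If the document might later gain elements other than sourceList and f, selecting by tag name is a little more defensive. A minimal sketch, assuming the XML is held in a string: SAMPLE_XML is a shortened copy of the question's document, inlined only so the example runs without a file.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the question's document, inlined so the sketch runs as-is
SAMPLE_XML = """<?xml version="1.0"?>
<sources>
    <!--Source List by Institution-->
    <sourceList source="cbc">
        <f>http://rss.cbc.ca/lineup/topstories.xml</f>
    </sourceList>
    <sourceList source="reuters">
        <f>http://feeds.reuters.com/reuters/topNews</f>
        <f>http://feeds.reuters.com/news/artsculture</f>
    </sourceList>
</sources>"""

root = ET.fromstring(SAMPLE_XML)

# findall() with a tag name returns only the direct children with that tag
news_sources = {
    source.get("source"): [feed.text for feed in source.findall("f")]
    for source in root.findall("sourceList")
}

print(news_sources)
```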

Finally, if you are allergic to list comprehensions:

import xml.etree.ElementTree as ET
from pprint import pprint

xml = ET.parse('x.xml')
root = xml.getroot()
news_sources = {}
for sourceList in root:
    sourceListName = sourceList.attrib['source']
    news_sources[sourceListName] = []
    for feed in sourceList:
        feedName = feed.text
        news_sources[sourceListName].append(feedName)

pprint(news_sources)
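
One caveat with the loop above: assigning a fresh empty list for each sourceList would discard earlier feeds if the same source name ever appeared twice. The sample document does not do that, so the following is only a sketch for that hypothetical case, using dict.setdefault to append to whichever list already belongs to the name. SPLIT_XML is invented input for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: the "bbc" source is split across two <sourceList> blocks
SPLIT_XML = """<sources>
    <sourceList source="bbc">
        <f>http://feeds.bbci.co.uk/news/rss.xml</f>
    </sourceList>
    <sourceList source="cbc">
        <f>http://rss.cbc.ca/lineup/topstories.xml</f>
    </sourceList>
    <sourceList source="bbc">
        <f>http://feeds.bbci.co.uk/news/world/rss.xml</f>
    </sourceList>
</sources>"""

news_sources = {}
for source_list in ET.fromstring(SPLIT_XML):
    name = source_list.attrib["source"]
    # setdefault returns the existing list for this name, or stores a new one
    feeds = news_sources.setdefault(name, [])
    for feed in source_list:
        feeds.append(feed.text)

print(news_sources)
```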
