XML data from URL using BeautifulSoup and output to dictionary

I need to read XML data from a URL (an exchange rate list) and output it as a dictionary. Right now I only get the first currency. I tried find_all, but without success. Can somebody point out where I need to put the for loop to read all values?

import bs4 as bs
import urllib.request

source = urllib.request.urlopen('http://www.xxxy.hr/Downloads/PBZteclist.xml').read()
soup = bs.BeautifulSoup(source,'xml')

name = soup.find('Name').text
unit = soup.find('Unit').text
buyratecache = soup.find('BuyRateCache').text
buyrateforeign = soup.find('BuyRateForeign').text
meanrate = soup.find('MeanRate').text
sellrateforeign = soup.find('SellRateForeign').text
sellratecache = soup.find('SellRateCache').text


devize =  {'naziv_valute': '{}'.format(name),
           'jedinica': '{}'.format(unit),
           'kupovni': '{}'.format(buyratecache),
           'kupovni_strani': '{}'.format(buyrateforeign),
           'srednji': '{}'.format(meanrate),
           'prodajni_strani': '{}'.format(sellrateforeign),
           'prodajni': '{}'.format(sellratecache)}

print ("devize:",devize)

Example of XML:

<ExchRates>
    <ExchRate>
        <Bank>Privredna banka Zagreb</Bank>
        <CurrencyBase>HRK</CurrencyBase>
        <Date>12.01.2019.</Date>
        <Currency Code="036">
            <Name>AUD</Name>
            <Unit>1</Unit>
            <BuyRateCache>4,485390</BuyRateCache>
            <BuyRateForeign>4,530697</BuyRateForeign>
            <MeanRate>4,646869</MeanRate>
            <SellRateForeign>4,786275</SellRateForeign>
            <SellRateCache>4,834138</SellRateCache>
        </Currency>
        <Currency Code="124">
            <Name>CAD</Name>
            <Unit>1</Unit>
            <BuyRateCache>4,724225</BuyRateCache>
            <BuyRateForeign>4,771944</BuyRateForeign>
            <MeanRate>4,869331</MeanRate>
            <SellRateForeign>4,991064</SellRateForeign>
            <SellRateCache>5,040975</SellRateCache>
        </Currency>
        <Currency Code="203">
            <Name>CZK</Name>
            <Unit>1</Unit>
            <BuyRateCache>0,280057</BuyRateCache>
            <BuyRateForeign>0,284322</BuyRateForeign>
            <MeanRate>0,290124</MeanRate>
            <SellRateForeign>0,297377</SellRateForeign>
            <SellRateCache>0,300351</SellRateCache>
        </Currency>
        ...etc...
    </ExchRate>
</ExchRates>

soup.find() returns only the first matching element in the whole document, which is why you get only the first currency. Instead, iterate over all Currency nodes (not the soup object) and call find() on each one; a list comprehension then builds a list of dictionaries:

soup = bs.BeautifulSoup(source, 'xml')

# All <Currency> nodes, one per exchange rate entry
currency_nodes = soup.find_all('Currency')

# List of dictionaries, one per currency
devize_list = [{'naziv_valute': c.find('Name').text,
                'jedinica': c.find('Unit').text,
                'kupovni': c.find('BuyRateCache').text,
                'kupovni_strani': c.find('BuyRateForeign').text,
                'srednji': c.find('MeanRate').text,
                'prodajni_strani': c.find('SellRateForeign').text,
                'prodajni': c.find('SellRateCache').text
               } for c in currency_nodes]

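Alternatively, since you are extracting every child element, use a dictionary comprehension inside the list comprehension. The keys are then the XML tag names (Name, Unit, and so on) rather than the custom keys above:

# One dictionary per <Currency> node; skip whitespace text nodes (name is None)
devize_list = [{n.name: n.text for n in c.children if n.name is not None}
               for c in currency_nodes]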
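
For anyone who wants to try the per-node loop without network access or third-party packages, here is a minimal sketch of the same idea. Note the assumptions: it swaps BeautifulSoup for the standard library's xml.etree.ElementTree, it parses an inline copy of two entries from the sample XML instead of the URL, and the names SAMPLE_XML, KEY_MAP and parse_rates are made up for illustration. The values stay as strings, as in the question.

```python
import xml.etree.ElementTree as ET

# Inline sample: the first two currencies from the XML in the question.
SAMPLE_XML = """<ExchRates>
    <ExchRate>
        <Bank>Privredna banka Zagreb</Bank>
        <CurrencyBase>HRK</CurrencyBase>
        <Date>12.01.2019.</Date>
        <Currency Code="036">
            <Name>AUD</Name>
            <Unit>1</Unit>
            <BuyRateCache>4,485390</BuyRateCache>
            <BuyRateForeign>4,530697</BuyRateForeign>
            <MeanRate>4,646869</MeanRate>
            <SellRateForeign>4,786275</SellRateForeign>
            <SellRateCache>4,834138</SellRateCache>
        </Currency>
        <Currency Code="124">
            <Name>CAD</Name>
            <Unit>1</Unit>
            <BuyRateCache>4,724225</BuyRateCache>
            <BuyRateForeign>4,771944</BuyRateForeign>
            <MeanRate>4,869331</MeanRate>
            <SellRateForeign>4,991064</SellRateForeign>
            <SellRateCache>5,040975</SellRateCache>
        </Currency>
    </ExchRate>
</ExchRates>"""

# Hypothetical helper: maps XML tag names to the dictionary keys from the question.
KEY_MAP = {
    'Name': 'naziv_valute',
    'Unit': 'jedinica',
    'BuyRateCache': 'kupovni',
    'BuyRateForeign': 'kupovni_strani',
    'MeanRate': 'srednji',
    'SellRateForeign': 'prodajni_strani',
    'SellRateCache': 'prodajni',
}


def parse_rates(xml_text):
    """Return one dictionary per <Currency> element in xml_text."""
    root = ET.fromstring(xml_text)
    # iter() walks the whole tree, so the nesting depth of <Currency> does not matter.
    # findtext() is called on each currency node, not on the root, so every
    # currency contributes its own values instead of the first match overall.
    return [{key: c.findtext(tag) for tag, key in KEY_MAP.items()}
            for c in root.iter('Currency')]


devize_list = parse_rates(SAMPLE_XML)
for devize in devize_list:
    print("devize:", devize)
```

The lookup is scoped to each Currency node, which is the same reason the BeautifulSoup version calls c.find(...) rather than soup.find(...).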
