
Python XML: parsing CDATA


I'm trying to scrape news data from the Forex Factory calendar, but I've run into a small problem with the XML file.

import requests
from bs4 import BeautifulSoup

def get_news_calendar():
    r = requests.get('http://www.forexfactory.com/ffcal_week_this.xml')
    soup = BeautifulSoup(r.text, 'lxml')
    events = soup.find_all('event')
    for event in events:
        print(event.find('title').text, event.find('country').text,
              event.find('date'), event.find('time').text,
              event.find('impact').text, event.find('forecast').text,
              event.find('previous').text)

Output:

Current Account EUR <date></date>    
Retail Sales m/m GBP <date></date>    
MPC Member Saunders Speaks GBP <date></date>    
Core CPI m/m CAD <date></date>    
CPI m/m CAD <date></date>    
Trimmed CPI y/y CAD <date></date>    
Median CPI y/y CAD <date></date>    
Common CPI y/y CAD <date></date>    
FOMC Member Kashkari Speaks USD <date></date>    
Flash Manufacturing PMI USD <date></date>    
Flash Services PMI USD <date></date>    
Existing Home Sales USD <date></date>    
IMF Meetings ALL <date></date>    
IMF Meetings ALL <date></date>    
Treasury Sec Mnuchin Speaks USD <date></date>    
French Presidential Election EUR <date></date>

Sample XML file:

<event>
    <title>German Flash Manufacturing PMI</title>
    <country>EUR</country>
    <date><![CDATA[04-21-2017]]></date>
    <time><![CDATA[7:30am]]></time>
    <impact><![CDATA[Medium]]></impact>
    <forecast><![CDATA[58.1]]></forecast>
    <previous><![CDATA[58.3]]></previous>
</event> 

How can I print the CDATA values?

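It looks like you picked the wrong parser name. You are parsing an XML document, so you need lxml-xml rather than lxml. The lxml name selects lxml's HTML parser, which does not preserve the CDATA sections, and that is why those fields come back empty.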

Try replacing

soup = BeautifulSoup(r.text, 'lxml')

with

soup = BeautifulSoup(r.text, 'lxml-xml')

After making that change to your get_news_calendar function, I get the following output when running it on the sample XML file:

German Flash Manufacturing PMI EUR <date>04-21-2017</date> 7:30am Medium 58.1 58.3
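
Note that the date in the output above is still wrapped in its tag, because the print call in the question passes event.find('date') rather than event.find('date').text. A minimal sketch of the loop with that fixed is shown below. It runs against an inline copy of the sample XML instead of the live feed, and the names SAMPLE_XML, FIELDS and parse_events are illustrative, not part of the original code.

```python
from bs4 import BeautifulSoup

# Inline copy of the sample <event> from the question, so no network call is needed.
SAMPLE_XML = """<event>
    <title>German Flash Manufacturing PMI</title>
    <country>EUR</country>
    <date><![CDATA[04-21-2017]]></date>
    <time><![CDATA[7:30am]]></time>
    <impact><![CDATA[Medium]]></impact>
    <forecast><![CDATA[58.1]]></forecast>
    <previous><![CDATA[58.3]]></previous>
</event>"""

FIELDS = ('title', 'country', 'date', 'time', 'impact', 'forecast', 'previous')

def parse_events(xml_text):
    """Return one list of field strings per <event> (hypothetical helper)."""
    # 'lxml-xml' selects lxml's XML parser, so CDATA content is kept as text.
    soup = BeautifulSoup(xml_text, 'lxml-xml')
    rows = []
    for event in soup.find_all('event'):
        # Use .text on every field, including <date>, to get the value without the tag.
        rows.append([event.find(name).text for name in FIELDS])
    return rows

rows = parse_events(SAMPLE_XML)
for row in rows:
    print(' '.join(row))
```

Collecting the fields into a list per event, rather than printing inside the loop, also makes it easier to write the rows to a CSV file or a DataFrame later.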

Consider using lxml directly and running an XPath query over all <event> nodes, since an element's .text attribute retrieves the CDATA content.

import requests
import lxml.etree as et

def get_news_calendar():        
    r = requests.get('http://www.forexfactory.com/ffcal_week_this.xml')
    data = et.fromstring(r.text.encode("utf-8"))

    events = data.xpath('//event')
    for event in events:
        print(event.find('title').text, event.find('country').text,
              event.find('date').text, event.find('time').text, 
              event.find('impact').text, event.find('forecast').text, 
              event.find('previous').text)

get_news_calendar()

# Bank Holiday NZD 04-16-2017 9:00pm Holiday None None
# Bank Holiday AUD 04-16-2017 10:00pm Holiday None None
# GDP q/y CNY 04-17-2017 2:00am High 6.8% 6.8%
# Industrial Production y/y CNY 04-17-2017 2:00am High 6.2% 6.3%
# Fixed Asset Investment ytd/y CNY 04-17-2017 2:00am Medium 8.8% 8.9%
# NBS Press Conference CNY 04-17-2017 2:00am Medium None None
# Retail Sales y/y CNY 04-17-2017 2:00am Low 9.7% 9.5%
# Bank Holiday CHF 04-17-2017 6:00am Holiday None None
# BOJ Gov Kuroda Speaks JPY 04-17-2017 6:15am High None None
# Bank Holiday GBP 04-17-2017 7:00am Holiday None None
# French Bank Holiday EUR 04-17-2017 7:00am Holiday None None
# ...
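
If installing lxml is not an option, the standard library's xml.etree.ElementTree also exposes CDATA content through an element's text. The sketch below swaps ElementTree in for lxml and parses an inline string rather than the live feed. The <events> wrapper element, the empty <forecast/> and <previous/> elements in the second event, and the event_to_row helper are assumptions made for illustration; the real feed may represent missing values differently.

```python
import xml.etree.ElementTree as ET

# Inline test data: the first event is the sample from the question; the second
# is an assumed shape for an event that has no forecast or previous value.
SAMPLE_XML = """<events>
  <event>
    <title>German Flash Manufacturing PMI</title>
    <country>EUR</country>
    <date><![CDATA[04-21-2017]]></date>
    <time><![CDATA[7:30am]]></time>
    <impact><![CDATA[Medium]]></impact>
    <forecast><![CDATA[58.1]]></forecast>
    <previous><![CDATA[58.3]]></previous>
  </event>
  <event>
    <title>Bank Holiday</title>
    <country>NZD</country>
    <date><![CDATA[04-16-2017]]></date>
    <time><![CDATA[9:00pm]]></time>
    <impact><![CDATA[Holiday]]></impact>
    <forecast/>
    <previous/>
  </event>
</events>"""

FIELDS = ('title', 'country', 'date', 'time', 'impact', 'forecast', 'previous')

def event_to_row(event):
    """Return the field values of one <event> as strings (hypothetical helper)."""
    # findtext returns '' for an element with no text, and the given default
    # when the element is missing, so the row never contains None.
    return [event.findtext(name, default='') for name in FIELDS]

root = ET.fromstring(SAMPLE_XML)
rows = [event_to_row(event) for event in root.iter('event')]
for row in rows:
    print(' '.join(row))
```

Using findtext with a default avoids the None values seen in the output above and does not raise AttributeError if a field is absent from an event.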

