How to parse XML in Python with LXML?
Here's my project: I'm graphing weather data from WeatherBug using RRDTool. I need a simple, efficient way to download the weather data from WeatherBug. I was using a terribly inefficient bash-script scraper but moved on to BeautifulSoup. The performance is just too slow (it's running on a Raspberry Pi), so I need to use LXML.
What I have so far:
from lxml import etree
doc = etree.parse('weather.xml')
print(doc.xpath("//aws:weather/aws:ob/aws:temp"))
But I get an error message. weather.xml contains the following:
<?xml version="1.0" encoding="UTF-8"?>
<aws:weather xmlns:aws="http://www.aws.com/aws">
<aws:api version="2.0"/>
<aws:WebURL>http://weather.weatherbug.com/PA/Tunkhannock-weather.html?ZCode=Z5546&amp;Units=0&amp;stat=TNKCN</aws:WebURL>
<aws:InputLocationURL>http://weather.weatherbug.com/PA/Tunkhannock-weather.html?ZCode=Z5546&amp;Units=0</aws:InputLocationURL>
<aws:ob>
<aws:ob-date>
<aws:year number="2013"/>
<aws:month number="1" text="January" abbrv="Jan"/>
<aws:day number="11" text="Friday" abbrv="Fri"/>
<aws:hour number="10" hour-24="22"/>
<aws:minute number="26"/>
<aws:second number="00"/>
<aws:am-pm abbrv="PM"/>
<aws:time-zone offset="-5" text="Eastern Standard Time (USA)" abbrv="EST"/>
</aws:ob-date>
<aws:requested-station-id/>
<aws:station-id>TNKCN</aws:station-id>
<aws:station>Tunkhannock HS</aws:station>
<aws:city-state zipcode="18657">Tunkhannock, PA</aws:city-state>
<aws:country>USA</aws:country>
<aws:latitude>41.5663871765137</aws:latitude>
<aws:longitude>-75.9794464111328</aws:longitude>
<aws:site-url>http://www.tasd.net/highschool/index.cfm</aws:site-url>
<aws:aux-temp units="&amp;deg;F">-100</aws:aux-temp>
<aws:aux-temp-rate units="&amp;deg;F">0</aws:aux-temp-rate>
<aws:current-condition icon="http://deskwx.weatherbug.com/images/Forecast/icons/cond013.gif">Cloudy</aws:current-condition>
<aws:dew-point units="&amp;deg;F">40</aws:dew-point>
<aws:elevation units="ft">886</aws:elevation>
<aws:feels-like units="&amp;deg;F">41</aws:feels-like>
<aws:gust-time>
<aws:year number="2013"/>
<aws:month number="1" text="January" abbrv="Jan"/>
<aws:day number="11" text="Friday" abbrv="Fri"/>
<aws:hour number="12" hour-24="12"/>
<aws:minute number="18"/>
<aws:second number="00"/>
<aws:am-pm abbrv="PM"/>
<aws:time-zone offset="-5" text="Eastern Standard Time (USA)" abbrv="EST"/>
</aws:gust-time>
<aws:gust-direction>NNW</aws:gust-direction>
<aws:gust-direction-degrees>323</aws:gust-direction-degrees>
<aws:gust-speed units="mph">17</aws:gust-speed>
<aws:humidity units="%">98</aws:humidity>
<aws:humidity-high units="%">100</aws:humidity-high>
<aws:humidity-low units="%">61</aws:humidity-low>
<aws:humidity-rate>3</aws:humidity-rate>
<aws:indoor-temp units="&amp;deg;F">77</aws:indoor-temp>
<aws:indoor-temp-rate units="&amp;deg;F">-1.1</aws:indoor-temp-rate>
<aws:light>0</aws:light>
<aws:light-rate>0</aws:light-rate>
<aws:moon-phase moon-phase-img="http://api.wxbug.net/images/moonphase/mphase01.gif">0</aws:moon-phase>
<aws:pressure units="&quot;">30.09</aws:pressure>
<aws:pressure-high units="&quot;">30.5</aws:pressure-high>
<aws:pressure-low units="&quot;">30.08</aws:pressure-low>
<aws:pressure-rate units="&quot;/h">-0.01</aws:pressure-rate>
<aws:rain-month units="&quot;">0.11</aws:rain-month>
<aws:rain-rate units="&quot;/h">0</aws:rain-rate>
<aws:rain-rate-max units="&quot;/h">0.12</aws:rain-rate-max>
<aws:rain-today units="&quot;">0.09</aws:rain-today>
<aws:rain-year units="&quot;">0.11</aws:rain-year>
<aws:temp units="&amp;deg;F">41</aws:temp>
<aws:temp-high units="&amp;deg;F">42</aws:temp-high>
<aws:temp-low units="&amp;deg;F">29</aws:temp-low>
<aws:temp-rate units="&amp;deg;F/h">-0.9</aws:temp-rate>
<aws:sunrise>
<aws:year number="2013"/>
<aws:month number="1" text="January" abbrv="Jan"/>
<aws:day number="11" text="Friday" abbrv="Fri"/>
<aws:hour number="7" hour-24="07"/>
<aws:minute number="29"/>
<aws:second number="53"/>
<aws:am-pm abbrv="AM"/>
<aws:time-zone offset="-5" text="Eastern Standard Time (USA)" abbrv="EST"/>
</aws:sunrise>
<aws:sunset>
<aws:year number="2013"/>
<aws:month number="1" text="January" abbrv="Jan"/>
<aws:day number="11" text="Friday" abbrv="Fri"/>
<aws:hour number="4" hour-24="16"/>
<aws:minute number="54"/>
<aws:second number="19"/>
<aws:am-pm abbrv="PM"/>
<aws:time-zone offset="-5" text="Eastern Standard Time (USA)" abbrv="EST"/>
</aws:sunset>
<aws:wet-bulb units="&amp;deg;F">40.802</aws:wet-bulb>
<aws:wind-speed units="mph">3</aws:wind-speed>
<aws:wind-speed-avg units="mph">1</aws:wind-speed-avg>
<aws:wind-direction>S</aws:wind-direction>
<aws:wind-direction-degrees>163</aws:wind-direction-degrees>
<aws:wind-direction-avg>SE</aws:wind-direction-avg>
</aws:ob>
</aws:weather>
I used http://www.xpathtester.com/test to test my XPath and it worked there. But in Python I get this error message:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "lxml.etree.pyx", line 2043, in lxml.etree._ElementTree.xpath (src/lxml/lxml.etree.c:47570)
File "xpath.pxi", line 376, in lxml.etree.XPathDocumentEvaluator.__call__ (src/lxml/lxml.etree.c:118247)
File "xpath.pxi", line 239, in lxml.etree._XPathEvaluatorBase._handle_result (src/lxml/lxml.etree.c:116911)
File "xpath.pxi", line 224, in lxml.etree._XPathEvaluatorBase._raise_eval_error (src/lxml/lxml.etree.c:116728)
lxml.etree.XPathEvalError: Undefined namespace prefix
This is all very new to me: Python, XML, and LXML. All I want is the observed time and the temperature.
Do my problems have anything to do with that aws: prefix in front of everything? What does it even mean?
Any help you can offer is greatly appreciated!
The problem has everything "to do with that aws: prefix in front of everything": it is a namespace prefix, and you have to define it for XPath. This is easily done, for example:
print(doc.xpath('//aws:weather/aws:ob/aws:temp',
                namespaces={'aws': 'http://www.aws.com/aws'})[0].text)
The need to map each namespace prefix to its namespace URI is documented at http://lxml.de/xpathxslt.html.
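Since the question only needs the observed time and the temperature, here is a minimal sketch built on the same namespaces mapping. It assumes the structure of weather.xml shown in the question (each date part stored in the number or hour-24 attribute of an aws:ob-date child), embeds a trimmed copy of that file so it runs on its own, and ignores the aws:time-zone element. The helper name observation() is hypothetical.

```python
from datetime import datetime

from lxml import etree

# Trimmed copy of weather.xml from the question (only the elements used here).
WEATHER_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<aws:weather xmlns:aws="http://www.aws.com/aws">
  <aws:ob>
    <aws:ob-date>
      <aws:year number="2013"/>
      <aws:month number="1" text="January" abbrv="Jan"/>
      <aws:day number="11" text="Friday" abbrv="Fri"/>
      <aws:hour number="10" hour-24="22"/>
      <aws:minute number="26"/>
      <aws:second number="00"/>
    </aws:ob-date>
    <aws:temp units="&amp;deg;F">41</aws:temp>
  </aws:ob>
</aws:weather>"""

# Map the prefix used in the XPath expressions to the document's namespace URI.
NS = {"aws": "http://www.aws.com/aws"}


def observation(root):
    """Return (observed time, temperature) from an <aws:weather> root element.

    Hypothetical helper: the time-zone element is ignored, so the datetime is naive.
    """
    ob_date = root.xpath("/aws:weather/aws:ob/aws:ob-date", namespaces=NS)[0]

    def part(tag, attr="number"):
        # The date parts keep their values in attributes, not in element text.
        return int(ob_date.xpath("aws:%s/@%s" % (tag, attr), namespaces=NS)[0])

    observed = datetime(part("year"), part("month"), part("day"),
                        part("hour", "hour-24"), part("minute"), part("second"))
    temp = float(root.xpath("/aws:weather/aws:ob/aws:temp", namespaces=NS)[0].text)
    return observed, temp


root = etree.fromstring(WEATHER_XML)
observed, temp = observation(root)
print(observed, temp)  # 2013-01-11 22:26:00 41.0
```

To run it against the real file, pass etree.parse('weather.xml').getroot() instead of the embedded sample. The 24-hour value comes from the hour-24 attribute, which avoids having to combine hour with aws:am-pm.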
Try something like this:
from lxml import etree

# Map the "aws" prefix to the namespace URI for XPath expressions
ns = etree.FunctionNamespace("http://www.aws.com/aws")
ns.prefix = "aws"

doc = etree.parse('weather.xml')
print(doc.xpath("//aws:weather/aws:ob/aws:temp")[0].text)
See this link: http://lxml.de/extensions.html
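On the "what does aws: even mean?" part of the question: the prefix is only a local alias declared by xmlns:aws="http://www.aws.com/aws", and what actually identifies the elements is the namespace URI. The following minimal sketch, using a trimmed inline copy of the file, shows three equivalent ways to reach aws:temp with lxml: a prefix of your own choosing, the root element's nsmap, and the {URI}localname (Clark) notation accepted by findtext(). The prefix w is an arbitrary illustration.

```python
from lxml import etree

AWS = "http://www.aws.com/aws"  # the namespace URI declared in weather.xml

# Trimmed inline copy of weather.xml from the question.
WEATHER_XML = b"""<aws:weather xmlns:aws="http://www.aws.com/aws">
  <aws:ob><aws:temp units="&amp;deg;F">41</aws:temp></aws:ob>
</aws:weather>"""

root = etree.fromstring(WEATHER_XML)

# 1. Any prefix works in the XPath, as long as it maps to the same URI.
by_alias = root.xpath("//w:temp/text()", namespaces={"w": AWS})

# 2. Reuse the document's own prefix -> URI mapping (works here because the
#    document declares no default namespace; a None key would be rejected).
by_nsmap = root.xpath("//aws:temp/text()", namespaces=root.nsmap)

# 3. ElementPath methods accept "{URI}localname" and need no prefix at all.
by_clark = root.findtext("{%s}ob/{%s}temp" % (AWS, AWS))

print(by_alias[0], by_nsmap[0], by_clark)  # 41 41 41
```

This is also why an online tester can appear to "just work": the XPath needs some prefix-to-URI mapping, and lxml only uses the one you pass in (or register), not the prefixes written in the file.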