Parsing XML with multiple namespaces in Python using lxml
<?xml-stylesheet href="/Style Library/st/xslt/rss2.xsl" type="text/xsl" media="screen" ?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:ta="http://www.smartraveller.gov.au/schema/rss/travel_advisories/" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel>
<title>Travel Advisories</title>
<link>http://smartraveller.gov.au/countries/</link>
<description>the Australian Department of Foreign Affairs and Trade's Smartraveller advisory service</description>
<language>en</language>
<webMaster>webmaster@dfat.gov.au</webMaster>
<copyright>Copyright Commonwealth of Australia 2011</copyright>
<ttl>60</ttl>
<atom:link href="http://smartraveller.gov.au/countries/Documents/index.rss" rel="self" type="application/rss+xml" />
<generator>zcms</generator>
<image>
<title>Advice</title>
<link>http://smartraveller.gov.au/countries/</link>
<url>/Style Library/st/images/dfat_logo_small.gif</url>
</image>
<item>
<title>Czech Republic</title>
<description>This travel advice has been reviewed. The level of our advice has not changed. Exercise normal safety precautions in the Czech Republic.</description>
<link>http://smartraveller.gov.au/Countries/europe/eastern/Pages/czech_republic.aspx</link>
<pubDate>26 Oct 2018 05:25:14 GMT</pubDate>
<guid isPermaLink="false">cdbcc3d4-3a89-4768-ac1d-0221f8c99227 GMT</guid>
<ta:warnings>
<dc:coverage>Czech Republic</dc:coverage>
<ta:level>2/5</ta:level>
<dc:description>Exercise normal safety precautions</dc:description>
</ta:warnings>
</item>
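For every item, I want to extract the value of <ta:level> under <ta:warnings>. I have tried the existing solutions I found online, but none of them worked for me. Basically, my XML contains multiple namespaces.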
req = requests.request('GET', "https://smartraveller.gov.au/countries/documents/index.rss")
a = str(req.text).encode()
tree = etree.fromstring(a)
ns = {'TravelAd': 'https://smartraveller.gov.au/countries/documents/index.rss',
'ta': 'http://www.smartraveller.gov.au/schema/rss/travel_advisories/'}
e = tree.findall('{0}channel/{0}item/{0}warnings/{0}level'.format(ns))
for i in e:
    print(i.text)
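The XML has multiple namespaces, but the only one you need to worry about is http://www.smartraveller.gov.au/schema/rss/travel_advisories/. That is because the only elements on the path to your target that are in a namespace are ta:warnings and ta:level; channel and item are not in any namespace.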
Example...
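from lxml import etree
import requests
# Download the feed and parse it from bytes
req = requests.request('GET', "https://smartraveller.gov.au/countries/documents/index.rss")
a = str(req.text).encode()
tree = etree.fromstring(a)
# Map a prefix of your choice to the namespace URI declared in the document
ns = {'ta': 'http://www.smartraveller.gov.au/schema/rss/travel_advisories/'}
# channel and item have no namespace, so only the ta: steps need a prefix
e = tree.findall('channel/item/ta:warnings/ta:level', ns)
for i in e:
    print(i.text)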
Prints...
2/5
2/5
4/5
2/5
...and so on
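As a rough illustration of why the prefixed path works where the attempt in the question did not: `'{0}channel'.format(ns)` substitutes the text of the whole dictionary into the path, and `channel` and `item` are not in a namespace to begin with. The sketch below runs offline against a hypothetical, trimmed-down sample shaped like the feed (the second item, "Exampleland", is invented for illustration), and compares the prefix form with the equivalent `{uri}tag` form. The fallback import is only an assumption to let the sketch run where lxml is not installed.

```python
try:
    from lxml import etree  # the library used in this thread
except ImportError:
    # Assumption: fall back to the standard library, whose findall()
    # accepts the same path and namespace map for this simple case.
    import xml.etree.ElementTree as etree

# Hypothetical sample shaped like the feed; "Exampleland" is made up.
SAMPLE = b"""<rss version="2.0"
     xmlns:ta="http://www.smartraveller.gov.au/schema/rss/travel_advisories/"
     xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <item>
      <title>Czech Republic</title>
      <ta:warnings>
        <dc:coverage>Czech Republic</dc:coverage>
        <ta:level>2/5</ta:level>
      </ta:warnings>
    </item>
    <item>
      <title>Exampleland</title>
      <ta:warnings>
        <dc:coverage>Exampleland</dc:coverage>
        <ta:level>4/5</ta:level>
      </ta:warnings>
    </item>
  </channel>
</rss>"""

TA = 'http://www.smartraveller.gov.au/schema/rss/travel_advisories/'
tree = etree.fromstring(SAMPLE)

# Prefix form: the 'ta' key is our own label, only the URI has to match.
by_prefix = [e.text for e in
             tree.findall('channel/item/ta:warnings/ta:level', {'ta': TA})]

# Equivalent form with the URI written inline as {uri}tag, no map needed.
clark_path = 'channel/item/{%s}warnings/{%s}level' % (TA, TA)
by_clark = [e.text for e in tree.findall(clark_path)]

print(by_prefix)
print(by_clark)
```

Both lists should come out the same, since the prefix is just shorthand for the namespace URI.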
If you need a list of the text values, consider switching from findall() to xpath()...
e = tree.xpath('channel/item/ta:warnings/ta:level/text()', namespaces=ns)
print(e)
Prints...
['2/5', '2/5', '4/5', '2/5', and so on...]
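Since the question asks for the level of each item, it may also help to keep each level next to its item's title. The following is a minimal sketch of that idea, not part of the original answer: it loops over the items and uses findtext() with the same namespace map. The sample data is hypothetical ("Exampleland" and "Nowhere" are invented), and the fallback import is only an assumption so the sketch runs without lxml.

```python
try:
    from lxml import etree  # the library used in this thread
except ImportError:
    # Assumption: the standard library offers the same findall()/findtext()
    # calls used below, so it serves as a stand-in if lxml is missing.
    import xml.etree.ElementTree as etree

# Hypothetical sample; the last item has no <ta:warnings> on purpose.
SAMPLE = b"""<rss version="2.0"
     xmlns:ta="http://www.smartraveller.gov.au/schema/rss/travel_advisories/">
  <channel>
    <item>
      <title>Czech Republic</title>
      <ta:warnings><ta:level>2/5</ta:level></ta:warnings>
    </item>
    <item>
      <title>Exampleland</title>
      <ta:warnings><ta:level>4/5</ta:level></ta:warnings>
    </item>
    <item>
      <title>Nowhere</title>
    </item>
  </channel>
</rss>"""

ns = {'ta': 'http://www.smartraveller.gov.au/schema/rss/travel_advisories/'}
tree = etree.fromstring(SAMPLE)

levels = {}
for item in tree.findall('channel/item'):
    # title is not namespaced, so it needs no prefix
    title = item.findtext('title')
    # findtext() returns None when the path matches nothing under this item
    levels[title] = item.findtext('ta:warnings/ta:level', namespaces=ns)

for title, level in levels.items():
    print(title, level)
```

Searching relative to each item keeps a title and its level together even when some items carry no warning block, which a single flat list of levels would not show.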