How can I traverse an XML tree in Python without worrying about namespace prefixes?

Question

For example, this does not work for reading an RSS feed, because a stupid {http://purl.org ...} namespace gets prepended to 'item':

#!/usr/bin/env python3
import xml.etree.ElementTree as ET
import urllib.request

url = "http://some/rss/feed"
response = urllib.request.urlopen(url)
xml_text = response.read().decode('utf-8')
xml_root = ET.fromstring(xml_text)
# Finds nothing: the parsed tag is '{namespace}item', not plain 'item'.
for e in xml_root.findall('item'):
    print("I found an item!")

Since the {} prefix makes findall() useless here, this is an alternative that works, but it is ugly:

#!/usr/bin/env python3
import xml.etree.ElementTree as ET
import urllib.request

url = "http://some/rss/feed"
response = urllib.request.urlopen(url)
xml_text = response.read().decode('utf-8')
xml_root = ET.fromstring(xml_text)
# Walk the direct children and match on the local part of the tag.
for e in xml_root:
    if e.tag.endswith('}item'):
        print("I found an item!")

Can I get ElementTree to just discard all the prefixes?

Answer 1

You need to handle the namespaces explicitly, as described here:

Parsing XML with namespace in Python via 'ElementTree'
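As a minimal sketch of what handling namespaces explicitly can look like: the sample document and the namespace URI `http://example.com/ns` below are placeholders, not the real feed from the question, and the prefix name `feed` is an arbitrary choice. The first lookup passes a prefix-to-URI map to findall(); the second uses the `{*}` wildcard, which ElementTree supports from Python 3.8 onward.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the namespace URI is a placeholder for the feed's real one.
SAMPLE = """<rdf xmlns="http://example.com/ns">
  <item><title>First</title></item>
  <item><title>Second</title></item>
</rdf>"""

root = ET.fromstring(SAMPLE)

# Option 1: map a prefix of your choosing to the namespace URI.
NS = {"feed": "http://example.com/ns"}
items_by_map = root.findall("feed:item", NS)

# Option 2 (Python 3.8+): match 'item' in any namespace, or in none.
items_by_wildcard = root.findall("{*}item")

titles = [item.find("feed:title", NS).text for item in items_by_map]
print(len(items_by_map), len(items_by_wildcard), titles)
```

The map keeps the lookups precise when a document mixes several namespaces, while the wildcard is the closest thing to ignoring them altogether.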

Alternatively, you could use a library designed specifically for reading RSS feeds, such as feedparser:

>>> import feedparser
>>> url = "http://some/rss/feed"
>>> feed = feedparser.parse(url)

Personally, though, I would use Scrapy's XMLFeedSpider spider. As a bonus, you get all the other features of the Scrapy web-scraping framework.
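If a dedicated library is not an option and you really do want tags without the namespace part, one common workaround is to rewrite the tags after parsing. This is only a sketch: `strip_namespaces` is a hypothetical helper, not part of ElementTree, and the sample document and its namespace URI are placeholders.

```python
import xml.etree.ElementTree as ET

def strip_namespaces(root):
    """Remove the '{uri}' part from every element tag, in place."""
    for el in root.iter():
        # Comment and processing-instruction nodes have non-string tags.
        if isinstance(el.tag, str) and el.tag.startswith("{"):
            el.tag = el.tag.split("}", 1)[1]
    return root

# Hypothetical sample; the namespace URI is a placeholder.
SAMPLE = '<rdf xmlns="http://example.com/ns"><item/><item/><other/></rdf>'

root = strip_namespaces(ET.fromstring(SAMPLE))
found = root.findall("item")  # plain names now match
print(len(found))
```

Note that this leaves namespaced attribute names untouched, and that elements with the same local name from different namespaces become indistinguishable afterwards.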

Answer 1
1 2015-03-04 23:45:47
