Parse XML with prefixes on tags? (xml.etree.ElementTree)

Question

I can read tags, except when they have a prefix. I haven't had any luck searching SO for a previous question.

I need to read media:content. I tried image = node.find("media:content"). RSS input:

<channel>
  <title>Popular  Photography in the last 1 week</title>
  <item>
    <title>foo</title>
    <media:category label="Miscellaneous">photography/misc</media:category>
    <media:content url="http://foo.com/1.jpg" height="375" width="500" medium="image"/>
  </item>
  <item> ... </item>
</channel>

I can read the sibling tag title.

from xml.etree import ElementTree
with open('cache1.rss', 'rt') as f:
    tree = ElementTree.parse(f)

for node in tree.findall('.//channel/item'):
    title = node.find("title").text

I've been using the docs, yet I'm stuck on the 'prefix' part.

Answer 1

Here's an example of using XML namespaces with ElementTree:

>>> x = '''\
<channel xmlns:media="http://www.w3.org/TR/html4/">
  <title>Popular  Photography in the last 1 week</title>
  <item>
    <title>foo</title>
    <media:category label="Miscellaneous">photography/misc</media:category>
    <media:content url="http://foo.com/1.jpg" height="375" width="500" medium="image"/>
  </item>
  <item> ... </item>
</channel>
'''
>>> from xml.etree import ElementTree
>>> node = ElementTree.fromstring(x)
>>> for elem in node.findall('item/{http://www.w3.org/TR/html4/}category'):
        print(elem.text)


photography/misc
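The same idea applies to the media:content tag from the question. As a minimal sketch, find() and findall() also accept a prefix-to-URI mapping, so the prefix can be used in the path instead of the {uri} form. The NS dict and the image_urls helper are names made up for this illustration, and the URI is just the placeholder from the example above; it has to match whatever your feed's xmlns:media attribute actually declares.

```python
from xml.etree import ElementTree

# Assumption: same placeholder URI as in the example above. Replace it with
# the URI that your feed declares in its xmlns:media attribute.
NS = {"media": "http://www.w3.org/TR/html4/"}

RSS = """\
<channel xmlns:media="http://www.w3.org/TR/html4/">
  <title>Popular  Photography in the last 1 week</title>
  <item>
    <title>foo</title>
    <media:category label="Miscellaneous">photography/misc</media:category>
    <media:content url="http://foo.com/1.jpg" height="375" width="500" medium="image"/>
  </item>
</channel>
"""


def image_urls(xml_text):
    """Return (title, url) pairs for every item with a media:content child."""
    root = ElementTree.fromstring(xml_text)
    pairs = []
    for item in root.findall("item"):
        # The mapping lets the path use the prefix instead of '{uri}content'.
        content = item.find("media:content", NS)
        if content is not None:
            pairs.append((item.find("title").text, content.get("url")))
    return pairs


print(image_urls(RSS))  # [('foo', 'http://foo.com/1.jpg')]
```

The key in the mapping does not have to be the prefix used in the file; only the URI has to match.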

Answer 2

media is an XML namespace prefix; it has to be declared somewhere earlier in the document with xmlns:media="...". See http://lxml.de/xpathxslt.html#namespaces-and-prefixes for how to define XML namespaces for use in XPath expressions in lxml.
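If you would rather stay with the standard library than switch to lxml, one way to find out which URI a prefix is bound to is to collect the "start-ns" events from iterparse. This is only a sketch: declared_namespaces and SAMPLE are hypothetical names, and the sample document reuses a placeholder URI rather than the one your feed declares.

```python
import io
from xml.etree import ElementTree

# Assumption: a cut-down sample with a placeholder namespace URI.
SAMPLE = (
    b'<channel xmlns:media="http://www.w3.org/TR/html4/">'
    b'<item><media:content url="http://foo.com/1.jpg"/></item>'
    b'</channel>'
)


def declared_namespaces(xml_bytes):
    """Map each declared prefix to its URI (later declarations win)."""
    namespaces = {}
    events = ElementTree.iterparse(io.BytesIO(xml_bytes), events=("start-ns",))
    for _event, (prefix, uri) in events:
        namespaces[prefix] = uri
    return namespaces


ns = declared_namespaces(SAMPLE)
root = ElementTree.fromstring(SAMPLE)

# Reuse the discovered mapping so the prefix works in the search path.
content = root.find("item/media:content", ns)
print(ns)
print(content.get("url"))
```

This parses the document twice, which is fine for small feeds; for large files you may prefer to hard-code the namespace URI once you know it.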

solution1
5 ACCEPTED 2011-10-31 01:24:15

solution2
0 2011-10-31 01:05:22
