Python: specifying the namespace in an lxml.etree path

I'm trying to figure out how to access a specific element by id in an SVG file. I'm using the Python library lxml to parse the file, but the lookup always comes up empty. Here is the Python script I used to access the element:

#!/usr/bin/env python

from lxml import etree
XHTML_NAMESPACE = "http://www.w3.org/2000/svg"
XHTML = "{%s}" % XHTML_NAMESPACE
NSMAP = {None : XHTML_NAMESPACE}

root = etree.parse("temp.svg")
textid = "text1274"
path = ".//text[@id='" + textid + "']/title"
name = root.findtext(path=path, namespaces=NSMAP)
print(name)

The result is always None (printed as 'None'), with no error raised. The lookup seems to run without complaint, but what I wanted was the element text (which should have been "Wei, 771 - 661BCE."). Here is the SVG file in question:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>

<svg
   xmlns:dc="http://purl.org/dc/elements/1.1/"
   xmlns:cc="http://creativecommons.org/ns#"
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
   xmlns:svg="http://www.w3.org/2000/svg"
   xmlns="http://www.w3.org/2000/svg"
   xmlns:xlink="http://www.w3.org/1999/xlink"
   xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd"
   xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape"
   version="1.1"
   xml:space="preserve"
   viewBox="0 0 54001 32400"
   id="svg2"
   inkscape:version="0.91 r"
   sodipodi:docname="china700BC.svg"><sodipodi:namedview
   pagecolor="#ffffff"
   bordercolor="#666666"
   borderopacity="1"
   objecttolerance="10"
   gridtolerance="10"
   guidetolerance="10"
   inkscape:pageopacity="0"
   inkscape:pageshadow="2"
   inkscape:window-width="1366"
   inkscape:window-height="692"
   id="namedview2468"
   showgrid="false"
   inkscape:zoom="0.016419753"
   inkscape:cx="17689.896"
   inkscape:cy="17739.986"
   inkscape:window-x="0"
   inkscape:window-y="24"
   inkscape:window-maximized="1"
   inkscape:current-layer="svg2" />
<defs
   id="defs4">
<filter
   id="blur2">
<feGaussianBlur
   id="feGaussianBlur7"
   result="blur"
   stdDeviation="2"
   in="SourceGraphic" /> 
</filter>
<filter
   id="blur4">
<feGaussianBlur
   id="feGaussianBlur10"
   result="blur"
   stdDeviation="4"
   in="SourceGraphic" /> 
</filter>
<filter
   id="blur8">
<feGaussianBlur
   id="feGaussianBlur13"
   result="blur"
   stdDeviation="8"
   in="SourceGraphic" /> 
</filter>
<filter
   id="blur16">
<feGaussianBlur
   id="feGaussianBlur16"
   result="blur"
   stdDeviation="16"
   in="SourceGraphic" /> 
</filter>
<filter
   id="blur32">
<feGaussianBlur
   id="feGaussianBlur19"
   result="blur"
   stdDeviation="32"
   in="SourceGraphic" /> 
</filter>
<filter
   id="blur64">
<feGaussianBlur
   id="feGaussianBlur22"
   result="blur"
   stdDeviation="64"
   in="SourceGraphic" /> 
</filter>

</defs> 
&gt;

<g
   stroke-linecap="round"
   stroke-linejoin="round"
   stroke-miterlimit="7"
   stroke-width="14"
   fill="none"
   filter="url(#blur2)"
   id="fntr">
<ellipse
   id="ellipse381"
   fill="white"
   stroke="white"
   ry="1"
   rx="1"
   cy="0"
   cx="0" />
<ellipse
   id="ellipse383"
   fill="white"
   stroke="white"
   ry="1"
   rx="1"
   cy="32400"
   cx="54001" />
<ellipse
   fill="#FEBADE"
   ry="1"
   rx="1"
   cy="24759"
   cx="48948"
   id="295286-dummy" />
</g>

<g
   text-anchor="middle"
   id="regn">
</g>
<g
   text-anchor="middle"
   id="cultr">
</g>
<g
   text-anchor="middle"
   id="peopl">
</g>
<g
   font-style="italic"
   text-anchor="middle"
   id="tribe">
</g>

<text
   id="text455"
   x="30542.088"
   y="16248.173"
   font-size="20"
   style="font-weight:normal;font-size:233.01080322px;text-anchor:middle"><title
   id="title457">Chen.</title>Chen</text>

<text
   id="text1274"
   x="28689.652"
   y="12753.011"
   font-size="28"
   style="font-weight:normal;font-size:326.21511841px;text-anchor:middle"><title
   id="title1276">Wei, 771 - 661BCE.</title>Wei</text>

   <script
   id="script2466">
      function LoadHandler(event) 
      {
         new Title(event.getTarget().getOwnerDocument(), 810);
      }
   </script>
</svg>

I discovered that I can make the lookup work by deleting the eighth line of the file, beginning with "xmlns=..." (the default namespace declaration). However, because of where I obtained this file, I cannot permanently remove this line (and probably shouldn't). Is there some way (such as properly specifying the namespace) to get the expected output without editing the XML at all?

Thanks a ton

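Mapping the default namespace to the None prefix didn't work for me either. You can, however, map it to an ordinary string prefix and use that prefix in the path expression. An unprefixed name such as text in a path matches only elements that are in no namespace, and every element in your file belongs to the SVG namespace because of the xmlns="..." declaration, so each step of the path needs the prefix. The rest of your code works without any change: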

from lxml import etree
XHTML_NAMESPACE = "http://www.w3.org/2000/svg"
XHTML = "{%s}" % XHTML_NAMESPACE
NSMAP = {'d' : XHTML_NAMESPACE}  # map default namespace to prefix 'd:'

root = etree.parse("temp.svg")
textid = "text1274"
path = ".//d:text[@id='" + textid + "']/d:title" # use registered prefix in xpath
name = root.findtext(path=path, namespaces=NSMAP)
print(name)
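
For anyone who wants to try this without the original file, here is a minimal self-contained sketch of the same idea. It parses a trimmed-down stand-in for temp.svg from a string (the sample document, the SVG_NS name and the two helper functions are my own assumptions for illustration, not part of the question's code), and it falls back to the standard library's xml.etree.ElementTree if lxml is not installed, since both offer the same findtext() call. It also shows an alternative that needs no prefix map at all: writing the namespace directly into the path in {uri}tag form.

```python
# Sketch: look up a namespaced SVG element with an explicit prefix map.
# Falls back to the standard library if lxml is not installed; both expose
# the findtext() call used here.
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

SVG_NS = "http://www.w3.org/2000/svg"

# Trimmed-down stand-in for temp.svg (an assumption for this example).
SVG_SAMPLE = b"""<svg xmlns="http://www.w3.org/2000/svg" id="svg2">
  <text id="text455"><title id="title457">Chen.</title>Chen</text>
  <text id="text1274"><title id="title1276">Wei, 771 - 661BCE.</title>Wei</text>
</svg>"""

root = etree.fromstring(SVG_SAMPLE)


def title_for(root, text_id):
    """Return the <title> text of the <text> element with the given id."""
    # 'd' is an arbitrary prefix; it only has to match the key in the map.
    path = ".//d:text[@id='%s']/d:title" % text_id
    return root.findtext(path, namespaces={"d": SVG_NS})


def title_for_clark(root, text_id):
    """Same lookup, spelling the namespace out in {uri}tag form."""
    path = ".//{%s}text[@id='%s']/{%s}title" % (SVG_NS, text_id, SVG_NS)
    return root.findtext(path)


# Unprefixed names only match elements in no namespace, so this finds nothing.
unqualified = root.findtext(".//text[@id='text1274']/title")

print(unqualified)                          # None
print(title_for(root, "text1274"))          # Wei, 771 - 661BCE.
print(title_for_clark(root, "text1274"))    # Wei, 771 - 661BCE.
```

The prefix in the map does not have to match anything in the file; only the namespace URI is compared, which is why 'd' works even though the document declares the namespace as the default and as svg:.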
