
Python: How do you use lxml to parse xml tags with periods?

I am attempting to parse Jenkins job XML files using the lxml module for Python. The relevant part looks like this:

<triggers>
  <hudson.triggers.TimerTrigger>
    <spec>H H(6-21)/3 * * *</spec>
  </hudson.triggers.TimerTrigger>
</triggers>

I like using lxml's handy objectify module, but it gets confused when I try to do this:

root.triggers.hudson.triggers.TimerTrigger.spec = 'something'

I get AttributeError: no such child: hudson. Of course there's no attribute named hudson! How does one work with a goofy piece of XML like this?

For additional context, here is my code:

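from lxml import objectify
import jenkins  # the python-jenkins package

j = jenkins.Jenkins('http://local.jenkins.instance')
xml = j.get_job_config('job_name')  # the job's config.xml
root = objectify.fromstring(xml)
# The next line raises AttributeError: no such child: hudson
root.triggers.hudson.triggers.TimerTrigger.spec = 'something'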

The following code, using lxml's etree module, worked for me to get the text from <spec>:

from lxml import etree

root = etree.parse("37757193.xml").getroot()
spec = root.xpath("//triggers/hudson.triggers.TimerTrigger/spec")[0]
print(spec.text)

This prints: H H(6-21)/3 * * *
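The question's real goal is to change the schedule, not just read it. A minimal sketch that extends the etree approach to both read and write <spec> could look like the block below. It makes three assumptions:

* The sample input is a trimmed-down config.xml whose root element is <project>. The question does not show the full file.
* The XML is passed as bytes, because lxml rejects str input that carries an encoding declaration.
* read_spec and write_spec are hypothetical helper names.

```python
from lxml import etree

# Assumed, trimmed-down Jenkins job config; a real config.xml has many more elements.
CONFIG_XML = b"""<?xml version='1.0' encoding='UTF-8'?>
<project>
  <triggers>
    <hudson.triggers.TimerTrigger>
      <spec>H H(6-21)/3 * * *</spec>
    </hudson.triggers.TimerTrigger>
  </triggers>
</project>"""

# XPath element names may contain dots, so the whole tag is matched literally.
SPEC_PATH = "//triggers/hudson.triggers.TimerTrigger/spec"


def read_spec(config: bytes) -> str:
    """Return the schedule text of the first TimerTrigger (hypothetical helper)."""
    root = etree.fromstring(config)  # bytes input, so the XML declaration is accepted
    return root.xpath(SPEC_PATH)[0].text


def write_spec(config: bytes, new_spec: str) -> bytes:
    """Return the config serialized with the TimerTrigger schedule replaced."""
    root = etree.fromstring(config)
    root.xpath(SPEC_PATH)[0].text = new_spec
    return etree.tostring(root, xml_declaration=True, encoding="UTF-8")
```

If the XML comes from python-jenkins as a str, as in the question, encode it first, e.g. xml.encode('utf-8').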

It makes sense that objectify interprets triggers.hudson.triggers.TimerTrigger as a path to a <TimerTrigger> element in the following nested structure. OP's actual XML has no <hudson> child, hence the complaint that the hudson child element was not found:

<triggers> 
  <hudson> 
    <triggers> 
      <TimerTrigger> 
        <spec>H H(6-21)/3 * * *</spec> 
      </TimerTrigger> 
    </triggers> 
  </hudson> 
</triggers>
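To see the difference concretely, the small sketch below runs the same chained lookup against both shapes. The lookup succeeds on the nested structure above and fails on the flat, dotted tag that Jenkins actually writes. NESTED, FLAT and dotted_lookup are hypothetical names used only for this illustration.

```python
from lxml import objectify

# The shape objectify assumes for root.hudson.triggers.TimerTrigger
NESTED = (b"<triggers><hudson><triggers><TimerTrigger>"
          b"<spec>H H(6-21)/3 * * *</spec>"
          b"</TimerTrigger></triggers></hudson></triggers>")

# The shape Jenkins uses: a single element whose tag contains dots
FLAT = (b"<triggers><hudson.triggers.TimerTrigger>"
        b"<spec>H H(6-21)/3 * * *</spec>"
        b"</hudson.triggers.TimerTrigger></triggers>")


def dotted_lookup(xml: bytes) -> str:
    """Try the chained attribute lookup; report the error instead of raising."""
    root = objectify.fromstring(xml)
    try:
        # Each dot is a separate child lookup: hudson -> triggers -> TimerTrigger
        return str(root.hudson.triggers.TimerTrigger.spec)
    except AttributeError as exc:
        return f"AttributeError: {exc}"
```

dotted_lookup(NESTED) returns the schedule text, while dotted_lookup(FLAT) reports the missing hudson child, matching the error in the question.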

One way to access a child element whose name contains dots, without switching to etree, is to call the __getattr__() method directly:

>>> root.triggers.__getattr__('hudson.triggers.TimerTrigger').spec
'H H(6-21)/3 * * *'
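The builtin getattr() performs the same lookup as calling __getattr__() directly. Since the question's goal is to assign a new schedule, a minimal sketch of the full objectify round trip is shown below. It assumes the same trimmed-down <project> config as sample input, and set_spec_with_objectify is a hypothetical helper.

One side effect matters here. When you assign a plain value, objectify annotates the new element with a py:pytype attribute. The sketch therefore strips that annotation with objectify.deannotate() before serializing, which keeps the output a plain Jenkins config.

```python
from lxml import etree, objectify

# Assumed, trimmed-down Jenkins job config used as sample input
CONFIG_XML = b"""<project>
  <triggers>
    <hudson.triggers.TimerTrigger>
      <spec>H H(6-21)/3 * * *</spec>
    </hudson.triggers.TimerTrigger>
  </triggers>
</project>"""

TIMER_TAG = "hudson.triggers.TimerTrigger"


def set_spec_with_objectify(config: bytes, new_spec: str) -> bytes:
    """Replace the TimerTrigger schedule using objectify (hypothetical helper)."""
    root = objectify.fromstring(config)
    # getattr() treats the whole dotted string as one child tag name
    timer = getattr(root.triggers, TIMER_TAG)
    # Assignment replaces <spec> and adds a py:pytype="str" annotation
    timer.spec = new_spec
    # Remove objectify's type annotations and the now-unused py namespace
    objectify.deannotate(root, cleanup_namespaces=True)
    return etree.tostring(root)
```

Note that objectify's default parser drops whitespace-only text, so the serialized output is not pretty-printed.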
