Parsing XML with LXML and Python

I have the following XML:

<nfl>
  <season season="2012"/>
  <conference label="AFC">
    <division label="Eastern Division">
      <team city="Buffalo" name="Bills"  alias="Buf" />
      <team city="Miami" name="Dolphins"  alias="Mia" />
      <team city="New England" name="Patriots"  alias="NE" />
      <team city="New York" name="Jets"  alias="NYJ" />
    </division>
    <division label="Western Division">
      <team city="Denver" name="Broncos"  alias="Den" />
      <team city="Kansas City" name="Chiefs"  alias="KC" />
      <team city="Oakland" name="Raiders"  alias="Oak" />
      <team city="San Diego" name="Chargers"  alias="SD" />
    </division>
    <division label="Northern Division">
      <team city="Cincinnati" name="Bengals"  alias="Cin" />
      <team city="Cleveland" name="Browns"  alias="Cle" />
      <team city="Pittsburgh" name="Steelers"  alias="Pit" />
      <team city="Baltimore" name="Ravens"  alias="Bal" />
    </division>
    <division label="Southern Division">
      <team city="Houston" name="Texans"  alias="Hou" />
      <team city="Tennessee" name="Titans"  alias="Ten" />
      <team city="Indianapolis" name="Colts"  alias="Ind" />
      <team city="Jacksonville" name="Jaguars"  alias="Jac" />
    </division>
  </conference>
  <conference label="NFC">
    <division label="Eastern Division">
      <team city="Dallas" name="Cowboys"  alias="Dal" />
      <team city="New York" name="Giants"  alias="NYG" />
      <team city="Philadelphia" name="Eagles"  alias="Phi" />
      <team city="Washington" name="Redskins"  alias="Was" />
    </division>
    <division label="Western Division">
      <team city="St. Louis" name="Rams"  alias="StL" />
      <team city="Arizona" name="Cardinals"  alias="Ari" />
      <team city="San Francisco" name="49ers"  alias="SF" />
      <team city="Seattle" name="Seahawks"  alias="Sea" />
    </division>
    <division label="Northern Division">
      <team city="Chicago" name="Bears"  alias="Chi" />
      <team city="Detroit" name="Lions"  alias="Det" />
      <team city="Green Bay" name="Packers"  alias="GB" />
      <team city="Minnesota" name="Vikings"  alias="Min" />
    </division>
    <division label="Southern Division">
      <team city="Atlanta" name="Falcons"  alias="Atl" />
      <team city="New Orleans" name="Saints"  alias="NO" />
      <team city="Tampa Bay" name="Buccaneers"  alias="TB" />
      <team city="Carolina" name="Panthers"  alias="Car" />
    </division>
  </conference>

</nfl>

I'd like to load each team's "city", "name", and "alias", along with the parent "division label", "conference label", and "season", into my model.

In Python, I iterate over the data as follows:

from lxml import etree
doc = etree.parse('thisxmlfile.xml')
for s in doc.xpath('//season'):
    for c in doc.xpath('//conference'):
        for t in doc.xpath('//conference/division/team'):
            print(s.get('season'), c.get('label'), t.get('city'), t.get('name'), t.get('alias'))

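But, of course, this iterates over all the "team" tags twice, once for each "conference" tag. What I want is to iterate over all the "team" tags once and get the parent "division label", the parent "conference label", and the "season".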
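The duplication comes from the path itself: an XPath expression that starts with `//` is evaluated from the document root, no matter which loop it is called in, so the inner query matches every team on each pass. Below is a minimal sketch of that behaviour. It uses a trimmed-down inline document (an assumption for illustration, two teams per conference) instead of `thisxmlfile.xml`, and the names `XML` and `counts` are hypothetical.

```python
from lxml import etree

# Trimmed-down stand-in for the real file (assumption for illustration).
XML = b"""<nfl>
  <season season="2012"/>
  <conference label="AFC">
    <division label="Eastern Division">
      <team city="Buffalo" name="Bills" alias="Buf"/>
      <team city="Miami" name="Dolphins" alias="Mia"/>
    </division>
  </conference>
  <conference label="NFC">
    <division label="Eastern Division">
      <team city="Dallas" name="Cowboys" alias="Dal"/>
      <team city="New York" name="Giants" alias="NYG"/>
    </division>
  </conference>
</nfl>"""

root = etree.fromstring(XML)

counts = []
for c in root.xpath('//conference'):
    # Absolute path: searched from the document root, so all teams match.
    absolute = root.xpath('//conference/division/team')
    # Relative path: searched from this conference element only.
    relative = c.xpath('division/team')
    counts.append((c.get('label'), len(absolute), len(relative)))

for label, n_absolute, n_relative in counts:
    print(label, n_absolute, n_relative)
```

With this sample, the absolute query finds all four teams for each conference, while the relative query finds only the two that belong to it.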

I'm fairly sure I need to use XPath axes, and I'm looking for some help.

The output I'm looking for is:

2012 AFC Buffalo Bills Buf
2012 AFC Miami Dolphins Mia
2012 AFC New England Patriots NE
.
.
.
2012 NFC New Orleans Saints NO
2012 NFC Tampa Bay Buccaneers TB
2012 NFC Carolina Panthers Car

Note: the output above doesn't include the "division label", but that should be easy once I figure out how to get the "conference label".

Thanks in advance for any help.

Here is a way to get the desired output:

from lxml import etree

doc = etree.parse('thisxmlfile.xml')

# There is only one "season" element
season = doc.find('season').get('season')

# XPath query relative to root node
for conference in doc.xpath('conference'):
    # XPath query relative to "conference" node
    for team in conference.xpath('division/team'):
        print(season, conference.get('label'),
              team.get('city'), team.get('name'), team.get('alias'))
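Since the question mentions XPath axes and the division label, here is a sketch of an alternative that visits each "team" element exactly once and looks upward with the `ancestor` axis. It is not the approach used above, only a variation. The inline sample document and the names `XML` and `rows` are assumptions for illustration.

```python
from lxml import etree

# Trimmed-down stand-in for the real file (assumption for illustration).
XML = b"""<nfl>
  <season season="2012"/>
  <conference label="AFC">
    <division label="Eastern Division">
      <team city="Buffalo" name="Bills" alias="Buf"/>
    </division>
  </conference>
  <conference label="NFC">
    <division label="Southern Division">
      <team city="Carolina" name="Panthers" alias="Car"/>
    </division>
  </conference>
</nfl>"""

root = etree.fromstring(XML)

# There is only one "season" element, so read it once up front.
season = root.find('season').get('season')

rows = []
for team in root.xpath('//team'):
    # xpath() returns a list; each team has exactly one ancestor of each
    # kind in this layout, so the first item is the one we want.
    division = team.xpath('ancestor::division/@label')[0]
    conference = team.xpath('ancestor::conference/@label')[0]
    rows.append((season, conference, division,
                 team.get('city'), team.get('name'), team.get('alias')))

for row in rows:
    print(*row)
```

Because "division" is the direct parent of "team" here, `team.getparent()` would also reach it. The `ancestor` axis keeps working if another level of nesting is added later.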
