
Reading element values from XML using Python LXML

<markets xmlns="http://www.eoddsmaker.net/schemas/markets/1.0" D="2015-03-23T23:12:34" CNT="1521">
 <S I="50" N="Football">
  <C I="65" N="Russia">
    <L I="167" N="Premier League">
      <E I="1049367" DT="2015-04-05T15:00:00" ISH="0" BKS="20" T1="Ufa" T2="Terek Groznyi" T1I="79698" T2I="44081">
        <M K="1x2">
          <B I="81" BTDT="2015-03-23T23:04:00,825">
            <O N="1" V="3"/>
            <O N="X" V="3.1"/>
            <O N="2" V="2.25"/>
        </B>
      </M>
     </E>
    </L>
   </C>
 </S>
</markets>

I am trying to parse this XML using etree in Python. I have done XML parsing before, but the documents were always in this format:

  <tag> value </tag>

I am not sure how to isolate the "D" attribute on "markets", or any of the other attribute values.

This is how I open and parse the XML document:

import gzip
from lxml import etree

# open the compressed file and parse it with lxml
z = gzip.open("code2.zip", "r")
tree = etree.parse(z)
print(etree.tostring(tree, pretty_print=True))

I tried:

for markets in tree.findall('markets'):
    print "found"

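However, this does not work. I would appreciate some hints or suggestions. Hopefully, once I have extracted the first "D", I will be able to get the rest.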

This is a common mistake when dealing with XML that has a default namespace. Your XML has a default namespace, which is declared here without a prefix:

xmlns="http://www.eoddsmaker.net/schemas/markets/1.0"

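So in your case, all elements are implicitly considered to be in that namespace, and a query for the bare name 'markets' matches nothing. One possible way to query elements in a namespace, using xpath():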

.......
#creating prefix-to-namespace_uri mapping
ns = {'d' : 'http://www.eoddsmaker.net/schemas/markets/1.0'}

#use registered prefix along with the element name to query, and pass the mapping as 2nd argument
markets = tree.xpath('//d:markets', namespaces=ns)[0]

#get and print value of D attribute from <markets> :
print markets.get('D')
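
The same prefix mapping can be tried without lxml. The following is only a minimal sketch: it swaps in the standard-library xml.etree.ElementTree (so it runs without extra packages), uses a shortened copy of the question's XML, and is written with Python 3 print() syntax. The names XML and sports are illustrative. It shows that the parsed tag name carries the namespace URI, which is why a bare name never matches.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the XML from the question (assumption: trimmed for brevity).
XML = """<markets xmlns="http://www.eoddsmaker.net/schemas/markets/1.0" D="2015-03-23T23:12:34" CNT="1521">
  <S I="50" N="Football"/>
</markets>"""

root = ET.fromstring(XML)

# The default namespace becomes part of every element's tag name.
print(root.tag)

# A bare name does not match any namespaced element, so this list is empty.
print(root.findall("S"))

# Same prefix-to-URI mapping as in the answer above; the prefix 'd' is arbitrary.
ns = {"d": "http://www.eoddsmaker.net/schemas/markets/1.0"}
sports = root.findall("d:S", ns)
print(sports[0].get("N"))

# The attributes carry no prefix, so D is read by its plain name.
print(root.get("D"))
```

Note that findall() only looks below the element it is called on, so the root <markets> element itself is reached through the parsed root rather than through a search.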

I answered this question without knowing etree. I just opened this page: https://docs.python.org/2/library/xml.etree.elementtree.html#parsing-xml

What you are looking for are attributes, and the page shows very clearly how to read them:

tree = etree.parse(z)
root = tree.getroot()
print root.attrib

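That gives you all the attributes of the <markets> element, such as D and CNT.

You should be able to figure out the rest yourself. You just need to iterate over the children of each element and read .attrib from each one.

Considering how easily I found this answer, please do some research before posting a question :)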
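
As a rough illustration of that traversal, here is a minimal sketch using the standard-library xml.etree.ElementTree with Python 3 print() syntax. The helper names local_name and collect_attributes are hypothetical, and the XML is a shortened copy of the question's document.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the XML from the question (assumption: trimmed for brevity).
XML = """<markets xmlns="http://www.eoddsmaker.net/schemas/markets/1.0" D="2015-03-23T23:12:34" CNT="1521">
  <S I="50" N="Football">
    <C I="65" N="Russia"/>
  </S>
</markets>"""


def local_name(tag):
    """Strip the '{namespace}' part that ElementTree puts in front of a tag."""
    return tag.rsplit("}", 1)[-1]


def collect_attributes(root):
    """Return (tag, attributes) pairs for root and every descendant, in document order."""
    return [(local_name(el.tag), dict(el.attrib)) for el in root.iter()]


root = ET.fromstring(XML)
rows = collect_attributes(root)
for tag, attrs in rows:
    print(tag, attrs)
```

Element.iter() visits the element it is called on first and then all descendants, so no explicit recursion is needed here.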

PS: This answer was written for Python 2.7. For Python 3, the last line would be print(root.attrib)

Try xml.etree:

import xml.etree.ElementTree as ET
root = ET.fromstring("""<markets xmlns="http://www.eoddsmaker.net/schemas/markets/1.0" D="2015-03-23T23:12:34" CNT="1521">
     <S I="50" N="Football">
      <C I="65" N="Russia">
        <L I="167" N="Premier League">
          <E I="1049367" DT="2015-04-05T15:00:00" ISH="0" BKS="20" T1="Ufa" T2="Terek Groznyi" T1I="79698" T2I="44081">
            <M K="1x2">
              <B I="81" BTDT="2015-03-23T23:04:00,825">
                <O N="1" V="3"/>
                <O N="X" V="3.1"/>
                <O N="2" V="2.25"/>
            </B>
          </M>
         </E>
        </L>
       </C>
     </S>
    </markets>""")

>>> print root.attrib
{'CNT': '1521', 'D': '2015-03-23T23:12:34'}
>>> print root[0].attrib
{'I': '50', 'N': 'Football'}
# and so on for each nested element

If you need to parse from an XML file:

import xml.etree.ElementTree as ET
tree = ET.parse('file.xml')
root = tree.getroot()

For more information, see https://docs.python.org/2/library/xml.etree.elementtree.html
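
Since the question reads its data through gzip, the file-based variant can also be sketched with a compressed file. This is only an illustration under stated assumptions: the file name markets.xml.gz and the temporary directory are hypothetical stand-ins, the XML is reduced to the root element, and the code uses Python 3 syntax. Keep in mind that gzip.open reads gzip data only; a true ZIP archive would need the zipfile module instead.

```python
import gzip
import os
import tempfile
import xml.etree.ElementTree as ET

# Root element only (assumption: trimmed from the question's document).
XML = b'<markets xmlns="http://www.eoddsmaker.net/schemas/markets/1.0" D="2015-03-23T23:12:34" CNT="1521"/>'

# Write a small gzip file so the example is self-contained.
# The path is a hypothetical stand-in for the real data file.
path = os.path.join(tempfile.mkdtemp(), "markets.xml.gz")
with gzip.open(path, "wb") as f:
    f.write(XML)

# ET.parse accepts an open binary file object as well as a file name.
with gzip.open(path, "rb") as f:
    tree = ET.parse(f)

root = tree.getroot()
print(root.attrib)
```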

print markets.get('D')

will print the "D" attribute of markets (the root).

for element in tree.iterfind(".//{*}<Tag to search for>"):
   print element.get("<Attribute to look for>")

will iterate over the elements below the current node that match the tag given to iterfind(), and print the specified attribute of each one.

For example:

for element in tree.iterfind(".//{*}O"):
   print element.get("N")

will print

1
X
2

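Also note that if the XML document contains more than one namespace, the string passed to iterfind() must name, inside the curly braces, the namespace you want to search under:

for element in tree.iterfind(".//{http://www.eoddsmaker.net/schemas/markets/1.0}<Tag to search for>"):
   print element.get("<Attribute to look for>")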
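
Putting the explicit-namespace form to work, here is a minimal sketch that collects the O values per E element. It uses the standard-library xml.etree.ElementTree and Python 3 syntax. Reading E as an event, O as an outcome and V as its price is an assumption based only on the sample data, and the helper name event_odds is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace in the curly-brace form that ElementTree uses in tag names.
NS = "{http://www.eoddsmaker.net/schemas/markets/1.0}"

# Copy of the question's XML with a few E attributes omitted for brevity.
XML = """<markets xmlns="http://www.eoddsmaker.net/schemas/markets/1.0" D="2015-03-23T23:12:34" CNT="1521">
 <S I="50" N="Football">
  <C I="65" N="Russia">
   <L I="167" N="Premier League">
    <E I="1049367" DT="2015-04-05T15:00:00" T1="Ufa" T2="Terek Groznyi">
     <M K="1x2">
      <B I="81" BTDT="2015-03-23T23:04:00,825">
       <O N="1" V="3"/>
       <O N="X" V="3.1"/>
       <O N="2" V="2.25"/>
      </B>
     </M>
    </E>
   </L>
  </C>
 </S>
</markets>"""


def event_odds(root):
    """Map each E element ('T1 v T2') to a dict of its O values {N: float(V)}.

    Assumption: E is an event and O an outcome with price V. If an event held
    several M or B blocks, later O entries with the same N would overwrite
    earlier ones in this simple version.
    """
    result = {}
    for event in root.iter(NS + "E"):
        name = "{} v {}".format(event.get("T1"), event.get("T2"))
        result[name] = {o.get("N"): float(o.get("V")) for o in event.iter(NS + "O")}
    return result


odds = event_odds(ET.fromstring(XML))
print(odds)
```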
