
Reading element values from XML using Python LXML

<markets xmlns="http://www.eoddsmaker.net/schemas/markets/1.0" D="2015-03-23T23:12:34" CNT="1521">
 <S I="50" N="Football">
  <C I="65" N="Russia">
    <L I="167" N="Premier League">
      <E I="1049367" DT="2015-04-05T15:00:00" ISH="0" BKS="20" T1="Ufa" T2="Terek Groznyi" T1I="79698" T2I="44081">
        <M K="1x2">
          <B I="81" BTDT="2015-03-23T23:04:00,825">
            <O N="1" V="3"/>
            <O N="X" V="3.1"/>
            <O N="2" V="2.25"/>
        </B>
      </M>
     </E>
    </L>
   </C>
 </S>
</markets>

I am trying to parse this XML using etree in Python. I have done XML parsing before, but the documents have always been in this format:

  <tag> value </tag>

I am unsure how to extract the "D" attribute from "markets", as well as all the other attribute values.

This is how I open and parse the XML document:

import gzip
from lxml import etree

z = gzip.open("code2.zip", "r")

tree = etree.parse(z)
print(etree.tostring(tree, pretty_print=True))

I tried:

for markets in tree.findall('markets'):
    print "found"

However, this doesn't work. I would appreciate some tips or advice. Hopefully, once I get the first "D" extracted, I'll be able to get the rest.

This is a common error when dealing with XML that has a default namespace. Your XML has a default namespace (a namespace declared without a prefix), here:

xmlns="http://www.eoddsmaker.net/schemas/markets/1.0"

Therefore, in your case, all elements are implicitly in that namespace. One possible way to query namespaced elements is with xpath():

.......
# create a prefix-to-namespace-URI mapping
ns = {'d': 'http://www.eoddsmaker.net/schemas/markets/1.0'}

# use the registered prefix along with the element name in the query, and pass the mapping via the namespaces argument
markets = tree.xpath('//d:markets', namespaces=ns)[0]

# get and print the value of the D attribute of <markets>
print(markets.get('D'))
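
To see why the original findall('markets') attempt finds nothing, here is a minimal sketch. It uses the standard library's xml.etree.ElementTree instead of lxml, purely as an assumption to keep the example dependency-free; the same prefix mapping can be passed to find() and findall() in lxml. The trimmed XML string and the variable names are illustrative, and the code is written for Python 3.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the document from the question (illustrative).
XML = """<markets xmlns="http://www.eoddsmaker.net/schemas/markets/1.0" D="2015-03-23T23:12:34" CNT="1521">
  <S I="50" N="Football">
    <C I="65" N="Russia"/>
  </S>
</markets>"""

root = ET.fromstring(XML)
ns = {"d": "http://www.eoddsmaker.net/schemas/markets/1.0"}

# Without a namespace the search matches nothing: the parser stores the
# tag as "{namespace-uri}S", not as plain "S".
no_ns = root.findall("S")

# With the prefix mapping the same search succeeds.
with_ns = root.findall("d:S", ns)

print(root.tag)                   # the fully qualified root tag
print(len(no_ns), len(with_ns))   # matches without and with the namespace
print(root.get("D"))              # attribute of the root element
print(with_ns[0].get("N"))        # attribute of the first <S> child
```

Note also that findall() with a bare tag name looks at the children of the root, so the root <markets> element itself is never among the results; it is what getroot() (or fromstring()) returns.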

I am answering this question with no knowledge of etree. I simply opened the following page: https://docs.python.org/2/library/xml.etree.elementtree.html#parsing-xml

What you are looking for are attributes, and the page shows quite clearly how to get them:

tree = etree.parse(z)
root = tree.getroot()
print root.attrib

These are all of the attributes of the <markets> element, such as D and CNT.

You should be able to figure out the rest on your own. You simply loop through the children of each element and grab .attrib from each.
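
The loop over children described above could be sketched as follows. This is only one possible approach, written for Python 3 with the standard library's xml.etree.ElementTree; the helper names local_name and walk are made up for this example, and the XML is a trimmed copy of the document from the question.

```python
import xml.etree.ElementTree as ET

XML = """<markets xmlns="http://www.eoddsmaker.net/schemas/markets/1.0" D="2015-03-23T23:12:34" CNT="1521">
  <S I="50" N="Football">
    <C I="65" N="Russia">
      <L I="167" N="Premier League"/>
    </C>
  </S>
</markets>"""


def local_name(tag):
    """Strip the '{namespace}' part that ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def walk(element, depth=0):
    """Yield (depth, tag, attributes) for an element and all of its descendants."""
    yield depth, local_name(element.tag), dict(element.attrib)
    for child in element:
        yield from walk(child, depth + 1)


root = ET.fromstring(XML)
rows = list(walk(root))
for depth, tag, attrs in rows:
    # Indent each element by its nesting depth.
    print("  " * depth + tag, attrs)
```

Because the walk never searches by tag name, the default namespace does not get in the way here; it only shows up in element.tag, which local_name() trims for display.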

Considering I found this answer so easily, please do a bit more research before posting a question :)

PS: this answer was written for Python 2.7. For Python 3, it would be print(root.attrib)

Try this with xml.etree:

import xml.etree.ElementTree as ET
root = ET.fromstring("""<markets xmlns="http://www.eoddsmaker.net/schemas/markets/1.0" D="2015-03-23T23:12:34" CNT="1521">
     <S I="50" N="Football">
      <C I="65" N="Russia">
        <L I="167" N="Premier League">
          <E I="1049367" DT="2015-04-05T15:00:00" ISH="0" BKS="20" T1="Ufa" T2="Terek Groznyi" T1I="79698" T2I="44081">
            <M K="1x2">
              <B I="81" BTDT="2015-03-23T23:04:00,825">
                <O N="1" V="3"/>
                <O N="X" V="3.1"/>
                <O N="2" V="2.25"/>
            </B>
          </M>
         </E>
        </L>
       </C>
     </S>
    </markets>""")

>>> print(root.attrib)
{'CNT': '1521', 'D': '2015-03-23T23:12:34'}
>>> print(root[0].attrib)
{'I': '50', 'N': 'Football'}
# and so on for each nested element

If you need to parse from an XML file:

import xml.etree.ElementTree as ET
tree = ET.parse('file.xml')
root = tree.getroot()

For more, refer to https://docs.python.org/2/library/xml.etree.elementtree.html
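
Since the question reads the document through gzip.open(), here is a hedged sketch of parsing from a compressed file object instead of a plain filename. It assumes Python 3 and the standard library only. The file name markets.xml.gz is made up, and the sketch writes that file itself so that it is self-contained; it is not the asker's actual file.

```python
import gzip
import os
import tempfile
import xml.etree.ElementTree as ET

XML = b'<markets xmlns="http://www.eoddsmaker.net/schemas/markets/1.0" D="2015-03-23T23:12:34" CNT="1521"/>'

# Create a small gzip file so the example can run on its own;
# "markets.xml.gz" is a hypothetical name.
path = os.path.join(tempfile.mkdtemp(), "markets.xml.gz")
with gzip.open(path, "wb") as f:
    f.write(XML)

# ET.parse() accepts an open file object as well as a filename.
with gzip.open(path, "rb") as f:
    root = ET.parse(f).getroot()

print(root.attrib)
```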

print(markets.get('D'))

This prints the 'D' attribute of markets (the root).

for element in tree.iterfind(".//{*}<Tag to search for>"):
   print(element.get("<Attribute to look for>"))

This iterates through the elements beneath the current node in the XML file and prints the specified attribute of each element matched by iterfind().

For example:

for element in tree.iterfind(".//{*}O"):
   print(element.get("N"))

This will print:

1
X
2

Also note that if the XML document contains multiple namespaces, you'll have to put the namespace you want to search under inside the curly braces in the string passed to iterfind():

for element in tree.iterfind(".//{http://www.eoddsmaker.net/schemas/markets/1.0}<Tag to search for>"):
   print(element.get("<Attribute to look for>"))
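
To illustrate the multiple-namespace case, here is a small sketch, assuming Python 3 and the standard library's xml.etree.ElementTree. The second namespace (http://example.com/other) and the extra x:O element are invented for the example and do not appear in the question's document.

```python
import xml.etree.ElementTree as ET

MARKETS_NS = "http://www.eoddsmaker.net/schemas/markets/1.0"
OTHER_NS = "http://example.com/other"  # hypothetical second namespace

XML = """<markets xmlns="{m}" xmlns:x="{o}">
  <B I="81">
    <O N="1" V="3"/>
    <O N="X" V="3.1"/>
    <O N="2" V="2.25"/>
    <x:O N="ignored" V="0"/>
  </B>
</markets>""".format(m=MARKETS_NS, o=OTHER_NS)

root = ET.fromstring(XML)

# Fully qualified tag: only <O> elements in the markets namespace match,
# so the <x:O> element from the other namespace is skipped.
odds = {
    o.get("N"): float(o.get("V"))
    for o in root.iterfind(".//{%s}O" % MARKETS_NS)
}
print(odds)
```

Spelling out the namespace URI keeps the search unambiguous, whereas the {*} wildcard would also pick up the element from the other namespace.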
