
Reading element values from XML using Python LXML

<markets xmlns="http://www.eoddsmaker.net/schemas/markets/1.0" D="2015-03-23T23:12:34" CNT="1521">
 <S I="50" N="Football">
  <C I="65" N="Russia">
    <L I="167" N="Premier League">
      <E I="1049367" DT="2015-04-05T15:00:00" ISH="0" BKS="20" T1="Ufa" T2="Terek Groznyi" T1I="79698" T2I="44081">
        <M K="1x2">
          <B I="81" BTDT="2015-03-23T23:04:00,825">
            <O N="1" V="3"/>
            <O N="X" V="3.1"/>
            <O N="2" V="2.25"/>
        </B>
      </M>
     </E>
    </L>
   </C>
 </S>
</markets>

I am trying to parse this XML using etree in Python. I have done XML parsing before, but the documents have always been in this format:

  <tag> value </tag>

I am unsure how to extract the "D" attribute from <markets>, as well as all the other attribute values.

This is how I open and parse the XML Doc:

z = gzip.open("code2.zip", "r")

tree = etree.parse(z)
print(etree.tostring(tree, pretty_print=True))

I tried:

for markets in tree.findall('markets'):
    print "found"

However this doesn't work. I would appreciate some tips/advice. Hopefully once I get the first "D" extracted I'll be able to get the rest.

This is a common error when dealing with XML that has a default namespace. Your XML has a default namespace, that is, a namespace declared without a prefix, here:

xmlns="http://www.eoddsmaker.net/schemas/markets/1.0"

Therefore, in your case, all elements are implicitly considered to be in that namespace. One possible way to query elements in a namespace is with xpath():

.......
#creating prefix-to-namespace_uri mapping
ns = {'d' : 'http://www.eoddsmaker.net/schemas/markets/1.0'}

#use registered prefix along with the element name to query, and pass the mapping as 2nd argument
markets = tree.xpath('//d:markets', namespaces=ns)[0]

#get and print value of D attribute from <markets> :
print markets.get('D')
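
For comparison, the same namespace idea can be sketched with the standard library's xml.etree.ElementTree instead of lxml (an assumption made here so the snippet runs without lxml installed; it is written for Python 3 and uses a shortened copy of the document). It shows that an unqualified name matches nothing, while a prefixed name with a mapping does. The prefix "d" is arbitrary. Note also that findall() only searches below the root, so the root's own attributes are read from the root element directly.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the document from the question
XML = """<markets xmlns="http://www.eoddsmaker.net/schemas/markets/1.0" D="2015-03-23T23:12:34" CNT="1521">
  <S I="50" N="Football"/>
</markets>"""

root = ET.fromstring(XML)

# The parsed tag carries the namespace URI in the form {uri}localname
print(root.tag)

# An unqualified name matches nothing because every element is in the default namespace
unqualified = root.findall("S")
print(len(unqualified))

# Map an arbitrary prefix ("d" is our own choice) to the namespace URI
ns = {"d": "http://www.eoddsmaker.net/schemas/markets/1.0"}
qualified = root.findall("d:S", ns)
print(len(qualified))

# Attributes without a prefix are not in any namespace, so they are read by plain name
print(root.get("D"))
```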

I am answering this question with no knowledge of etree. I simply opened the following page: https://docs.python.org/2/library/xml.etree.elementtree.html#parsing-xml

What you are looking for is attributes, and it is shown how to derive them quite clearly:

tree = etree.parse(z)
root = tree.getroot()
print root.attrib

These are all of the attributes of the <markets> element, such as D and CNT.

You should be able to figure out the rest on your own. You simply need to loop through the children of each element and grab .attrib from each.

Considering I found this answer so easily, please do a bit more research before posting a question :)

PS: this answer was written for Python 2.7. For Python 3, it would be print(root.attrib)
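
The "loop through the children and grab .attrib" advice could look roughly like the sketch below. It is written for Python 3 with the standard library's xml.etree.ElementTree and a shortened copy of the document. The helper local_name() is a hypothetical name introduced only for this illustration; it strips the namespace part that the parser puts in front of each tag.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the document from the question
XML = """<markets xmlns="http://www.eoddsmaker.net/schemas/markets/1.0" D="2015-03-23T23:12:34" CNT="1521">
 <S I="50" N="Football">
  <C I="65" N="Russia">
   <L I="167" N="Premier League"/>
  </C>
 </S>
</markets>"""


def local_name(tag):
    """Hypothetical helper: drop the '{namespace}' part of a parsed tag."""
    return tag.rsplit("}", 1)[-1]


root = ET.fromstring(XML)

# iter() yields the root first, then every descendant in document order
rows = [(local_name(el.tag), dict(el.attrib)) for el in root.iter()]

for name, attrs in rows:
    print(name, attrs)
```

Using iter() avoids writing one nested loop per level (S, C, L, and so on), at the cost of losing the explicit parent/child structure in the output.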

Try this with xml.etree:

import xml.etree.ElementTree as ET
root = ET.fromstring("""<markets xmlns="http://www.eoddsmaker.net/schemas/markets/1.0" D="2015-03-23T23:12:34" CNT="1521">
     <S I="50" N="Football">
      <C I="65" N="Russia">
        <L I="167" N="Premier League">
          <E I="1049367" DT="2015-04-05T15:00:00" ISH="0" BKS="20" T1="Ufa" T2="Terek Groznyi" T1I="79698" T2I="44081">
            <M K="1x2">
              <B I="81" BTDT="2015-03-23T23:04:00,825">
                <O N="1" V="3"/>
                <O N="X" V="3.1"/>
                <O N="2" V="2.25"/>
            </B>
          </M>
         </E>
        </L>
       </C>
     </S>
    </markets>""")

>>> print root.attrib
{'CNT': '1521', 'D': '2015-03-23T23:12:34'}
>>> print root[0].attrib
{'I': '50', 'N': 'Football'}
# and so on for each nested element

If you need to parse from an XML file:

import xml.etree.ElementTree as ET
tree = ET.parse('file.xml')
root = tree.getroot()

For more, refer to https://docs.python.org/2/library/xml.etree.elementtree.html

print markets.get('D')

This prints the 'D' attribute of markets (the root).

for element in tree.iterfind(".//{*}<Tag to search for>"):
   print element.get("<Attribute to look for>")

This iterates through the elements below the current node and prints the specified attribute of each element matched by iterfind().

For example:

for element in tree.iterfind(".//{*}O"):
   print element.get("N")

This will print:

1
X
2
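
As a small extension of the example above, the N and V attributes of each <O> element can be collected into a dictionary. This is only a sketch: it is written for Python 3 with the standard library's xml.etree.ElementTree, where the "{*}" wildcard needs Python 3.8 or later, and it uses a shortened copy of the document. Treating V as a float is an assumption about how the odds will be used.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the document from the question
XML = """<markets xmlns="http://www.eoddsmaker.net/schemas/markets/1.0">
  <M K="1x2">
    <B I="81" BTDT="2015-03-23T23:04:00,825">
      <O N="1" V="3"/>
      <O N="X" V="3.1"/>
      <O N="2" V="2.25"/>
    </B>
  </M>
</markets>"""

root = ET.fromstring(XML)

# "{*}" matches any namespace (standard-library ElementTree: Python 3.8+)
# Attribute values are always strings, so V is converted explicitly
odds = {o.get("N"): float(o.get("V")) for o in root.iterfind(".//{*}O")}
print(odds)
```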

Also note that if the XML document contains multiple namespaces, you will have to put the namespace you want to search under inside the curly braces in the string passed to iterfind():

for element in tree.iterfind(".//{http://www.eoddsmaker.net/schemas/markets/1.0}<Tag to search for>"):
   print element.get("<Attribute to look for>")
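
To illustrate the difference, the sketch below uses a made-up document with two namespaces. The second namespace URI (http://example.com/other) and the <x:O> element are hypothetical and not part of the original feed. It is written for Python 3.8+ with the standard library's xml.etree.ElementTree. The wildcard form matches <O> elements from both namespaces, while the explicit form matches only those in the markets namespace.

```python
import xml.etree.ElementTree as ET

MARKETS_NS = "http://www.eoddsmaker.net/schemas/markets/1.0"
OTHER_NS = "http://example.com/other"  # hypothetical second namespace, for illustration only

# Made-up document: two <O> elements in the default namespace, one in the other namespace
XML = """<markets xmlns="%s" xmlns:x="%s">
  <O N="1" V="3"/>
  <x:O N="ignored" V="0"/>
  <O N="2" V="2.25"/>
</markets>""" % (MARKETS_NS, OTHER_NS)

root = ET.fromstring(XML)

# Wildcard namespace: matches O elements regardless of namespace
any_ns = [o.get("N") for o in root.iterfind(".//{*}O")]

# Explicit namespace: matches only O elements in the markets namespace
markets_only = [o.get("N") for o in root.iterfind(".//{%s}O" % MARKETS_NS)]

print(any_ns)
print(markets_only)
```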
