简体   繁体   中英

how do I filter values from XML file in python

I have a basic grasp of XML and python and have been using minidom with some success. I have run into a situation where I am unable to get the values I want from an XML file. Here is the basic structure of the pre-existing file.

<localization>
    <b n="Stats">
        <l k="SomeStat1">
            <v>10</v>
        </l>
        <l k="SomeStat2">
            <v>6</v>
        </l>
    </b>
    <b n="Levels">
        <l k="Level1">
            <v>Beginner Level</v>
        </l>
        <l k="Level2">
            <v>Intermediate Level</v>
        </l>
    </b>
</localization>

There are about 15 different <b> tags with dozens of children. What I'd like to do is, if given a level number(1), is find the <v> node for the corresponding level. I just have no idea how to go about this.

You might consider using XPATH, a language for addressing parts of an xml document.

Here's the answer using lxml.etree and it's support for xpath .

>>> data = """
... <localization>
...     <b n="Stats">
...         <l k="SomeStat1">
...             <v>10</v>
...         </l>
...         <l k="SomeStat2">
...             <v>6</v>
...         </l>
...     </b>
...     <b n="Levels">
...         <l k="Level1">
...             <v>Beginner Level</v>
...         </l>
...         <l k="Level2">
...             <v>Intermediate Level</v>
...         </l>
...     </b>
... </localization>
... """
>>>
>>> from lxml import etree
>>>
>>> xmldata = etree.XML(data)
>>> xmldata.xpath('/localization/b[@n="Levels"]/l[@k=$level]/v/text()',level='Level1')
['Beginner Level']
#!/usr/bin/python

from xml.dom.minidom import parseString

xml = parseString("""<localization>
    <b n="Stats">
        <l k="SomeStat1">
            <v>10</v>
        </l>
        <l k="SomeStat2">
            <v>6</v>
        </l>
    </b>
    <b n="Levels">
        <l k="Level1">
            <v>Beginner Level</v>
        </l>
        <l k="Level2">
            <v>Intermediate Level</v>
        </l>
    </b>
</localization>""")

level = 1
blist = xml.getElementsByTagName('b')
for b in blist:
    if b.getAttribute('n') == 'Levels':
        llist = b.getElementsByTagName('l')
        l = llist.item(level)
        v = l.getElementsByTagName('v')
        print v.item(0).firstChild.nodeValue;
        #prints Intermediate Level

If you really only care about searching for an <l> tag with a specific "k" attribute and then getting its <v> tag (that's how I understood your question), you could do it with DOM:

from xml.dom.minidom import parseString

xmlDoc = parseString("""<document goes here>""")
lNodesWithLevel2 = [lNode for lNode in xmlDoc.getElementsByTagName("l")
                    if lNode.getAttribute("k") == "Level2"]

matchingVNodes = map(lambda lNode: lNode.getElementsByTagName("v"), lNodesWithLevel2)

print map(lambda vNode: vNode.firstChild.nodeValue, matchingVNodes)
# Prints [u'Intermediate Level']

How that is what you meant.

If you could use BeautifulSoup library (couldn't you?) you could end up with this dead-simple code:

from BeautifulSoup import BeautifulStoneSoup

def get_it(xml, level_n):
    soup = BeautifulStoneSoup(xml)
    l = soup.find('l', k="Level%d" % level_n)
    return l.v.string

if __name__ == '__main__':
    print get_it(1)

It prints Beginner Level for the example XML you provided.

level = "Level"+raw_input("Enter level number: ")
content= open("xmlfile").read()
data= content.split("</localization>")
for item in data:
    if "localization" in item:
        s = item.split("</b>")
        for i in s:
           if """<b n="Levels">""" in i:
                for c in i.split("</l>"):
                    if "<l" in c and level in c:
                         for v in c.split("</v>"):
                            if "<v>" in v:
                                print v[v.index("<v>")+3:]

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM