Parse XML in Python

I have a database in XML like the following one, and I'm trying to parse it with Python 2.7:

<team>
    <generator>
        <team_name>TeamMaster</team_name>
        <team_year>2000</team_year>
        <team_city>NewYork</team_city>
    </generator>
    <players>
        <definition name="John V." number="4" age="25">
          <criteria position="fow" side="right">
            <criterion website="www.johnV.com" version="1" result="true"/>
          </criteria>
          <object debut="2003" version="3" flag="complete">
            <history item_ref="team34"/>
            <history item_ref="mainteam"/>
          </object>
        </definition>
        <definition name="Emma" number="2" age="19">
          <criteria position="mid" side="left">
            <criterion website="www.emma.net" version="7" result="true"/>
          </criteria>
          <object debut="2008" version="1" flag="complete">
            <history item_ref="newteam"/>
            <history item_ref="youngteam"/>
            <history item_ref="oldteam"/>
          </object>
        </definition>

    </players>
</team>

With this small script I can easily parse the first part, "generator", of my XML, where I know all the elements it contains:

from xml.dom.minidom import parseString

# Placeholders for the values to be read from the XML
mydb = {
    "team_name": None,
    "team_year": None,
    "team_data": None,
}

# "with" closes the file automatically (and avoids shadowing the name "file")
with open('mydb.xml', 'r') as f:
    data = f.read()
dom = parseString(data)
# retrieve the first xml tag (<tag>data</tag>) that the parser finds with name tagName:
xmlTag = dom.getElementsByTagName('team_name')[0].toxml()
# strip off the tag (<tag>data</tag>  --->   data):
xmlData = xmlTag.replace('<team_name>', '').replace('</team_name>', '')

mydb["team_name"] = xmlData  # TeamMaster

But my real problem came when I tried to parse the "players" elements, where the data is stored as attributes of "definition" and each player has an unknown number of "history" elements. Is there another module that would handle this better than minidom?
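For comparison, minidom itself can read attributes and a variable number of child elements. A minimal sketch, assuming a trimmed copy of the sample data held in a string (with every `<object>` closed so it is well-formed) instead of being read from mydb.xml; the dictionary keys are illustrative choices:

```python
from __future__ import print_function

from xml.dom.minidom import parseString

# Trimmed copy of the question's data, held in memory for the example.
DATA = """<team>
    <generator><team_name>TeamMaster</team_name></generator>
    <players>
        <definition name="John V." number="4" age="25">
          <object debut="2003" version="3" flag="complete">
            <history item_ref="team34"/>
            <history item_ref="mainteam"/>
          </object>
        </definition>
        <definition name="Emma" number="2" age="19">
          <object debut="2008" version="1" flag="complete">
            <history item_ref="newteam"/>
            <history item_ref="youngteam"/>
            <history item_ref="oldteam"/>
          </object>
        </definition>
    </players>
</team>"""

dom = parseString(DATA)

# Text content: read the text node directly instead of stripping tags.
team_name = dom.getElementsByTagName('team_name')[0].firstChild.data

players = []
for definition in dom.getElementsByTagName('definition'):
    players.append({
        # getAttribute returns '' if the attribute is missing
        'name': definition.getAttribute('name'),
        'number': definition.getAttribute('number'),
        'age': definition.getAttribute('age'),
        # getElementsByTagName searches all descendants, so it collects
        # however many <history> elements this player has.
        'history': [h.getAttribute('item_ref')
                    for h in definition.getElementsByTagName('history')],
    })

print(team_name)
for p in players:
    print(p['name'], p['history'])
```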

It is better to use xml.etree.ElementTree, which has a more Pythonic syntax. After root = ET.parse('mydb.xml').getroot(), get the text of team_name with root.findtext('generator/team_name'), or iterate over all definitions with root.iterfind('players/definition').
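A fuller sketch of that approach applied to the question's data. It embeds the sample as a string, with every `<object>` element closed so that it is well-formed XML, instead of reading mydb.xml; the function name parse_team and the dictionary keys are hypothetical choices for illustration:

```python
from __future__ import print_function

import xml.etree.ElementTree as ET

# The question's sample data, with <object> elements closed.
XML = """<team>
    <generator>
        <team_name>TeamMaster</team_name>
        <team_year>2000</team_year>
        <team_city>NewYork</team_city>
    </generator>
    <players>
        <definition name="John V." number="4" age="25">
          <criteria position="fow" side="right">
            <criterion website="www.johnV.com" version="1" result="true"/>
          </criteria>
          <object debut="2003" version="3" flag="complete">
            <history item_ref="team34"/>
            <history item_ref="mainteam"/>
          </object>
        </definition>
        <definition name="Emma" number="2" age="19">
          <criteria position="mid" side="left">
            <criterion website="www.emma.net" version="7" result="true"/>
          </criteria>
          <object debut="2008" version="1" flag="complete">
            <history item_ref="newteam"/>
            <history item_ref="youngteam"/>
            <history item_ref="oldteam"/>
          </object>
        </definition>
    </players>
</team>"""


def parse_team(xml_text):
    """Return (generator dict, list of player dicts) from the team XML."""
    root = ET.fromstring(xml_text)

    # Simple text children: findtext takes a path relative to root.
    generator = {
        "team_name": root.findtext("generator/team_name"),
        "team_year": root.findtext("generator/team_year"),
        "team_city": root.findtext("generator/team_city"),
    }

    players = []
    for definition in root.iterfind("players/definition"):
        # Copy all attributes of <definition> (name, number, age).
        player = dict(definition.attrib)
        criteria = definition.find("criteria")
        player["position"] = criteria.get("position")
        player["side"] = criteria.get("side")
        # findall returns every matching child, however many there are.
        player["history"] = [h.get("item_ref")
                             for h in definition.findall("object/history")]
        players.append(player)
    return generator, players


if __name__ == "__main__":
    generator, players = parse_team(XML)
    print(generator["team_name"])
    for p in players:
        print(p["name"], p["position"], p["history"])
```

Because findall and iterfind return every match, the number of history elements per player does not need to be known in advance.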

You can use either ElementTree (XML parser) or BeautifulSoup's XML parser. I have created a repo showing how to use XML parsers here: XML Parsers Collection

Code snippet:

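    # Get the "user" elements from the XML file.
    # (xml_parser and users_file are not shown here; presumably they come from the linked repo.)
    users = xml_parser(users_file, 'user')

    # Iterate through the returned elements.
    for user in users:
        print(user.find('country').text)
        print(user.find('city').text)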
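The xml_parser helper is not shown in the snippet. A minimal, hypothetical stand-in built on ElementTree could look like this; the function body and the trimmed, in-memory copy of the question's players data are assumptions for illustration, not the code from the linked repo:

```python
from __future__ import print_function

import io
import xml.etree.ElementTree as ET


def xml_parser(source, tag):
    """Hypothetical stand-in for the answer's xml_parser helper.

    Parses `source` (a file path or a binary file-like object) and
    returns every element named `tag`, at any depth, as a list.
    """
    root = ET.parse(source).getroot()
    return list(root.iter(tag))


# Trimmed copy of the question's <players> data, held in memory.
players_file = io.BytesIO(b"""<team>
    <players>
        <definition name="John V." number="4" age="25">
          <object debut="2003" version="3" flag="complete">
            <history item_ref="team34"/>
            <history item_ref="mainteam"/>
          </object>
        </definition>
        <definition name="Emma" number="2" age="19">
          <object debut="2008" version="1" flag="complete">
            <history item_ref="newteam"/>
            <history item_ref="youngteam"/>
            <history item_ref="oldteam"/>
          </object>
        </definition>
    </players>
</team>""")

definitions = xml_parser(players_file, 'definition')
for definition in definitions:
    # Attributes are read with .get(); find/iter walk child elements.
    print(definition.get('name'), definition.find('object').get('debut'))
    print([h.get('item_ref') for h in definition.iter('history')])
```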
