
Python 3.x XML parsing similar to plistlib?

I have GPS data stored as a .tcx file. This is an XML file (beginning of the file below):

<?xml version="1.0" encoding="utf-8"?>
<TrainingCenterDatabase xmlns="http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:tp1="http://www.garmin.com/xmlschemas/TrackPointExtension/v1" xmlns:gpx="http://www.topografix.com/GPX/1/1" xsi:schemaLocation="http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2 http://www.garmin.com/xmlschemas/TrainingCenterDatabasev2.xsd">
    <Activities>
        <Activity Sport="Other">
            <Id>2012-01-17T11:44:35Z</Id>
            <Lap StartTime="2012-01-17T11:44:35Z">
                <TotalTimeSeconds>0</TotalTimeSeconds>
                <DistanceMeters>0</DistanceMeters>
                <Calories>0</Calories>
                <Intensity>Active</Intensity>
                <TriggerMethod>Manual</TriggerMethod>
                <Track>
                    <Trackpoint>
                        <Time>2012-01-17T11:44:35Z</Time>
                        <Position>
                            <LatitudeDegrees>59.720211518183351</LatitudeDegrees>

The only similar thing I have worked with is Apple .plist files, which use a similar format, although there the data is nested within a <dict> tag, I believe.

With those, the following gives me nested dictionaries:

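import plistlib

# plistlib.readPlist() was removed in Python 3.9; plistlib.load() takes a binary file object.
with open('/Users/name/Documents/file.plist', 'rb') as f:
    pl = plistlib.load(f)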

for sub_dict in pl:
    print(sub_dict['keyA'])
    print(sub_dict['keyD'])
    print(sub_dict['keyE'])
    print(sub_dict['keyG'])

I am aware of xml.dom.minidom, etree and lxml, but I am having trouble working out how to get the same kind of output that plistlib gives me above.

My final aim is to be able to merge selected keys from the two data sets together. One step at a time...

EDIT -----------------

I have got something working:

from xml.dom.minidom import parse
doc = parse('/Users/name/Documents/GPS/gps.tcx')
lat = doc.getElementsByTagName("LatitudeDegrees")
time = doc.getElementsByTagName("Time")

for x in lat:
    print(x.firstChild.data)
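
The snippet above collects every LatitudeDegrees and every Time in two separate lists. A minimal sketch of how the same minidom approach could instead pair the values per Trackpoint, giving one dictionary per point, is shown below. The first_text helper and the SAMPLE string are illustrative assumptions (a trimmed, closed-off copy of the posted file), not part of the original post.

```python
from xml.dom.minidom import parseString

# Assumption: a trimmed, well-formed copy of the posted TCX fragment.
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<TrainingCenterDatabase xmlns="http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2">
    <Activities>
        <Activity Sport="Other">
            <Id>2012-01-17T11:44:35Z</Id>
            <Lap StartTime="2012-01-17T11:44:35Z">
                <Track>
                    <Trackpoint>
                        <Time>2012-01-17T11:44:35Z</Time>
                        <Position>
                            <LatitudeDegrees>59.720211518183351</LatitudeDegrees>
                        </Position>
                    </Trackpoint>
                </Track>
            </Lap>
        </Activity>
    </Activities>
</TrainingCenterDatabase>
"""


def first_text(node, tag):
    """Return the text of the first <tag> descendant of node, or None.

    Hypothetical helper, introduced here only for illustration.
    """
    found = node.getElementsByTagName(tag)
    if not found or found[0].firstChild is None:
        return None
    return found[0].firstChild.data


doc = parseString(SAMPLE)

# One dictionary per <Trackpoint>, so Time and LatitudeDegrees stay paired.
points = [
    {
        'Time': first_text(tp, 'Time'),
        'LatitudeDegrees': first_text(tp, 'LatitudeDegrees'),
    }
    for tp in doc.getElementsByTagName('Trackpoint')
]

for point in points:
    print(point['Time'], point['LatitudeDegrees'])
```

For a real file, parse('/Users/name/Documents/GPS/gps.tcx') can replace parseString(SAMPLE); the values stay strings, so convert them with float() or a date parser as needed.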

I had to add closing tags to your posted XML so that the lxml parser could parse it. Once that is done, the Time and LatitudeDegrees data can be pulled out with calls to doc.xpath.

import lxml.etree as ET

content='''<?xml version="1.0" encoding="utf-8"?>
<TrainingCenterDatabase xmlns="http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:tp1="http://www.garmin.com/xmlschemas/TrackPointExtension/v1" xmlns:gpx="http://www.topografix.com/GPX/1/1" xsi:schemaLocation="http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2 http://www.garmin.com/xmlschemas/TrainingCenterDatabasev2.xsd">
    <Activities>
        <Activity Sport="Other">
            <Id>2012-01-17T11:44:35Z</Id>
            <Lap StartTime="2012-01-17T11:44:35Z">
                <TotalTimeSeconds>0</TotalTimeSeconds>
                <DistanceMeters>0</DistanceMeters>
                <Calories>0</Calories>
                <Intensity>Active</Intensity>
                <TriggerMethod>Manual</TriggerMethod>
                <Track>
                    <Trackpoint>
                        <Time>2012-01-17T11:44:35Z</Time>
                        <Position>
                            <LatitudeDegrees>59.920211518183351</LatitudeDegrees>
                        </Position>
                    </Trackpoint>
                </Track>
            </Lap>
        </Activity>
    </Activities>
</TrainingCenterDatabase>
'''

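# On Python 3, lxml rejects a str that carries an encoding declaration, so pass bytes.
doc = ET.fromstring(content.encode('utf-8'))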

# Map a prefix of our choosing to the document's default namespace.
ns = {'ns': 'http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2'}
for trackpoint in doc.xpath('//ns:Trackpoint', namespaces=ns):
    print(trackpoint.xpath('(ns:Time|ns:Position/ns:LatitudeDegrees)/text()', namespaces=ns))

yields

['2012-01-17T11:44:35Z', '59.920211518183351']
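
If nested dictionaries in the style of plistlib are the goal, a rough sketch using only the standard library's xml.etree.ElementTree might look like the following. The element_to_dict and local_name functions are hypothetical helpers written for this illustration, not part of any library. The sketch assumes that attributes (such as Sport or StartTime) can be dropped and that repeated sibling tags should be collected into a list.

```python
import xml.etree.ElementTree as ET

# Assumption: a trimmed copy of the sample document from the answer above.
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<TrainingCenterDatabase xmlns="http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2">
    <Activities>
        <Activity Sport="Other">
            <Id>2012-01-17T11:44:35Z</Id>
            <Lap StartTime="2012-01-17T11:44:35Z">
                <TotalTimeSeconds>0</TotalTimeSeconds>
                <Track>
                    <Trackpoint>
                        <Time>2012-01-17T11:44:35Z</Time>
                        <Position>
                            <LatitudeDegrees>59.920211518183351</LatitudeDegrees>
                        </Position>
                    </Trackpoint>
                </Track>
            </Lap>
        </Activity>
    </Activities>
</TrainingCenterDatabase>
"""


def local_name(tag):
    """Strip the '{namespace-uri}' prefix that ElementTree puts on tag names."""
    return tag.rsplit('}', 1)[-1]


def element_to_dict(elem):
    """Recursively convert an element into nested dicts (hypothetical helper).

    Leaf elements become their stripped text; attributes are ignored.
    """
    children = list(elem)
    if not children:
        return (elem.text or '').strip()
    result = {}
    for child in children:
        key = local_name(child.tag)
        value = element_to_dict(child)
        if key in result:
            # A repeated tag (e.g. many Trackpoints) is collected into a list.
            if not isinstance(result[key], list):
                result[key] = [result[key]]
            result[key].append(value)
        else:
            result[key] = value
    return result


root = ET.fromstring(SAMPLE)
tcx = element_to_dict(root)

trackpoint = tcx['Activities']['Activity']['Lap']['Track']['Trackpoint']
print(trackpoint['Time'])
print(trackpoint['Position']['LatitudeDegrees'])
```

One caveat of this design: a key holds a dict when the tag occurs once and a list when it occurs several times, so a real file with many Trackpoint elements would yield a list under 'Trackpoint'. Code that consumes the result needs to handle both cases, or the helper could be changed to always build lists.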
