
Extracting child text in Python with lxml

I'm trying to use the lxml library to extract all the information related to the waypoints in my GPX (XML) file. Here is a subset of my GPX file:

<?xml version="1.0"?>
<gpx
  version="1.0"
  creator="GPSBabel - http://www.gpsbabel.org"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xmlns="http://www.topografix.com/GPX/1/0"
  xsi:schemaLocation="http://www.topografix.com/GPX/1/0 http://www.topografix.com/GPX/1/0/gpx.xsd">
  <time>2006-01-23T02:00:28Z</time>
  <trk>
    <name>08-JAN-06 02</name>
    <trkseg>
      <trkpt lat="-33.903422356" lon="151.175565720">
        <ele>19.844360</ele>
        <time>2006-01-08T06:45:07Z</time>
      </trkpt>
    </trkseg>
  </trk>
</gpx>

I can get a point's latitude and longitude with:

node.get("lon") and node.get("lat")

But when I try to get the time with:

# checks each direct child of root for a <time> tag
for element in root:
    if element.tag == "{http://www.topografix.com/GPX/1/0}time":
        time = str(element.text)

in the end I get results like this:

(1.45,32.12,'')

That is, a blank value for the time. How can I solve this?
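One likely reason the loop above misses the track-point time: iterating over an lxml element (for element in root) visits only its direct children, so it can see the top-level <gpx>/<time> but never the <time> nested inside <trkpt>. The sketch below uses a trimmed copy of the sample document (the xsi attributes are dropped) and compares direct-child iteration with iter(), which walks the whole subtree.

```python
from lxml import etree

# Trimmed copy of the sample GPX from the question (same default namespace).
GPX = b"""<?xml version="1.0"?>
<gpx version="1.0" xmlns="http://www.topografix.com/GPX/1/0">
  <time>2006-01-23T02:00:28Z</time>
  <trk>
    <name>08-JAN-06 02</name>
    <trkseg>
      <trkpt lat="-33.903422356" lon="151.175565720">
        <ele>19.844360</ele>
        <time>2006-01-08T06:45:07Z</time>
      </trkpt>
    </trkseg>
  </trk>
</gpx>"""

TIME_TAG = "{http://www.topografix.com/GPX/1/0}time"
root = etree.fromstring(GPX)

# Iterating over root visits only its direct children: <time> and <trk>.
direct_times = [el.text for el in root if el.tag == TIME_TAG]

# iter() walks the whole subtree in document order, so it also reaches <trkpt>/<time>.
all_times = [el.text for el in root.iter(TIME_TAG)]

print(direct_times)  # ['2006-01-23T02:00:28Z']
print(all_times)     # ['2006-01-23T02:00:28Z', '2006-01-08T06:45:07Z']
```

To get the time that belongs to a particular point, search from that <trkpt> element rather than from the root.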

I'm assuming the </trkseg> and </trk> closing tags belong at the end of what you posted; otherwise the XML would be malformed.

I'm going to write this out in a very verbose way. First, let's assume you've got an lxml object containing your XML; we'll call it tree.
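If you don't have tree yet, a minimal sketch of getting one: etree.parse() accepts a filename or a file-like object and returns an ElementTree that supports .xpath(). The file name track.gpx below is hypothetical, so the runnable part parses an in-memory copy through io.BytesIO instead.

```python
import io
from lxml import etree

# With a real file (hypothetical name): tree = etree.parse("track.gpx")
GPX = b"""<?xml version="1.0"?>
<gpx version="1.0" xmlns="http://www.topografix.com/GPX/1/0">
  <trk><name>08-JAN-06 02</name></trk>
</gpx>"""

# etree.parse() also accepts any file-like object, such as an in-memory buffer.
tree = etree.parse(io.BytesIO(GPX))
root = tree.getroot()

# Element tags use Clark notation: {namespace-uri}local-name.
print(root.tag)  # {http://www.topografix.com/GPX/1/0}gpx
```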

First define your namespace, if necessary:

ns = {'gpx': 'http://www.topografix.com/GPX/1/0'}

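I like using XPath queries. Because your document declares a default namespace, a query like tree.xpath('//trk') returns an empty list, and a prefixed query like tree.xpath('//gpx:trk') without a namespace mapping fails with an undefined namespace prefix error. Pass the mapping as the namespaces argument and prefix every element name in the expression with its key, like tree.xpath('//gpx:trk', namespaces=ns)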
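A small sketch of the three cases, assuming a one-track document in the GPX 1.0 default namespace: the unprefixed query silently matches nothing, the unmapped prefix raises etree.XPathEvalError, and the mapped prefix finds the element.

```python
import io
from lxml import etree

GPX = b'<gpx xmlns="http://www.topografix.com/GPX/1/0"><trk><name>08-JAN-06 02</name></trk></gpx>'
tree = etree.parse(io.BytesIO(GPX))
ns = {"gpx": "http://www.topografix.com/GPX/1/0"}

# Unprefixed names in XPath 1.0 match only elements in no namespace,
# so this finds nothing: <trk> lives in the GPX default namespace.
unprefixed = tree.xpath("//trk")

# Using a prefix that has not been mapped raises an error.
try:
    tree.xpath("//gpx:trk")
    undefined_prefix_error = False
except etree.XPathEvalError:
    undefined_prefix_error = True

# Mapping the prefix to the namespace URI makes the query match.
prefixed = tree.xpath("//gpx:trk", namespaces=ns)

print(len(unprefixed), undefined_prefix_error, len(prefixed))  # 0 True 1
```

The prefix name itself (gpx) is arbitrary; only the URI it maps to has to match the document.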

Now you want to get a list of all your trk objects:

trk_objects = tree.xpath('//gpx:trk', namespaces=ns)

This will return a list of them or an empty list if there are no trk tags.
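To illustrate, a sketch assuming the same one-track sample: the query returns a plain Python list, an element type that does not occur (here <rte>, a GPX route) gives [], and each returned element can be queried again with a relative path.

```python
import io
from lxml import etree

GPX = b"""<gpx xmlns="http://www.topografix.com/GPX/1/0">
  <trk><name>08-JAN-06 02</name></trk>
</gpx>"""
ns = {"gpx": "http://www.topografix.com/GPX/1/0"}
tree = etree.parse(io.BytesIO(GPX))

trk_objects = tree.xpath("//gpx:trk", namespaces=ns)
# <rte> (route) elements do not appear in this sample, so this is [].
rte_objects = tree.xpath("//gpx:rte", namespaces=ns)

# Each result is an element, so it can be queried again with a relative path.
names = [str(trk.xpath("string(gpx:name)", namespaces=ns)) for trk in trk_objects]

print(len(trk_objects), rte_objects, names)  # 1 [] ['08-JAN-06 02']
```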

Then you want to iterate through them (I'm assuming there's only one trkseg per trk, that you only want the first trkpt in it, and that you need to use the namespace):

for trk in trk_objects:
    # xpath queries always return a list; attribute queries (@lat) return strings
    lat_objects = trk.xpath('./gpx:trkseg/gpx:trkpt/@lat', namespaces=ns)
    if lat_objects:
        lat = lat_objects[0]

    lon_objects = trk.xpath('./gpx:trkseg/gpx:trkpt/@lon', namespaces=ns)
    if lon_objects:
        lon = lon_objects[0]

    # <time> is a child of <trkpt>, not of <trkseg>
    time_objects = trk.xpath('./gpx:trkseg/gpx:trkpt/gpx:time', namespaces=ns)
    if time_objects:
        time = time_objects[0].text
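The loop above reads only the first point of each track. A sketch that collects every <trkpt> as a (lat, lon, time) tuple, like the output shown in the question, could look like this. The second <trkpt> in the sample is a hypothetical extra point (with no <time>) added only to show the missing-value case, and read_track_points is an illustrative helper name, not part of lxml.

```python
import io
from lxml import etree

GPX = b"""<?xml version="1.0"?>
<gpx version="1.0" xmlns="http://www.topografix.com/GPX/1/0">
  <trk>
    <name>08-JAN-06 02</name>
    <trkseg>
      <trkpt lat="-33.903422356" lon="151.175565720">
        <ele>19.844360</ele>
        <time>2006-01-08T06:45:07Z</time>
      </trkpt>
      <!-- hypothetical extra point without <time>, for illustration only -->
      <trkpt lat="-33.903900000" lon="151.176000000">
        <ele>20.100000</ele>
      </trkpt>
    </trkseg>
  </trk>
</gpx>"""

NS = {"gpx": "http://www.topografix.com/GPX/1/0"}

def read_track_points(tree):
    """Hypothetical helper: return (lat, lon, time) for every <trkpt>; time is '' if missing."""
    points = []
    for trkpt in tree.xpath("//gpx:trk/gpx:trkseg/gpx:trkpt", namespaces=NS):
        lat = float(trkpt.get("lat"))
        lon = float(trkpt.get("lon"))
        # <time> is a direct child of <trkpt>; findtext returns the default if it is absent.
        time = trkpt.findtext("gpx:time", default="", namespaces=NS)
        points.append((lat, lon, time))
    return points

tree = etree.parse(io.BytesIO(GPX))
for point in read_track_points(tree):
    print(point)
```

Using findtext() with a default avoids an AttributeError on None when a point has no <time>, while still returning '' in that case.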
