From minidom/getElementsByTagName to lxml/xpath

I'm trying to parse a lot of different xml/gpx files to get the lat/lon pairs that are attributes of the trkpt node. I have a working minidom version, but I want to write a similar version using lxml and xpath to check whether it is faster.

Here is a sample xml:

xml = '''<gpx xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.topografix.com/GPX/1/1 http://www.topografix.com/GPX/1/1/gpx.xsd" version="1.1" xmlns="http://www.topografix.com/GPX/1/1">
 <metadata>
  <time>2015-12-24T12:00:00Z</time>
 </metadata>
 <trk>
  <name>Track 1</name>
  <trkseg>
   <trkpt lat="42.00080" lon="2.79610">
    <ele>39.5</ele>
    <time>2015-12-24T12:00:00Z</time>
   </trkpt>
   <trkpt lat="42.99930" lon="2.79010">
    <ele>39.5</ele>
    <time>2015-12-24T12:01:00Z</time>
   </trkpt>
  </trkseg>
 </trk>
</gpx>
'''

This is the minidom version:

from xml.dom import minidom
minitree = minidom.parseString(xml)
trkpt = minitree.getElementsByTagName('trkpt')

for elem in trkpt:
    print(elem.attributes['lat'].value + ', ' + elem.attributes['lon'].value)

Output:

42.00080, 2.79610
42.99930, 2.79010

Now, trying to replicate the exact same thing, I used XMLQuire to learn that the xpath to my desired attributes would be dft:trk/dft:trkseg/dft:trkpt/@lat, so I came up with this so far:

lxtree = etree.fromstring(xml)
trkpt = lxtree.xpath('dft:trk/dft:trkseg/dft:trkpt', namespaces={'dft': 'http://www.topografix.com/GPX/1/1'})

for elem in trkpt:
    print(trkpt[@lat] + ', ' + trpkt[@lon])

The output is nothing, or rather my print statement is wrong. But I can't tell, because a check with print(type(trkpt), len(trkpt), trkpt) tells me: <class 'list'> 0 [] So the list is empty from the get-go. Can someone help me see the error?

Use elem.get() to get the value of an attribute.

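from lxml import etree

lxtree = etree.fromstring(xml)
# Bind the GPX default namespace to the prefix "dft" so the path can match
trkpt = lxtree.xpath('dft:trk/dft:trkseg/dft:trkpt', namespaces={'dft': 'http://www.topografix.com/GPX/1/1'})

for elem in trkpt:
    # elem.get() returns the attribute value as a string
    print(elem.get("lat") + ', ' + elem.get("lon"))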

Result:

42.00080, 2.79610
42.99930, 2.79010
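
Since the original goal was to check whether the lxml version is faster, here is a rough sketch of how the two approaches could be compared. It is only an illustration under a few assumptions: the names pairs_minidom, pairs_etree, GPX_NS and SAMPLE are made up for this example, the sample is a shortened copy of the xml above, and findall() is used instead of xpath() so that the script still runs on the standard library's xml.etree.ElementTree when lxml is not installed (in that case it no longer measures lxml).

```python
import timeit
from xml.dom import minidom

try:
    from lxml import etree  # third-party; used when installed
except ImportError:
    # Assumption: fall back to the stdlib module, which also offers findall()
    import xml.etree.ElementTree as etree

# Prefix "dft" is arbitrary; only the namespace URI has to match the document
GPX_NS = {"dft": "http://www.topografix.com/GPX/1/1"}

# Shortened copy of the sample xml from the question
SAMPLE = """<gpx version="1.1" xmlns="http://www.topografix.com/GPX/1/1">
 <trk>
  <trkseg>
   <trkpt lat="42.00080" lon="2.79610"><ele>39.5</ele></trkpt>
   <trkpt lat="42.99930" lon="2.79010"><ele>39.5</ele></trkpt>
  </trkseg>
 </trk>
</gpx>"""


def pairs_minidom(xml_text):
    """Collect (lat, lon) strings with minidom, as in the question."""
    doc = minidom.parseString(xml_text)
    return [(pt.getAttribute("lat"), pt.getAttribute("lon"))
            for pt in doc.getElementsByTagName("trkpt")]


def pairs_etree(xml_text):
    """Collect (lat, lon) strings with a namespace-aware path query."""
    root = etree.fromstring(xml_text)
    points = root.findall("dft:trk/dft:trkseg/dft:trkpt", namespaces=GPX_NS)
    return [(pt.get("lat"), pt.get("lon")) for pt in points]


# Both approaches should agree before their speed is compared
assert pairs_minidom(SAMPLE) == pairs_etree(SAMPLE)

for func in (pairs_minidom, pairs_etree):
    seconds = timeit.timeit(lambda: func(SAMPLE), number=1000)
    print(f"{func.__name__}: {seconds:.4f} s for 1000 parses")
```

The timings depend on the machine and on file size, so it is worth running this against real gpx files rather than the two-point sample.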
