Iterate XPath elements to get individual elements instead of list

I'm parsing an XML document and reading the values of different elements using XPath. Currently this works well for getting all elements as lists. However, child elements are not always present for all parents (though they are present for some), and I need to know which ones are missing, because I'm parsing the XML to create a dataframe to insert into a database. So I want to iterate over the elements and grab the values I need one at a time. I'm not sure how to do this, as currently I get the full list on each iteration. The elements I'm extracting are nested at different levels.

The XML I'm parsing is a TCX file from Garmin. Short example:

<?xml version="1.0" encoding="UTF-8"?>
<TrainingCenterDatabase
  xsi:schemaLocation="http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2 http://www.garmin.com/xmlschemas/TrainingCenterDatabasev2.xsd"
  xmlns:ns5="http://www.garmin.com/xmlschemas/ActivityGoals/v1"
  xmlns:ns3="http://www.garmin.com/xmlschemas/ActivityExtension/v2"
  xmlns:ns2="http://www.garmin.com/xmlschemas/UserProfile/v2"
  xmlns="http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ns4="http://www.garmin.com/xmlschemas/ProfileExtension/v1">
  <Activities>
    <Activity Sport="Running">
      <Id>2018-10-10T14:10:10.000Z</Id>
      <Lap StartTime="2018-10-10T14:10:10.000Z">
        <TotalTimeSeconds>343.0</TotalTimeSeconds>
        <DistanceMeters>1000.0</DistanceMeters>
        <MaximumSpeed>3.694999933242798</MaximumSpeed>
        <Calories>51</Calories>
        <AverageHeartRateBpm>
          <Value>136</Value>
        </AverageHeartRateBpm>
        <MaximumHeartRateBpm>
          <Value>162</Value>
        </MaximumHeartRateBpm>
        <Intensity>Active</Intensity>
        <TriggerMethod>Manual</TriggerMethod>
        <Track>
          <Trackpoint>
            <Time>2018-10-10T14:10:10.000Z</Time>
            <Position>
              <LatitudeDegrees>52.17917550355196</LatitudeDegrees>
              <LongitudeDegrees>6.532441098242998</LongitudeDegrees>
            </Position>
            <AltitudeMeters>-0.20000000298023224</AltitudeMeters>
            <DistanceMeters>0.0</DistanceMeters>
            <HeartRateBpm>
              <Value>94</Value>
            </HeartRateBpm>
            <Extensions>
              <ns3:TPX>
                <ns3:Speed>0.04699999839067459</ns3:Speed>
                <ns3:RunCadence>7</ns3:RunCadence>
              </ns3:TPX>
            </Extensions>
          </Trackpoint>
          <Trackpoint>
            <Time>2018-10-10T14:10:11.000Z</Time>
            <Position>
              <LatitudeDegrees>52.17917634174228</LatitudeDegrees>
              <LongitudeDegrees>6.532444199547172</LongitudeDegrees>
            </Position>
            <AltitudeMeters>0.0</AltitudeMeters>
            <DistanceMeters>0.23000000417232513</DistanceMeters>
            <HeartRateBpm>
              <Value>95</Value>
            </HeartRateBpm>
            <Extensions>
              <ns3:TPX>
                <ns3:Speed>0.0</ns3:Speed>
                <ns3:RunCadence>7</ns3:RunCadence>
              </ns3:TPX>
            </Extensions>
          </Trackpoint>
          <Trackpoint>
            <Time>2018-10-10T14:10:12.000Z</Time>
            <Position>
              <LatitudeDegrees>52.17917206697166</LatitudeDegrees>
              <LongitudeDegrees>6.532468926161528</LongitudeDegrees>
            </Position>
            <AltitudeMeters>0.0</AltitudeMeters>
            <DistanceMeters>1.9700000286102295</DistanceMeters>
            <Extensions>
              <ns3:TPX>
                <ns3:Speed>0.0</ns3:Speed>
                <ns3:RunCadence>7</ns3:RunCadence>
              </ns3:TPX>
            </Extensions>
          </Trackpoint>
          <Trackpoint>
            <Time>2018-10-10T14:10:13.000Z</Time>
            <Position>
              <LatitudeDegrees>52.17916024848819</LatitudeDegrees>
              <LongitudeDegrees>6.5325202234089375</LongitudeDegrees>
            </Position>
            <AltitudeMeters>0.0</AltitudeMeters>
            <DistanceMeters>5.679999828338623</DistanceMeters>
            <HeartRateBpm>
              <Value>96</Value>
            </HeartRateBpm>
            <Extensions>
              <ns3:TPX>
                <ns3:Speed>0.08399999886751175</ns3:Speed>
                <ns3:RunCadence>7</ns3:RunCadence>
              </ns3:TPX>
            </Extensions>
          </Trackpoint>
          <Trackpoint>
            <Time>2018-10-10T14:10:14.000Z</Time>
            <Position>
              <LatitudeDegrees>52.17914817854762</LatitudeDegrees>
              <LongitudeDegrees>6.532532041892409</LongitudeDegrees>
            </Position>
            <AltitudeMeters>0.0</AltitudeMeters>
            <DistanceMeters>7.150000095367432</DistanceMeters>
            <HeartRateBpm>
              <Value>98</Value>
            </HeartRateBpm>
            <Extensions>
              <ns3:TPX>
                <ns3:Speed>0.10300000011920929</ns3:Speed>
                <ns3:RunCadence>10</ns3:RunCadence>
              </ns3:TPX>
            </Extensions>
          </Trackpoint>

Working code that gives me all values in the file as lists:

from lxml import etree, objectify
from os import listdir
from os.path import isfile, join

def tcxParse(tcxFile):
    parser = etree.XMLParser(remove_blank_text=True)
    tree = etree.parse(tcxFile, parser)
    root = tree.getroot()

    # Strip namespaces so that plain tag names can be used in the XPath queries
    for elem in root.iter():
        # Comments and processing instructions have a non-string tag; skip them
        if not hasattr(elem.tag, 'find'):
            continue
        i = elem.tag.find('}')
        if i >= 0:
            elem.tag = elem.tag[i + 1:]
    objectify.deannotate(root, cleanup_namespaces=True)

    # Check whether we are dealing with .tcx or another format
    if tcxFile.lower().endswith('.tcx'):
        tcxParse.activity = tree.xpath('//*[@Sport]/@Sport')
        tcxParse.HR = list(map(int, tree.xpath('//Track/Trackpoint/HeartRateBpm/Value/text()')))
        tcxParse.Time = tree.xpath('//Time/text()')
        tcxParse.Speed = list(map(float, tree.xpath('//Track/Trackpoint/Extensions/TPX/Speed/text()')))
        tcxParse.Cadence = list(map(int, tree.xpath('//Track/Trackpoint/Extensions/TPX/RunCadence/text()')))
        tcxParse.Lat = list(map(float, tree.xpath('//Track/Trackpoint/Position/LatitudeDegrees/text()')))
        tcxParse.Lon = list(map(float, tree.xpath('//Track/Trackpoint/Position/LongitudeDegrees/text()')))
        tcxParse.Alt = list(map(float, tree.xpath('//Track/Trackpoint/AltitudeMeters/text()')))
        tcxParse.Distance = list(map(float, tree.xpath('//Track/Trackpoint/DistanceMeters/text()')))

I know I can use tree.iter() to iterate over the elements, but I'm not sure how to grab the values one at a time instead of the full list.

To be clear: the current output for tcxParse.HR, for instance, would be:

94,95,96,98

But I need it to be:

94,95,nan,96,98 

as the HeartRateBpm is missing in the 3rd Trackpoint element.
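
The misalignment described above can be reproduced on a reduced document. This is only a sketch: the sample XML, the placeholder times `t1`..`t3`, and the variable names are made up for illustration, and it uses the standard library's `xml.etree.ElementTree` rather than lxml so that it runs without extra packages.

```python
import xml.etree.ElementTree as ET

# Hypothetical reduced sample: the second Trackpoint has no HeartRateBpm.
SAMPLE = """<Track>
  <Trackpoint><Time>t1</Time><HeartRateBpm><Value>94</Value></HeartRateBpm></Trackpoint>
  <Trackpoint><Time>t2</Time></Trackpoint>
  <Trackpoint><Time>t3</Time><HeartRateBpm><Value>96</Value></HeartRateBpm></Trackpoint>
</Track>"""

root = ET.fromstring(SAMPLE)

# Document-wide queries return one flat list per field, with gaps silently dropped.
times = [e.text for e in root.findall("Trackpoint/Time")]
flat_hr = [int(e.text) for e in root.findall("Trackpoint/HeartRateBpm/Value")]

print(len(times), len(flat_hr))  # prints: 3 2

# Pairing the lists by position attaches 96 to the wrong timestamp.
pairs = list(zip(times, flat_hr))
print(pairs)  # prints: [('t1', 94), ('t2', 96)]
```

Because the flat lists carry no record of which Trackpoint each value came from, the gap cannot be recovered after the fact.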

As I understand it, you need to iterate over the <Trackpoint> elements in <Track>.
I suggest doing it like this:

trackpoints = [{
    'HR': tp.findtext('HeartRateBpm/Value'),
    'Time': tp.findtext('Time'),
    'Speed': tp.findtext('Extensions/TPX/Speed'),
    'Cadence': tp.findtext('Extensions/TPX/RunCadence'),
    'Lat': tp.findtext('Position/LatitudeDegrees'),
    'Lon': tp.findtext('Position/LongitudeDegrees'),
    'Alt': tp.findtext('AltitudeMeters'),
    'Distance': tp.findtext('DistanceMeters')
    }
for tp in tree.xpath('//Track/Trackpoint')]

For the XML chunk in question (with <HeartRateBpm> deleted from the second <Trackpoint>), trackpoints will contain a list like this:

[{'HR': '94', 'Time': '2018-10-10T14:10:10.000Z', 'Speed': '0.04699999839067459', 'Cadence': '7', 'Lat': '52.17917550355196', 'Lon': '6.532441098242998', 'Alt': '-0.20000000298023224', 'Distance': '0.0'}, 
 {'HR': None, 'Time': '2018-10-10T14:10:11.000Z', 'Speed': '0.0', 'Cadence': '7', 'Lat': '52.17917634174228', 'Lon': '6.532444199547172', 'Alt': '0.0', 'Distance': '0.23000000417232513'}]
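
Since the question asks for `nan` rather than `None` and for numeric values rather than strings, one possible follow-up step is to convert each field as it is read. The sketch below is an illustration, not part of the original answer: the helper `to_float`, the reduced sample document, and the use of the standard library's `xml.etree.ElementTree` in place of lxml are all assumptions made to keep it self-contained. lxml elements also provide `findtext` and `iter`, so the same pattern should carry over to the namespace-stripped tree from the question.

```python
import math
import xml.etree.ElementTree as ET

# Hypothetical reduced sample, namespaces already stripped.
# The third Trackpoint has no HeartRateBpm, as in the question.
SAMPLE = """<Track>
  <Trackpoint>
    <Time>2018-10-10T14:10:10.000Z</Time>
    <HeartRateBpm><Value>94</Value></HeartRateBpm>
  </Trackpoint>
  <Trackpoint>
    <Time>2018-10-10T14:10:11.000Z</Time>
    <HeartRateBpm><Value>95</Value></HeartRateBpm>
  </Trackpoint>
  <Trackpoint>
    <Time>2018-10-10T14:10:12.000Z</Time>
  </Trackpoint>
  <Trackpoint>
    <Time>2018-10-10T14:10:13.000Z</Time>
    <HeartRateBpm><Value>96</Value></HeartRateBpm>
  </Trackpoint>
  <Trackpoint>
    <Time>2018-10-10T14:10:14.000Z</Time>
    <HeartRateBpm><Value>98</Value></HeartRateBpm>
  </Trackpoint>
</Track>"""


def to_float(text):
    """Return float(text), or NaN when the element was missing (text is None)."""
    return float(text) if text is not None else math.nan


root = ET.fromstring(SAMPLE)

# One dict per Trackpoint keeps every value tied to its own parent element.
rows = [
    {
        "Time": tp.findtext("Time"),
        "HR": to_float(tp.findtext("HeartRateBpm/Value")),
    }
    for tp in root.iter("Trackpoint")
]

hr = [row["HR"] for row in rows]
print(hr)  # prints: [94.0, 95.0, nan, 96.0, 98.0]
```

A list of dicts in this shape can be passed directly to `pandas.DataFrame(rows)` if a dataframe is the end goal, and the NaN values then mark the missing measurements.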
