
XML parsing with Python using recursion: problem with return value

I am somewhat new to Python and to programming in general, so I apologize. By the way, thanks in advance.

I am parsing an XML document (specifically KML, which is used in Google Earth) using Python 2.5, cElementTree and expat. I am trying to pull out all the text from the 'name', 'description' and 'coordinates' nodes inside each 'placemark' node for each geometry type (i.e. polylines, polygons, points), but I want to keep the geometry types separate. For example, I want only the 'name', 'description' and 'coordinates' text for every placemark that is part of a 'polygon' (i.e. it has a 'polygon' node). I will need to do this for 'polylines' and 'points' as well. I have figured out a way to do this, but the code is long, verbose, and specific to each geometry type, which leads me to my question.

Ideally, I would like to use the same code for each geometry type, but the problem is that each geometry type has a different node structure (i.e. different node names and a different number of nested nodes). So as a proof of concept I thought this would be a good opportunity to use and learn recursion to drill down the node tree of a 'placemark' node and get the information I am looking for. I have looked at many posts on Python recursion and am still having problems implementing the solutions they provide.

The sample XML for a 'placemark' node is:

 <Placemark>
    <name>testPolygon</name>
    <description>polygon text</description>
    <styleUrl>#msn_ylw-pushpin</styleUrl>
    <Polygon>
            <tessellate>1</tessellate>
            <outerBoundaryIs>
                    <LinearRing>
                            <coordinates>
                                    -81.4065,31.5072,0 -81.41269,31.45992,0 -81.34490,31.459696,0 
                            </coordinates>
                    </LinearRing>
            </outerBoundaryIs>
    </Polygon>
 </Placemark>

The recursive function I am using is:

def getCoords( child, searchNode ):

    # Get children of node
    children = child.getchildren()

    # If node has one or more child
    if len( children ) >= 1 :

        # Loop through all the children
        for child in children:

            # call to recursion function
            getCoords( child, searchNode )

    # If does not have children and is the 'searchNode'
    elif len( children ) == 0 and child.tag == searchNode:

        # Return the text inside the node. This is where it is not working    
        # Other posts recommended returning the function like 
        # return getCoords(child, searchNode), but I am getting an unending loop
        return child.text

    # Do nothing if node doesn't have children and does not match 'searchNode'    
    else: 

        print 'node does not have children and is not what we are looking for'

I am calling the recursive function like this:

searchNode = 'coordinates'

# loop through all 'Placemark nodes' in document
for mark in placemark:

    # Get children of 'Placemark' node
    children = mark.getchildren() 

    # Loop through children nodes
    for child in children:

        # if a 'Polygon' node is found
        if child.tag == 'Polygon':

            # call recursion function
            getCoords( child, searchNode)

I realize that at least part of my problem is the return value. Other posts recommended returning the function call, which I interpreted as 'return getCoords(child, searchNode)', but when I do that I get an unending loop. Also, I realize this could be posted on the GIS site, but I think this is more of a general programming question. Any ideas?

With recursion you want to pay attention to your base cases and your recursive cases. Whatever your base cases happen to be, if you expect to collect information from your recursion, they have to return data that your recursive cases can (and, more importantly, do) use. Similarly, you need to make sure that the data your recursive cases return can be used by each other.

First identify your base and recursive cases. The base cases are the "leaf" nodes, which have no children. In a base case you just want to return some data and not call the recursive function again. This is what allows you to get "back up the stack", as they say, and prevents infinite recursion. The recursive cases require you to save the data collected from a series of recursive calls, which is almost what you're doing in your for loop.
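
To make the two kinds of cases concrete, here is a minimal sketch on a toy tree rather than on your KML. It assumes Python 3 and the standard `xml.etree.ElementTree` module (not the Python 2.5 / cElementTree setup from the question), and `count_leaves` is a hypothetical name used only for illustration:

```python
import xml.etree.ElementTree as ET


def count_leaves(node):
    """Count the elements at or below `node` that have no children."""
    children = list(node)

    # Base case: a leaf returns data and does not call the function again.
    if not children:
        return 1

    # Recursive case: combine what each recursive call returns.
    total = 0
    for child in children:
        total += count_leaves(child)
    return total


# Toy tree with three leaves: <b/>, <d/> and <e/>.
root = ET.fromstring("<a><b/><c><d/><e/></c></a>")
print(count_leaves(root))  # 3
```

The base case returns a number, and the recursive case adds up numbers, so every call hands its caller something it can use.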

I noticed that you have

# Recursive case: node has one or more child
if len( children ) >= 1 :
    # Loop through all the children
    for child in children:
        # call to recursion function
        getCoords( child, searchNode )

but what are you doing with the results of your getCoords calls?

You either want to save the results in some sort of data structure that you return at the end of your for loop, or, if you're not interested in keeping the results, just print the value in your base case 1 (successful search) when you reach it instead of returning it. As it stands, your base case 1 just returns up the stack to a call that isn't doing anything with the result! So try:

# If node has one or more child
if len( children ) >= 1 :
    # Data structure for your results
    coords = []
    # Loop through all the children
    for child in children:
        # call to recursion function
        result = getCoords( child, searchNode )
        # Add your new results together
        coords.extend(result)
    # Give the next instance up the stack your results!
    return coords

Now, since your results are in a list and you're using the extend() method, you've got to make your base cases return lists as well!

# Base case 1: does not have children and is the 'searchNode'
elif len( children ) == 0 and child.tag == searchNode:
    # Return the text from the node, inside a list
    return [child.text]
# Base case 2: doesn't have children and does not match 'searchNode'
else:
    # Return empty list so your extend() function knows what to do with the result
    return []
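
Put together, the pieces above might look like the sketch below. It is written for Python 3 as an assumption (the question uses Python 2.5), so it uses `list(node)` in place of `getchildren()`, which was removed in Python 3.9. The name `get_coords`, the `strip()` call and the handling of empty text are additions for illustration, and the sample omits any XML namespace, as the question's sample does:

```python
import xml.etree.ElementTree as ET


def get_coords(node, search_node):
    """Return the text of every childless `search_node` element at or below `node`."""
    children = list(node)

    # Recursive case: collect the lists returned by the children.
    if children:
        coords = []
        for child in children:
            coords.extend(get_coords(child, search_node))
        return coords

    # Base case 1: a leaf that matches, returned inside a list.
    if node.tag == search_node:
        # strip() is an addition; it removes the surrounding whitespace.
        return [(node.text or "").strip()]

    # Base case 2: a leaf that does not match, so an empty list.
    return []


POLYGON = """
<Polygon>
  <tessellate>1</tessellate>
  <outerBoundaryIs>
    <LinearRing>
      <coordinates>
        -81.4065,31.5072,0 -81.41269,31.45992,0 -81.34490,31.459696,0
      </coordinates>
    </LinearRing>
  </outerBoundaryIs>
</Polygon>
"""

polygon = ET.fromstring(POLYGON.strip())
print(get_coords(polygon, "coordinates"))
```

Every branch returns a list, so `extend()` always has something it can work with, whichever case a child falls into.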

This should give you a single list in the end, which you'll probably want to store in a variable. I've just printed the results here:

searchNode = 'coordinates'
# loop through all 'Placemark nodes' in document
for mark in placemark:
    # Get children of 'Placemark' node
    children = mark.getchildren()
    # getchildren() returns a list (possibly empty), never None, so this
    # check is only a guard; iterating over an empty list is harmless
    if children:
        # Loop through children nodes
        for child in children:
            # if a 'Polygon' node is found
            if child.tag == 'Polygon':
                # call recursion function and print (or save) result
                print getCoords( child, searchNode)
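
Since the goal in the question is to keep the geometry types separate, the same recursive function could be reused for each type along these lines. This is only a sketch under several assumptions: Python 3, no XML namespace on the tags, and the KML element names `Polygon`, `LineString` and `Point` for the three geometry types. `group_placemarks`, `GEOMETRY_TAGS` and the two-placemark sample document are made up for illustration:

```python
import xml.etree.ElementTree as ET

# Assumed KML geometry element names (polylines are LineString elements).
GEOMETRY_TAGS = ("Polygon", "LineString", "Point")


def get_coords(node, search_node):
    """Return the text of every childless `search_node` element at or below `node`."""
    children = list(node)
    if children:
        coords = []
        for child in children:
            coords.extend(get_coords(child, search_node))
        return coords
    if node.tag == search_node:
        return [(node.text or "").strip()]
    return []


def group_placemarks(placemarks):
    """Group name, description and coordinates by geometry type (hypothetical helper)."""
    grouped = {tag: [] for tag in GEOMETRY_TAGS}
    for mark in placemarks:
        for child in mark:
            if child.tag in grouped:
                grouped[child.tag].append({
                    "name": mark.findtext("name"),
                    "description": mark.findtext("description"),
                    "coordinates": get_coords(child, "coordinates"),
                })
    return grouped


# Made-up sample document; the second placemark is not from the question.
DOC = """<Document>
  <Placemark>
    <name>testPolygon</name>
    <description>polygon text</description>
    <Polygon><outerBoundaryIs><LinearRing>
      <coordinates>-81.4065,31.5072,0 -81.41269,31.45992,0</coordinates>
    </LinearRing></outerBoundaryIs></Polygon>
  </Placemark>
  <Placemark>
    <name>testPoint</name>
    <description>point text</description>
    <Point><coordinates>-81.40,31.50,0</coordinates></Point>
  </Placemark>
</Document>"""

grouped = group_placemarks(ET.fromstring(DOC).findall("Placemark"))
print(grouped["Polygon"])
print(grouped["Point"])
```

The recursion does not care how deeply the coordinates are nested, which is why one function can serve all three geometry types.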

You're not doing anything with the result of the recursive call when the node is the searchNode.

You need to aggregate the results of the recursive calls on the children of a node, or just use print child.text instead of return child.text.
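
As an aside, if learning recursion is not the point, ElementTree can do the tree walk for you. The sketch below swaps in `Element.iter()`, a different technique from the hand-written recursion discussed above. It assumes Python 3 (`iter()` is not available in the Python 2.5 version of the library) and tags without a namespace:

```python
import xml.etree.ElementTree as ET

polygon = ET.fromstring(
    "<Polygon><outerBoundaryIs><LinearRing>"
    "<coordinates>-81.4065,31.5072,0 -81.41269,31.45992,0</coordinates>"
    "</LinearRing></outerBoundaryIs></Polygon>"
)

# iter(tag) walks the element and all of its descendants in document
# order and yields those whose tag matches.
coords = [(node.text or "").strip() for node in polygon.iter("coordinates")]
print(coords)
```

One difference to keep in mind: `iter()` matches on the tag alone, so it would also yield a matching element that has children, whereas the recursive function only returns matching leaf nodes.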


 