Python recursion with nested loops

I have a parse function that parses a tree of categories. I wrote it in the simplest way possible and am now struggling to refactor it.

Every nested loop does the same work, only appending its object to the "childs" list of the object initialized at the top of the enclosing loop.

I think it can be refactored with recursion, but I can't work out how. How do I wrap it in a recursive function to avoid the code duplication?

The final result should be a list of objects, or the function can simply yield each top-level object with its nested childs.

        for container in category_containers:
            root_category_a = container.xpath("./a")
            root_category_title = root_category_a.xpath("./*[1]/text()").get()
            root_category_url = self._host + root_category_a.xpath("./@href").get()

            root = {
                "title": root_category_title,
                "url": root_category_url,
                "childs": [],
            }

            subcategory_rows1 = container.xpath("./div/div")

            for subcat_row1 in subcategory_rows1:
                subcategory_a = subcat_row1.xpath("./a")
                subcategory_title = subcategory_a.xpath("./*[1]/text()").get()
                subcategory_url = self._host + subcategory_a.xpath("./@href").get()

                subcat1 = {
                    "title": subcategory_title,
                    "url": subcategory_url,
                    "childs": [],
                }

                subcategory_rows2 = subcat_row1.xpath("./div/div")

                for subcat_row2 in subcategory_rows2:
                    subcategory2_a = subcat_row2.xpath("./a")
                    subcategory2_title = subcategory2_a.xpath("./*[1]/text()").get()
                    subcategory2_url = self._host + subcategory2_a.xpath("./@href").get()
                    subcat2 = {
                        "title": subcategory2_title,
                        "url": subcategory2_url,
                        "childs": [],
                    }

                    subcategory_rows3 = subcat_row2.xpath("./div/div")

                    for subcat_row3 in subcategory_rows3:
                        subcategory3_a = subcat_row3.xpath("./a")
                        subcategory3_title = subcategory3_a.xpath("./*[1]/text()").get()
                        subcategory3_url = self._host + subcategory3_a.xpath("./@href").get()
                        subcat3 = {
                            "title": subcategory3_title,
                            "url": subcategory3_url,
                            "childs": [],
                        }

                        subcat2['childs'].append(subcat3)

                    subcat1['childs'].append(subcat2)

                root['childs'].append(subcat1)

            yield root

It is generally done as follows:

  1. Prepare a container (a list) to collect the parsed objects.
  2. Pass a reference to that container to the parsing function.
  3. Define the function as follows:
  • The function accepts a category object and a reference to a container.
  • Parse the category object and add the result to the container through that reference.
  • Iterate over all direct subcategories of that category object and call the function recursively on each of them, passing along the same container reference you received as a parameter.
  • You do not have to return anything, because the function fills the input container as a side effect.
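The steps above could be sketched roughly as follows. This is only an illustration, not a tested Scrapy spider: the names parse_category, host and container are made up for the example, and node is assumed to be a Scrapy-style selector exposing the same .xpath(...) and .get() calls as the question's code. One deliberate adaptation: passing the very same list down at every level would produce a flat list, so to keep the nesting the question asks for, each recursive call here receives the parent's "childs" list as its container.

```python
def parse_category(node, host, container):
    """Parse one category node and append the result to `container` (a list).

    Hypothetical helper: `node` is assumed to behave like the selectors in
    the question; `host` stands in for `self._host`.
    """
    link = node.xpath("./a")
    category = {
        "title": link.xpath("./*[1]/text()").get(),
        # Like the original code, this raises TypeError if the href is missing.
        "url": host + link.xpath("./@href").get(),
        "childs": [],
    }
    container.append(category)

    # Recurse into the direct subcategories; they fill this category's
    # "childs" list, which is what preserves the tree structure.
    for row in node.xpath("./div/div"):
        parse_category(row, host, category["childs"])


# Possible usage inside the spider method from the question (assumption):
#
#     roots = []
#     for container in category_containers:
#         parse_category(container, self._host, roots)
#     # `roots` now holds one nested dict per top-level category
```

Because the recursion stops on its own when a node has no "./div/div" rows, the same function handles any depth, not just the three levels written out by hand in the question.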

From there, you can work out how to rewrite the function to avoid side effects, but as a first implementation of the concept, this approach is much easier to get right.
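For comparison, a variant without side effects might look like the sketch below. Instead of filling a list passed in by the caller, it returns the parsed category and builds "childs" from the return values of the recursive calls. The name build_category and the host parameter are hypothetical, and node is again assumed to be a selector with the same .xpath(...) and .get() interface as in the question.

```python
def build_category(node, host):
    """Return a nested dict for `node` and all of its subcategories.

    Hypothetical helper: nothing outside the function is mutated; the
    tree is assembled purely from return values.
    """
    link = node.xpath("./a")
    return {
        "title": link.xpath("./*[1]/text()").get(),
        # Like the original code, this raises TypeError if the href is missing.
        "url": host + link.xpath("./@href").get(),
        # Each direct subcategory row becomes a fully built child dict.
        "childs": [build_category(row, host) for row in node.xpath("./div/div")],
    }


# Possible usage inside the spider method from the question (assumption):
#
#     for container in category_containers:
#         yield build_category(container, self._host)
```

This form maps directly onto the "yield top level object with nested childs" requirement, since each call yields one finished tree.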

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint one, please cite the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 