I have a parse function that parses a tree of categories. I wrote it in the simplest way possible and am now struggling to refactor it.
Every nested loop does the same thing, except that it appends its object to the "childs" list of the object created one level above.
I think it can be refactored with recursion, but I can't work out how. How can I wrap this in a recursive function to avoid the code duplication?
The final result should be a list of objects, or the function can simply yield each top-level object with its nested children.
    for container in category_containers:
        root_category_a = container.xpath("./a")
        root_category_title = root_category_a.xpath("./*[1]/text()").get()
        root_category_url = self._host + root_category_a.xpath("./@href").get()
        root = {
            "title": root_category_title,
            "url": root_category_url,
            "childs": [],
        }
        subcategory_rows1 = container.xpath("./div/div")
        for subcat_row1 in subcategory_rows1:
            subcategory_a = subcat_row1.xpath("./a")
            subcategory_title = subcategory_a.xpath("./*[1]/text()").get()
            subcategory_url = self._host + subcategory_a.xpath("./@href").get()
            subcat1 = {
                "title": subcategory_title,
                "url": subcategory_url,
                "childs": [],
            }
            subcategory_rows2 = subcat_row1.xpath("./div/div")
            for subcat_row2 in subcategory_rows2:
                subcategory2_a = subcat_row2.xpath("./a")
                subcategory2_title = subcategory2_a.xpath("./*[1]/text()").get()
                subcategory2_url = self._host + subcategory2_a.xpath("./@href").get()
                subcat2 = {
                    "title": subcategory2_title,
                    "url": subcategory2_url,
                    "childs": [],
                }
                subcategory_rows3 = subcat_row2.xpath("./div/div")
                for subcat_row3 in subcategory_rows3:
                    subcategory3_a = subcat_row3.xpath("./a")
                    subcategory3_title = subcategory3_a.xpath("./*[1]/text()").get()
                    subcategory3_url = self._host + subcategory3_a.xpath("./@href").get()
                    subcat3 = {
                        "title": subcategory3_title,
                        "url": subcategory3_url,
                        "childs": [],
                    }
                    subcat2['childs'].append(subcat3)
                subcat1['childs'].append(subcat2)
            root['childs'].append(subcat1)
        yield root
It is generally done as follows:
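A minimal sketch of the recursive approach, assuming every level of the tree uses the same markup as in the question (an "./a" link followed by "./div/div" rows) and that the nodes are Scrapy/parsel-style selectors. The names parse_category and parse_categories and the host parameter are hypothetical, introduced here only for illustration:

```python
def parse_category(node, host):
    """Build one category dict from `node`, recursing into its subcategories."""
    link = node.xpath("./a")
    return {
        "title": link.xpath("./*[1]/text()").get(),
        # Same behaviour as the question's code: this raises TypeError
        # if the link has no href, because .get() then returns None.
        "url": host + link.xpath("./@href").get(),
        # Every nested row has the same shape as its parent, so the same
        # function handles each level. Recursion ends when no rows match.
        "childs": [parse_category(row, host) for row in node.xpath("./div/div")],
    }


def parse_categories(category_containers, host):
    """Yield one fully nested dict per top-level category container."""
    for container in category_containers:
        yield parse_category(container, host)
```

Inside the spider method, the whole nested loop would then reduce to something like "yield from parse_categories(category_containers, self._host)". Unlike the original, this handles any nesting depth rather than exactly four levels.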
From here, you can work on rewriting the function to avoid side effects, but as an implementation of the concept, this version is already much better.