I've looked and tried multiple things however have not been able to read the nested tags in my XML file. I have extracted the outter tags value and not the nested street and city tags under the address tag. I'm on a time crunch and I'm unable to read the nested tags after trying a bunch of things. PLEASE HELP!!!
The expected outcome I'm trying to get it is ---->
Common Botanical zone light price Street City
bloodroot Sanguinaria canadensis 4 mostly shady 2.44 1 toronto
And so on ---->
However, I have not been able to retrieve the street and city columns because my code is not picking up the nested tags.
I have been able to achieve the following output by removing the code involving city and street tags.
Common Botanical zone light price
bloodroot Sanguinaria canadensis 4 mostly shady 2.44
The following is my xml file with 2 entries only for testing purposes. I'm trying to create columns of each text information under the plant tag also mentioned above. I'm reading using the databricks file system. I open and create a csv and write to it and then close it. The indentation is correct, it may have been mixed up when I was copy pasting.
<?xml version="1.0" encoding="UTF-8"?>
<CATALOG>
<PLANT>
<COMMON>Bloodroot</COMMON>
<BOTANICAL>Sanguinaria canadensis</BOTANICAL>
<ZONE>4</ZONE>
<LIGHT>Mostly Shady</LIGHT>
<PRICE>$2.44</PRICE>
<ADDRESS>
<STREET>1</STREET>
<CITY>toronto</CITY>
</ADDRESS>
<AVAILABILITY>031599</AVAILABILITY>
</PLANT>
<PLANT>
<COMMON>Columbine</COMMON>
<BOTANICAL>Aquilegia canadensis</BOTANICAL>
<ZONE>3</ZONE>
<LIGHT>Mostly Shady</LIGHT>
<PRICE>$9.37</PRICE>
<ADDRESS>
<STREET>2</STREET>
<CITY>montreal</CITY>
</ADDRESS>
<AVAILABILITY>030699</AVAILABILITY>
</PLANT>
</CATALOG>
-----------This is the code I have used ---------------
from xml.etree import ElementTree
import csv
import os
xml = ElementTree.parse("/dbfs/mnt/ods-outbound/xml_test/plant_catalog.xml")
#creating a file
csvfile= open("/dbfs/mnt/ods-outbound/xml_test/plant_catalog.csv",'w',encoding='utf-8')
csvfile_writer=csv.writer(csvfile)
# ADD THE HEADER TO CSV FILE
csvfile_writer.writerow(["common","botanical","zone","light","price","availability","street","city"])
# FOR EACH PLANT
for plant in xml.findall("PLANT"):
if(plant)
# EXTRACT PLANT DETAILS
common = plant.find("COMMON")
botanical = plant.find("BOTANICAL")
zone = plant.find("ZONE")
light = plant.find("LIGHT")
price = plant.find("PRICE")
availability = plant.find("AVAILABILITY")
street = plant.find("STREET")
city = plant.find("CITY")
csv_line = [common.text, botanical.text, zone.text, light.text, price.text, availability.text,street.text,city.text]
# ADD A NEW ROW TO CSV FILE
csvfile_writer.writerow(csv_line)
csvfile.close()
As per the xml file, the street and city values are inside address tag.
changes
:
a.xml
(input file):
<?xml version="1.0" encoding="UTF-8" ?>
<CATALOG>
<PLANT>
<COMMON>Bloodroot</COMMON>
<BOTANICAL>Sanguinaria canadensis</BOTANICAL>
<ZONE>4</ZONE>
<LIGHT>Mostly Shady</LIGHT>
<PRICE>$2.44</PRICE>
<ADDRESS>
<STREET>1</STREET>
<CITY>toronto</CITY>
</ADDRESS>
<AVAILABILITY>031599</AVAILABILITY>
</PLANT>
<PLANT>
<COMMON>Columbine</COMMON>
<BOTANICAL>Aquilegia canadensis</BOTANICAL>
<ZONE>3</ZONE>
<LIGHT>Mostly Shady</LIGHT>
<PRICE>$9.37</PRICE>
<ADDRESS>
<STREET>2</STREET>
<CITY>montreal</CITY>
</ADDRESS>
<AVAILABILITY>030699</AVAILABILITY>
</PLANT>
</CATALOG>
main.py
:
import csv
from xml.etree import ElementTree
xml = ElementTree.parse("a.xml")
csvfile= open("plant_catalog.csv",'w',encoding='utf-8',newline="")
csvfile_writer=csv.writer(csvfile)
csvfile_writer.writerow(["common","botanical","zone","light","price","availability","street","city"])
for plant in xml.findall("PLANT"):
if(plant):
common = plant.find("COMMON").text
botanical = plant.find("BOTANICAL").text
zone = plant.find("ZONE").text
light = plant.find("LIGHT").text
price = plant.find("PRICE").text
availability = plant.find("AVAILABILITY").text
for addr in plant.findall("ADDRESS"):
if(addr):
street = addr.find("STREET").text
city = addr.find("CITY").text
csv_line = [common, botanical, zone, light, price, availability,street,city]
csvfile_writer.writerow(csv_line)
csvfile.close()
plant_catalog.csv
(output file):
common,botanical,zone,light,price,availability,street,city
Bloodroot,Sanguinaria canadensis,4,Mostly Shady,$2.44,031599,1,toronto
Columbine,Aquilegia canadensis,3,Mostly Shady,$9.37,030699,2,montreal
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.