简体   繁体   中英

reading and parsing nested XML tags and save to CSV using Python, Unable to read the nested tag

I've looked and tried multiple things however have not been able to read the nested tags in my XML file. I have extracted the outter tags value and not the nested street and city tags under the address tag. I'm on a time crunch and I'm unable to read the nested tags after trying a bunch of things. PLEASE HELP!!!

The expected outcome I'm trying to get it is ---->

Common Botanical zone light price Street City

bloodroot Sanguinaria canadensis 4 mostly shady 2.44 1 toronto

And so on ---->

However, I have not been able to retrieve the street and city columns because my code is not picking up the nested tags.

I have been able to achieve the following output by removing the code involving city and street tags.

Common Botanical zone light price
bloodroot Sanguinaria canadensis 4 mostly shady 2.44

The following is my xml file with 2 entries only for testing purposes. I'm trying to create columns of each text information under the plant tag also mentioned above. I'm reading using the databricks file system. I open and create a csv and write to it and then close it. The indentation is correct, it may have been mixed up when I was copy pasting.

<?xml version="1.0" encoding="UTF-8"?>

<CATALOG>

  <PLANT>

    <COMMON>Bloodroot</COMMON>

    <BOTANICAL>Sanguinaria canadensis</BOTANICAL>

    <ZONE>4</ZONE>

    <LIGHT>Mostly Shady</LIGHT>

    <PRICE>$2.44</PRICE>

    <ADDRESS>

         <STREET>1</STREET>

         <CITY>toronto</CITY>

    </ADDRESS>

    <AVAILABILITY>031599</AVAILABILITY>

  </PLANT>

  <PLANT>

    <COMMON>Columbine</COMMON>

    <BOTANICAL>Aquilegia canadensis</BOTANICAL>

    <ZONE>3</ZONE>

    <LIGHT>Mostly Shady</LIGHT>

    <PRICE>$9.37</PRICE>

    <ADDRESS>

         <STREET>2</STREET>

         <CITY>montreal</CITY>

    </ADDRESS>

    <AVAILABILITY>030699</AVAILABILITY>

  </PLANT>

</CATALOG>


-----------This is the code I have used ---------------
from xml.etree import ElementTree

import csv

import os


xml = ElementTree.parse("/dbfs/mnt/ods-outbound/xml_test/plant_catalog.xml")

#creating a file

csvfile= open("/dbfs/mnt/ods-outbound/xml_test/plant_catalog.csv",'w',encoding='utf-8')

csvfile_writer=csv.writer(csvfile)

# ADD THE HEADER TO CSV FILE

csvfile_writer.writerow(["common","botanical","zone","light","price","availability","street","city"])

# FOR EACH PLANT

for plant in xml.findall("PLANT"):
    if(plant)

      # EXTRACT PLANT DETAILS 

      common = plant.find("COMMON")

      botanical = plant.find("BOTANICAL")

      zone = plant.find("ZONE")

      light = plant.find("LIGHT")

      price = plant.find("PRICE")

      availability = plant.find("AVAILABILITY")

      street = plant.find("STREET")

      city = plant.find("CITY")

      csv_line = [common.text, botanical.text, zone.text, light.text, price.text, availability.text,street.text,city.text]

      # ADD A NEW ROW TO CSV FILE

      csvfile_writer.writerow(csv_line)

csvfile.close() 

As per the xml file, the street and city values are inside address tag.

changes :

  • Find the address tag in page
  • if address tag then find street and city

a.xml (input file):

<?xml version="1.0" encoding="UTF-8" ?>
<CATALOG>
  <PLANT>
    <COMMON>Bloodroot</COMMON>
    <BOTANICAL>Sanguinaria canadensis</BOTANICAL>
    <ZONE>4</ZONE>
    <LIGHT>Mostly Shady</LIGHT>
    <PRICE>$2.44</PRICE>
    <ADDRESS>
         <STREET>1</STREET>
         <CITY>toronto</CITY>
    </ADDRESS>
    <AVAILABILITY>031599</AVAILABILITY>
  </PLANT>
  <PLANT>
    <COMMON>Columbine</COMMON>
    <BOTANICAL>Aquilegia canadensis</BOTANICAL>
    <ZONE>3</ZONE>
    <LIGHT>Mostly Shady</LIGHT>
    <PRICE>$9.37</PRICE>
    <ADDRESS>
         <STREET>2</STREET>
         <CITY>montreal</CITY>
    </ADDRESS>
    <AVAILABILITY>030699</AVAILABILITY>
  </PLANT>
</CATALOG>

main.py :

import csv
from xml.etree import ElementTree

xml = ElementTree.parse("a.xml")

csvfile= open("plant_catalog.csv",'w',encoding='utf-8',newline="")
csvfile_writer=csv.writer(csvfile)
csvfile_writer.writerow(["common","botanical","zone","light","price","availability","street","city"])

for plant in xml.findall("PLANT"):
    if(plant):
      common = plant.find("COMMON").text
      botanical = plant.find("BOTANICAL").text
      zone = plant.find("ZONE").text
      light = plant.find("LIGHT").text
      price = plant.find("PRICE").text
      availability = plant.find("AVAILABILITY").text
      for addr in plant.findall("ADDRESS"):
        if(addr):
          street = addr.find("STREET").text
          city = addr.find("CITY").text
      csv_line = [common, botanical, zone, light, price, availability,street,city]

      csvfile_writer.writerow(csv_line)
csvfile.close() 

plant_catalog.csv (output file):

common,botanical,zone,light,price,availability,street,city
Bloodroot,Sanguinaria canadensis,4,Mostly Shady,$2.44,031599,1,toronto
Columbine,Aquilegia canadensis,3,Mostly Shady,$9.37,030699,2,montreal

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM