
Python 3: XML Tag Value not being written to csv file

My Python 3 script takes an XML file and creates a CSV file.

Small excerpt of the XML file:

<?xml version="1.0" encoding="UTF-8" ?>
    <metadata>
    <dc>
    <title>Golden days for boys and girls, 1895-03-16, v. XVI #17</title>
    <subject>Children's literature--Children's periodicals</subject>
    <description>Archives &amp; Special Collections at the Thomas J. Dodd Research Center, University of Connecticut Libraries</description>
    <publisher>James Elverson, 1880-</publisher>
    <date>1895-06-15</date>
    <type>Text | periodicals</type>
    <format>image/jp2</format>
    <handle>http://hdl.handle.net/11134/20002:860074494</handle>
    <accessionNumber/>
    <barcode/>
    <identifier>20002:860074494 | local: 868010272 | local: 997186613502432 | local: 39153019382870 | hdl:  | http://hdl.handle.net/11134/20002:860074494</identifier>
    <rights>These Materials are provided for educational and research purposes only. The University of Connecticut Libraries hold the copyright except where noted. Permission must be obtained in writing from the University of Connecticut Libraries and/or theowner(s) of the copyright to publish reproductions or quotations beyond "fair use." | The collection is open and available for research.</rights>
    <creator/>
    <relation/>
    <coverage/>
    <language/>
    </dc>
    </metadata>

Python 3 code:

import csv
import xml.etree.ElementTree as ET

tree = ET.ElementTree(file='ctda_set1_uniqueTags.xml')
doc = ET.parse("ctda_set1_uniqueTags.xml")
root = tree.getroot()

oaidc_data = open('ctda_set1_uniqueTags.csv', 'w', encoding='utf-8')

titles = 'dc/title'
subjects = 'dc/subject'

csvwriter = csv.writer(oaidc_data)
oaidc_head = ['Title', 'Subject', 'Description', 'Publisher', 'Date', 'Type', 'Format', 'Handle', 'Accession Number', 'Barcode', 'Identifiers', 'Rights', 'Creator', 'Relation', 'Coverage', 'Language']

count = 0
for member in root.findall('dc'):
    if count == 0:
       csvwriter.writerow(oaidc_head)
       count = count + 1

    dcdata = []
    titles = member.find('title').text
    dcdata.append(titles)
    subjects = member.find('subject').text
    dcdata.append(subjects)
    descriptions = member.find('description').text
    dcdata.append(descriptions)
    publishers = member.find('publisher').text
    dcdata.append(publishers)
    dates = member.find('date').text
    dcdata.append(dates)
    types = member.find('type').text
    dcdata.append(types)
    formats = member.find('format').text
    dcdata.append(formats)
    handle = member.find('handle').text
    dcdata.append(handle)
    accessionNo = member.find('accessionNumber').text
    dcdata.append(accessionNo)
    barcodes = member.find('barcode').text
    dcdata.append(barcodes)
    identifiers = member.find('identifier').text
    dcdata.append(identifiers)
    rt = member.find('rights').text
    print(member.find('rights').text)
    dcdata.append('rt')
    ct = member.find('creator').text
    dcdata.append('ct')
    rt = member.find('relation').text
    dcdata.append('rt')
    ce = member.find('coverage').text
    dcdata.append('ce')
    lang = member.find('language').text
    dcdata.append('lang')
    csvwriter.writerow(dcdata)

oaidc_data.close()

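Everything works as expected except for the last five columns: Rights, Creator, Relation, Coverage, and Language (the variables rt, ct, ce, and lang). The CSV is written with the comma delimiter as intended, but in these columns the value is always the variable's name rather than the tag's text: rt for Rights, ct for Creator, rt for Relation, ce for Coverage, and lang for Language.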

Here's a snippet of the output:

Title,Subject,Description,Publisher,Date,Type,Format,Handle,Accession Number,Barcode,Identifiers,Rights,Creator,Relation,Coverage,Language

"Golden days for boys and girls, 1895-03-16, v. XVI #17",Children's literature--Children's periodicals,"Archives & Special Collections at the Thomas J. Dodd Research Center, University of Connecticut Libraries","James Elverson, 1880-",1895-06-15,Text | periodicals,image/jp2,hdl.handle.net/11134/20002:860074494,,,20002:860074494 | local: 868010272 | local: 997186613502432 | local: 39153019382870,**rt,ct,rt,ce,lang**

Some of the rights statements get very long, so perhaps that's the issue. That's why I added print(member.find('rights').text) to see the output. The text is printed just fine; it just isn't written to the CSV. What I'd like is to have the text of these XML tags written to the file. Any help would be appreciated.

Thanks. Jennifer

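In the line dcdata.append('rt'), the quotes turn rt into a string literal, so the two characters "rt" are appended instead of the value stored in the variable rt. Drop the quotes: dcdata.append(rt). The ct, ce, and lang lines have the same unnecessary quotes, and so does the second rt line, which holds the relation text.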
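For what it's worth, here is a minimal sketch of one way to avoid this kind of slip altogether: loop over the tag names instead of writing one variable per tag. It is not the only fix, and a few things in it are my own assumptions rather than part of the question: the helper name dc_rows, the FIELDS list, and the shortened sample XML are all made up for illustration, and it writes to an in-memory buffer instead of ctda_set1_uniqueTags.csv so it can be run as-is. It uses findtext, which gives back an empty string for an empty tag such as <creator/>.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Tag names in the same order as the header row in the question.
FIELDS = ['title', 'subject', 'description', 'publisher', 'date', 'type',
          'format', 'handle', 'accessionNumber', 'barcode', 'identifier',
          'rights', 'creator', 'relation', 'coverage', 'language']


def dc_rows(root):
    """Yield one list of text values per <dc> element under root (hypothetical helper)."""
    for member in root.findall('dc'):
        # findtext returns the tag's text, '' for an empty tag,
        # and the given default when the tag is missing entirely.
        yield [member.findtext(tag, default='') for tag in FIELDS]


# Shortened, made-up sample in the same shape as the excerpt in the question.
sample = """<metadata><dc>
<title>Golden days for boys and girls</title>
<rights>For educational and research purposes only.</rights>
<creator/>
</dc></metadata>"""

root = ET.fromstring(sample)
rows = list(dc_rows(root))

# Write to an in-memory buffer here; open() a real file the same way as in the question.
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(FIELDS)
writer.writerows(rows)
print(out.getvalue())
```

With a real file you would pass ET.parse("ctda_set1_uniqueTags.xml").getroot() to dc_rows and hand csv.writer a file opened with newline='' to avoid blank lines on Windows. Because the values never go through separately named variables, there is nothing to accidentally put in quotes.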
