简体   繁体   English

删除python中的xml字符

[英]Remove xml character in python

I have the following code, which takes information from an XML file and saves some data in a csv file. 我有以下代码,该代码从XML文件获取信息并将一些数据保存在csv文件中。

import xml.etree.ElementTree as ET
import csv

tree = ET.parse('file.xml')
root = tree.getroot()

title = []
category = []
url = []
prod = []

def find_title():
    for t in root.findall('solution/head'):
        title.append(t.find('title').text)

    for c in root.findall('solution/body'):
        category.append(c.find('category').text)

    for u in root.findall('solution/body'):
        url.append(u.find('video').text)

    for p in root.findall('solution/body'):
        prod.append(p.find('product').text)

find_title()

headers = ['Title', 'Category', 'Video URL','Product']

def save_csv():
    with open('titles.csv', 'w') as f:
        f_csv = csv.writer(f, lineterminator='\r')
        f_csv.writerow(headers)
        f.write(''.join('{},{},{},{}\n'.format(title, category, url, prod) for title, category, url, prod in zip(title, category, url, prod)))

save_csv()

I have found an issue with the text that contains ',' because it separates the output save in the list eg: 我发现包含','的文本存在问题,因为它会将输出保存在列表中,例如:

<title>Add, Change, or Remove Transitions between Slides</title>

is getting save in the list as [Add, Change, or Remove Transitions between Slides] which make sense since this is a csv file, however, I would like to keep the whole output together. 正以[添加,更改或删除幻灯片之间的过渡]的形式保存在列表中,这是有道理的,因为这是一个csv文件,但是,我希望将整个输出保持在一起。

So I there any way to remove the ',' from the title tag or can I add more code to override the ',' 因此,我有任何方法可以从标题标签中删除“,”,也可以添加更多代码以覆盖“,”

Thanks in advance 提前致谢

It's not clear why you're writing the row data with a file.write() call rather than using the csv writer's writerow method (which you are using for the header row. Using that method will take care of quoting / special character issues wrt. data containing quotes and commas. 目前尚不清楚为什么要使用file.write()调用而不是使用csv writer的writerow方法(用于标题行file.write()写入行数据。使用该方法将处理引号/特殊字符问题。包含引号和逗号的数据。

Change: 更改:

f.write(''.join('{},{},{},{}\n'.format(title, category, url, prod) for title, category, url, prod in zip(title, category, url, prod)))

to: 至:

for row in zip(title, category, url, prod):
    f_csv.writerow(row)

and your CSV should work as expected, assuming your CSV reader handles the quoted fields. 并且假设您的CSV阅读器可以处理引用的字段,那么CSV应该可以正常工作。

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM