Python etree: parse XML with HTML entities (keep HTML formatting)
I have the following XML:
<?xml version="1.0"?>
<entry xmlns="http://www.w3.org/2005/Atom" xmlns:app="http://www.w3.org/2007/app" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:metadata="http://xmlns.escenic.com/2010/atom-metadata">
<content type="application/vnd.vizrt.payload+xml">
<vdf:payload xmlns:vdf="http://www.vizrt.com/types">
<vdf:field name="body">
<vdf:value>
<div xmlns="http://www.w3.org/1999/xhtml">
<p>I saluti dal Sud partono con <strong>Elsa Albonico</strong>, storica "golosità", con i più piccoli "fare le conte".</p>
<p>I saluti dal Nord la <a href="http://www.proticino.ch/sezioni-in-svizzera/basilea/">Pro Ticino di Basilea</a> con un particolarità frammenti   </p>
<p><a href="https://www.rts.ch/">RTS</a> "Kiosque à Musiques" con <strong>Jean-Marc Richard</strong>. <br/>A fare da<em> fil rouge</em> al nostro </p>
<p>
<a href="http://internal.publishing.production.rsi.ch/webservice/escenic/content/8762014" id="_360b1131-e6a5-49b6-995e-a624c888617a">Le foto del gioco, Finestra popolare 26.02.2017</a>
</p>
</div>
</vdf:value>
</vdf:field>
</vdf:payload>
</content>
</entry>
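The "body" field is HTML that I have to copy, as HTML, into another file (so replacements or other tricks are not allowed).
I am using Python and etree.
Is there a way to do this?
I have already tried using tail instead of text, but I lose the HTML formatting, which is a big problem.
Please help.
Thanks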
CP
This is a very ugly solution, but it works! As homework, make it better!
import xml.etree.ElementTree as ET
data = '''<?xml version="1.0"?>
<entry xmlns="http://www.w3.org/2005/Atom" xmlns:app="http://www.w3.org/2007/app" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:metadata="http://xmlns.escenic.com/2010/atom-metadata">
<content type="application/vnd.vizrt.payload+xml">
<vdf:payload xmlns:vdf="http://www.vizrt.com/types">
<vdf:field name="body">
<vdf:value>
<div xmlns="http://www.w3.org/1999/xhtml">
<p>I saluti dal Sud partono con <strong>Elsa Albonico</strong>, storica "golosità", con i più piccoli "fare le conte".</p>
<p>I saluti dal Nord la <a href="http://www.proticino.ch/sezioni-in-svizzera/basilea/">Pro Ticino di Basilea</a> con un particolarità frammenti   </p>
<p><a href="https://www.rts.ch/">RTS</a> "Kiosque à Musiques" con <strong>Jean-Marc Richard</strong>. <br/>A fare da<em> fil rouge</em> al nostro </p>
<p>
<a href="http://internal.publishing.production.rsi.ch/webservice/escenic/content/8762014" id="_360b1131-e6a5-49b6-995e-a624c888617a">Le foto del gioco, Finestra popolare 26.02.2017</a>
</p>
</div>
</vdf:value>
</vdf:field>
</vdf:payload>
</content>
</entry>'''
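# Parse the document; fromstring() returns the root <entry> element.
root = ET.fromstring(data)
# getchildren() was removed in Python 3.9, so index the elements instead:
# entry -> content -> vdf:payload -> vdf:field -> vdf:value -> div
div = root[0][0][0][0][0]
# Note: itertext() yields only the text nodes, so tags such as <strong> and <a> are not written.
with open('./result.html', 'w', encoding='utf-8') as html:
    html.writelines(div.itertext())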
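Since the question asks to keep the HTML formatting, one possible improvement is to serialize the div element itself instead of collecting its text. The following is only a sketch under a few assumptions: the function name extract_body_html and the shortened SAMPLE document are made up for illustration, the first XHTML div in the document is assumed to be the body, and the XHTML namespace is stripped from the tag names so that the output contains plain tags rather than prefixed ones.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Shortened, hypothetical stand-in for the document in the question.
SAMPLE = '''<?xml version="1.0"?>
<entry xmlns="http://www.w3.org/2005/Atom">
<content type="application/vnd.vizrt.payload+xml">
<vdf:payload xmlns:vdf="http://www.vizrt.com/types">
<vdf:field name="body">
<vdf:value>
<div xmlns="http://www.w3.org/1999/xhtml">
<p>con <strong>Elsa Albonico</strong>, <a href="https://www.rts.ch/">RTS</a>. <br/>A fare da<em> fil rouge</em></p>
</div>
</vdf:value>
</vdf:field>
</vdf:payload>
</content>
</entry>'''


def extract_body_html(xml_text):
    """Return the first XHTML <div> as an HTML string (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    # Look the element up by its namespaced name instead of by position.
    div = root.find(".//{%s}div" % XHTML_NS)
    if div is None:
        raise ValueError("no XHTML <div> found")
    # Drop the "{namespace}" part of every tag so the serializer
    # does not emit prefixes such as ns0:p.
    for el in div.iter():
        if isinstance(el.tag, str) and el.tag.startswith("{"):
            el.tag = el.tag.split("}", 1)[1]
    # Discard the whitespace that follows </div> in the source document.
    div.tail = None
    # method="html" writes void elements such as <br> without a closing tag.
    return ET.tostring(div, encoding="unicode", method="html")


if __name__ == "__main__":
    with open("./result.html", "w", encoding="utf-8") as out:
        out.write(extract_body_html(SAMPLE))
```

Note that this modifies the parsed tree in place (the tag names lose their namespace), which should be fine if the tree is only used for this export. Whitespace and inline tags inside the div are written as they were parsed.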