
python etree parse xml with html entities (keep html formatting)

I have the following XML:

<?xml version="1.0"?>
<entry xmlns="http://www.w3.org/2005/Atom" xmlns:app="http://www.w3.org/2007/app" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:metadata="http://xmlns.escenic.com/2010/atom-metadata">
 <content type="application/vnd.vizrt.payload+xml">
    <vdf:payload xmlns:vdf="http://www.vizrt.com/types">
      <vdf:field name="body">
        <vdf:value>

          <div xmlns="http://www.w3.org/1999/xhtml">
            <p>I saluti dal Sud partono con <strong>Elsa Albonico</strong>, storica  "golosit&#xE0;", con i pi&#xF9; piccoli "fare le conte".</p>
            <p>I saluti dal Nord la <a href="http://www.proticino.ch/sezioni-in-svizzera/basilea/">Pro Ticino di Basilea</a> con un particolarit&#xE0; frammenti&#xA0;&#xA0; </p>
            <p><a href="https://www.rts.ch/">RTS</a> "Kiosque &#xE0; Musiques" con <strong>Jean-Marc Richard</strong>. <br/>A fare da<em> fil&#xA0;rouge</em> al nostro </p>
            <p>
              <a href="http://internal.publishing.production.rsi.ch/webservice/escenic/content/8762014" id="_360b1131-e6a5-49b6-995e-a624c888617a">Le foto del gioco, Finestra popolare 26.02.2017</a>
            </p>
          </div>

        </vdf:value>
      </vdf:field>
    </vdf:payload>
  </content>
 </entry>

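The "body" field contains HTML that I must copy into another file as HTML, with its markup intact (so string replacements or similar tricks are not allowed).

I am using Python and etree.

Is there a way to do this?

I have already tried using tail instead of text, but I lose the HTML formatting, which is a big problem.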

Please help.

Thanks,

CP

This is a very ugly solution, but it works! As homework, make it better!

import xml.etree.ElementTree as ET

data = '''<?xml version="1.0"?>
<entry xmlns="http://www.w3.org/2005/Atom" xmlns:app="http://www.w3.org/2007/app" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:metadata="http://xmlns.escenic.com/2010/atom-metadata">
 <content type="application/vnd.vizrt.payload+xml">
    <vdf:payload xmlns:vdf="http://www.vizrt.com/types">
      <vdf:field name="body">
        <vdf:value>

          <div xmlns="http://www.w3.org/1999/xhtml">
            <p>I saluti dal Sud partono con <strong>Elsa Albonico</strong>, storica  "golosit&#xE0;", con i pi&#xF9; piccoli "fare le conte".</p>
            <p>I saluti dal Nord la <a href="http://www.proticino.ch/sezioni-in-svizzera/basilea/">Pro Ticino di Basilea</a> con un particolarit&#xE0; frammenti&#xA0;&#xA0; </p>
            <p><a href="https://www.rts.ch/">RTS</a> "Kiosque &#xE0; Musiques" con <strong>Jean-Marc Richard</strong>. <br/>A fare da<em> fil&#xA0;rouge</em> al nostro </p>
            <p>
              <a href="http://internal.publishing.production.rsi.ch/webservice/escenic/content/8762014" id="_360b1131-e6a5-49b6-995e-a624c888617a">Le foto del gioco, Finestra popolare 26.02.2017</a>
            </p>
          </div>

        </vdf:value>
      </vdf:field>
    </vdf:payload>
  </content>
 </entry>'''

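# fromstring() returns the root <entry> element.
root = ET.fromstring(data)
# Walk entry -> content -> payload -> field -> value -> div by position.
# Element.getchildren() was removed in Python 3.9, so index the elements instead.
div = root[0][0][0][0][0]

# Note: itertext() yields only the text nodes, so the tags themselves are not written.
with open('./result.html', 'w', encoding='utf-8') as html:
    html.writelines(div.itertext())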
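
One possible way to do the "homework" is to serialize the div element itself instead of collecting its text, so the tags survive. The following is only a minimal sketch, not the answerer's method: the `sample` string is a shortened stand-in for the feed in the question, and `extract_body_html` is a hypothetical helper name. It assumes that getting the character references back as real characters (for example `&#xE0;` as `à`) is acceptable.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Shortened stand-in for the feed in the question (illustrative sample data).
sample = '''<?xml version="1.0"?>
<entry xmlns="http://www.w3.org/2005/Atom">
  <content type="application/vnd.vizrt.payload+xml">
    <vdf:payload xmlns:vdf="http://www.vizrt.com/types">
      <vdf:field name="body">
        <vdf:value>
          <div xmlns="http://www.w3.org/1999/xhtml">
            <p>storica "golosit&#xE0;" con <strong>Elsa Albonico</strong>.<br/>A fare da<em> fil&#xA0;rouge</em></p>
          </div>
        </vdf:value>
      </vdf:field>
    </vdf:payload>
  </content>
</entry>'''


def extract_body_html(xml_text):
    """Return the XHTML <div> of the body field as a markup string (hypothetical helper)."""
    # Map the XHTML namespace to the empty prefix so tags are written
    # as <p>, <a>, ... rather than <ns0:p>, <ns0:a>, ...
    ET.register_namespace("", XHTML_NS)
    root = ET.fromstring(xml_text)
    # Find the first XHTML div below the root by its namespace-qualified name.
    div = root.find(".//{%s}div" % XHTML_NS)
    if div is None:
        raise ValueError("no XHTML <div> found")
    # Drop the whitespace that follows the div inside <vdf:value>.
    div.tail = None
    # encoding="unicode" makes tostring() return str instead of bytes.
    return ET.tostring(div, encoding="unicode")


body_html = extract_body_html(sample)
print(body_html)
```

The returned string can then be written with `open('./result.html', 'w', encoding='utf-8')`. Looking the div up by its qualified name also avoids depending on the exact nesting depth.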

