Adding a <root> tag to an XML document using Python

Question

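I am trying to add a root tag to the beginning and end of a 2-million-line XML file so that my Python code can process the file correctly.

I tried the code from a previous post, but I get the error "XMLSyntaxError: Extra content at the end of the document, line __, column 1".

How can I fix this? Or is there a better way to add a root tag to the beginning and end of a large XML document?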

import lxml.etree as ET
tree = ET.parse('test.xml')
root = tree.getroot()
newroot = ET.Element("root")
newroot.insert(0, root)
print(ET.tostring(newroot, pretty_print=True))

My test XML

<pub>
    <ID>75</ID>
    <title>Use of Lexicon Density in Evaluating Word Recognizers</title>
    <year>2000</year>
    <booktitle>Multiple Classifier Systems</booktitle>
    <pages>310-319</pages>
    <authors>
        <author>Petr Slav&iacute;k</author>
        <author>Venu Govindaraju</author>
    </authors>
</pub>
<pub>
    <ID>120</ID>
    <title>Virtual endoscopy with force feedback - a new system for neurosurgical training</title>
    <year>2003</year>
    <booktitle>CARS</booktitle>
    <pages>782-787</pages>
    <authors>
        <author>Christos Trantakis</author>
        <author>Friedrich Bootz</author>
        <author>Gero Strau&szlig;</author>
        <author>Edgar Nowatius</author>
        <author>Dirk Lindner</author>
        <author>H&uuml;seyin Kem&acirc;l &Ccedil;akmak</author>
        <author>Heiko Maa&szlig;</author>
        <author>Uwe G. K&uuml;hnapfel</author>
        <author>J&uuml;rgen Meixensberger</author>
    </authors>
</pub>

Answer 1

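I suspect that approach works in the earlier post only because there is a single A element at the top level. Your file has many top-level <pub> elements, and an XML document may have only one root, which is what triggers the "extra content" error. Fortunately, even with two million lines it is easy to add the lines you need.

While doing this I noticed that the lxml parser could not handle the entities for accented characters (such as &iacute;). These are HTML named entities, which XML does not define by default. I included code to anglicise them.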

import re

def anglicise(matchobj):
    # '&iacute;' -> 'i': keep only the letter that follows the ampersand.
    return matchobj.group(0)[1]

outputFilename = 'result.xml'

with open('test.xml') as inXML, open(outputFilename, 'w') as outXML:
    outXML.write('<root>\n')
    # Iterate over the file lazily instead of loading every line with readlines().
    for line in inXML:
        outXML.write(re.sub(r'&[a-zA-Z]+;', anglicise, line))
    outXML.write('</root>\n')

from lxml import etree

tree = etree.parse(outputFilename)
years = tree.xpath('.//year')
print(years[0].text)
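For a file of this size, building the whole tree with etree.parse may use a lot of memory. The following is a minimal sketch of reading the wrapped file one <pub> record at a time instead. It uses the standard library's xml.etree.ElementTree.iterparse (lxml's etree.iterparse offers a similar interface); the function name iter_pub_fields and the choice of the ID and year fields are illustrative assumptions, not part of the original answer.

```python
import xml.etree.ElementTree as ET

def iter_pub_fields(path, record_tag="pub"):
    """Yield (ID, year) for each record without keeping full records in memory.

    Hypothetical helper: assumes the file already has a single root element
    and that every record is a direct <pub> child containing <ID> and <year>.
    """
    # "end" events fire once an element and all of its children are parsed.
    for _event, elem in ET.iterparse(path, events=("end",)):
        if elem.tag == record_tag:
            yield elem.findtext("ID"), elem.findtext("year")
            # Drop the record's children and text; the emptied element
            # itself still stays attached to the root.
            elem.clear()
```

Calling list(iter_pub_fields('result.xml')) on the file written above would collect one tuple per <pub> element; in practice you would loop over the generator rather than materialise the list.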

Edit: replaced anglicise with this version so that &amp; is left unchanged.

def anglicise(matchobj): 
    if matchobj.group(0) == '&amp;':
        return matchobj.group(0)
    else:
        return matchobj.group(0)[1]
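If the accented characters need to be preserved rather than anglicised, one alternative is to rewrite each HTML named entity as a numeric character reference, which an XML parser accepts without a DTD. This is only a sketch under stated assumptions: the input is UTF-8, the entities belong to the HTML 4 set covered by the standard library's html.entities.name2codepoint, and the helper names to_numeric_reference and wrap_with_root are hypothetical.

```python
import re
from html.entities import name2codepoint

# The five entities that XML itself predefines; these must stay as they are.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")

def to_numeric_reference(match):
    """Turn '&iacute;' into '&#237;'; leave XML and unknown entities alone."""
    name = match.group(1)
    if name in XML_PREDEFINED or name not in name2codepoint:
        return match.group(0)
    return "&#{};".format(name2codepoint[name])

def wrap_with_root(in_path, out_path, root_tag="root"):
    """Copy in_path to out_path line by line, wrapped in a single root tag."""
    with open(in_path, encoding="utf-8") as src, \
            open(out_path, "w", encoding="utf-8") as dst:
        dst.write("<{}>\n".format(root_tag))
        for line in src:
            dst.write(ENTITY_RE.sub(to_numeric_reference, line))
        dst.write("</{}>\n".format(root_tag))
```

With this, wrap_with_root('test.xml', 'result.xml') would play the same role as the loop above, and the parsed author names keep their accents. Entities that are not in the table are passed through untouched, so the parser will still reject them.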

Answer 1
1 accepted 2017-04-24 21:32:46
