
How to write XML declaration using xml.etree.ElementTree

I am generating an XML document in Python using an ElementTree, but the tostring function doesn't include an XML declaration when converting to plaintext.

from xml.etree.ElementTree import Element, SubElement, tostring

document = Element('outer')
node = SubElement(document, 'inner')
node.NewValue = 1
print tostring(document)  # Outputs "<outer><inner /></outer>"

I need my string to include the following XML declaration:

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>

However, there does not seem to be any documented way of doing this.

Is there a proper method for rendering the XML declaration in an ElementTree?

I am surprised to find that there doesn't seem to be a way with ElementTree.tostring(). You can, however, use ElementTree.ElementTree.write() to write your XML document to a fake (in-memory) file:

from io import BytesIO
from xml.etree import ElementTree as ET

document = ET.Element('outer')
node = ET.SubElement(document, 'inner')
et = ET.ElementTree(document)

f = BytesIO()
et.write(f, encoding='utf-8', xml_declaration=True) 
print(f.getvalue())  # your XML file, encoded as UTF-8

See this question. Even then, I don't think you can get your 'standalone' attribute without prepending it yourself.
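As a rough illustration of the "prepend it yourself" idea, the sketch below serializes the tree without a declaration and puts a hand-written one in front. The helper name to_bytes_with_standalone is made up for this example and is not part of ElementTree; it also assumes Python 3, where tostring() with encoding="utf-8" returns bytes and omits its own declaration.

```python
import xml.etree.ElementTree as ET


def to_bytes_with_standalone(root, standalone=True):
    """Hypothetical helper (not an ElementTree API): UTF-8 bytes with a
    hand-written declaration that carries the standalone attribute."""
    flag = "yes" if standalone else "no"
    declaration = '<?xml version="1.0" encoding="UTF-8" standalone="%s"?>\n' % flag
    # Assumption: Python 3, where encoding="utf-8" yields bytes with no declaration.
    body = ET.tostring(root, encoding="utf-8")
    return declaration.encode("utf-8") + body


document = ET.Element("outer")
ET.SubElement(document, "inner")

result = to_bytes_with_standalone(document)
print(result.decode("utf-8"))
```

Because the declaration is plain text here, nothing checks that the encoding it names matches the encoding of the body, so the two have to be kept in sync by hand.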

I would use lxml (see http://lxml.de/api.html).

Then you can:

from lxml import etree
document = etree.Element('outer')
node = etree.SubElement(document, 'inner')
print(etree.tostring(document, xml_declaration=True))

If you include encoding='utf8', you will get an XML header:

xml.etree.ElementTree.tostring writes an XML encoding declaration with encoding='utf8'

Sample Python code (works with Python 2 and 3):

import xml.etree.ElementTree as ElementTree

tree = ElementTree.ElementTree(
    ElementTree.fromstring('<xml><test>123</test></xml>')
)
root = tree.getroot()

print('without:')
print(ElementTree.tostring(root, method='xml'))
print('')
print('with:')
print(ElementTree.tostring(root, encoding='utf8', method='xml'))

Python 2 output:

$ python2 example.py
without:
<xml><test>123</test></xml>

with:
<?xml version='1.0' encoding='utf8'?>
<xml><test>123</test></xml>

With Python 3 you will note the b prefix, indicating that bytes are returned (just like with Python 2):

$ python3 example.py
without:
b'<xml><test>123</test></xml>'

with:
b"<?xml version='1.0' encoding='utf8'?>\n<xml><test>123</test></xml>"

xml_declaration Argument

Is there a proper method for rendering the XML declaration in an ElementTree?

Yes, and there is no need to use the .tostring function. According to the ElementTree documentation, you should create an ElementTree object, create the Element and SubElements, set the tree's root, and finally pass the xml_declaration argument to the .write function, so that the declaration line is included in the output file.

You can do it this way:

import xml.etree.ElementTree as ET

tree = ET.ElementTree()  # empty tree; the root element is set below

document = ET.Element("outer")
node1 = ET.SubElement(document, "inner")
node1.text = "text"

tree._setroot(document)
tree.write("./output.xml", encoding = "UTF-8", xml_declaration = True)  

And the output file is:

<?xml version='1.0' encoding='UTF-8'?>
<outer><inner>text</inner></outer>

I encountered this issue recently. After some digging through the code, I found that the following snippet is the definition of the function ElementTree.write:

def write(self, file, encoding="us-ascii"):
    assert self._root is not None
    if not hasattr(file, "write"):
        file = open(file, "wb")
    if not encoding:
        encoding = "us-ascii"
    elif encoding != "utf-8" and encoding != "us-ascii":
        file.write("<?xml version='1.0' encoding='%s'?>\n" % encoding)
    self._write(file, self._root, encoding, {})

So the answer is: if you need to write the XML header to your file, pass an encoding argument other than the exact strings utf-8 or us-ascii, e.g. UTF-8 (the comparison in this version of the code is case-sensitive).
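The snippet above comes from an older implementation, so the exact rule may differ in the Python version you run. A quick way to check is to try a few spellings of the encoding and see which ones produce a declaration. The sketch below does that with tostring(); it assumes Python 3, where the result is bytes, and the list of encodings is just a sample chosen for illustration.

```python
import xml.etree.ElementTree as ET

root = ET.Element('outer')
ET.SubElement(root, 'inner')

# Record, for each spelling of the encoding, whether the output starts
# with an XML declaration.
results = {}
for encoding in ('utf-8', 'UTF-8', 'us-ascii', 'utf8', 'iso-8859-1'):
    data = ET.tostring(root, encoding=encoding)
    results[encoding] = data.startswith(b'<?xml')

for encoding, has_declaration in results.items():
    print(encoding, has_declaration)
```

On the Python 3 versions I know of, the encoding name is lowercased before the comparison, so UTF-8 behaves like utf-8 and no longer triggers the declaration, while utf8 (without the hyphen) and other encodings still do.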

A minimal working example using the ElementTree package:

import xml.etree.ElementTree as ET

document = ET.Element('outer')
node = ET.SubElement(document, 'inner')
node.text = '1'
res = ET.tostring(document, encoding='utf8', method='xml').decode()
print(res)

The output is:

<?xml version='1.0' encoding='utf8'?>
<outer><inner>1</inner></outer>

Another pretty simple option is to concatenate the desired header to the XML string, like this:

xml = (bytes('<?xml version="1.0" encoding="UTF-8"?>\n', encoding='utf-8') + ET.tostring(root))
xml = xml.decode('utf-8')
with open('invoice.xml', 'w+') as f:
    f.write(xml)

Easy.

Sample for both Python 2 and 3 (the encoding parameter must be utf8):

import xml.etree.ElementTree as ElementTree

tree = ElementTree.ElementTree(ElementTree.fromstring('<xml><test>123</test></xml>'))
root = tree.getroot()
print(ElementTree.tostring(root, encoding='utf8', method='xml'))

Since Python 3.8 there is an xml_declaration parameter for this:

New in version 3.8: The xml_declaration and default_namespace parameters.

xml.etree.ElementTree.tostring(element, encoding="us-ascii", method="xml", *, xml_declaration=None, default_namespace=None, short_empty_elements=True)

Generates a string representation of an XML element, including all subelements. element is an Element instance. encoding is the output encoding (default is US-ASCII). Use encoding="unicode" to generate a Unicode string (otherwise, a bytestring is generated). method is either "xml", "html" or "text" (default is "xml"). xml_declaration, default_namespace and short_empty_elements have the same meaning as in ElementTree.write(). Returns an (optionally) encoded string containing the XML data.

Sample for Python 3.8 and higher:

import xml.etree.ElementTree as ElementTree

tree = ElementTree.ElementTree(ElementTree.fromstring('<xml><test>123</test></xml>'))
root = tree.getroot()
print(ElementTree.tostring(root, encoding='unicode', method='xml', xml_declaration=True))

I would use ET:

try:
    from lxml import etree
    print("running with lxml.etree")
except ImportError:
    try:
        # Python 2.5
        import xml.etree.cElementTree as etree
        print("running with cElementTree on Python 2.5+")
    except ImportError:
        try:
            # Python 2.5
            import xml.etree.ElementTree as etree
            print("running with ElementTree on Python 2.5+")
        except ImportError:
            try:
                # normal cElementTree install
                import cElementTree as etree
                print("running with cElementTree")
            except ImportError:
                try:
                    # normal ElementTree install
                    import elementtree.ElementTree as etree
                    print("running with ElementTree")
                except ImportError:
                    print("Failed to import ElementTree from any known place")

document = etree.Element('outer')
node = etree.SubElement(document, 'inner')
print(etree.tostring(document, encoding='UTF-8', xml_declaration=True))

This works if you just want to print. I get an error when I try to send it to a file...

import xml.dom.minidom as minidom
import xml.etree.ElementTree as ET
from xml.etree.ElementTree import Element, SubElement, Comment, tostring

def prettify(elem):
    rough_string = ET.tostring(elem, 'utf-8')
    reparsed = minidom.parseString(rough_string)
    return reparsed.toprettyxml(indent="  ")
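The error itself is not shown, so the following is only a guess at a common cause: mixing bytes and text when writing. A minimal sketch, assuming Python 3, is to ask minidom for encoded bytes and open the file in binary mode. The name prettify_bytes and the temporary output path are invented for this example.

```python
import os
import tempfile
import xml.dom.minidom as minidom
import xml.etree.ElementTree as ET


def prettify_bytes(elem, encoding='utf-8'):
    """Hypothetical variant of prettify() that returns encoded bytes."""
    rough = ET.tostring(elem, encoding)
    reparsed = minidom.parseString(rough)
    # With an encoding argument, toprettyxml() returns bytes and names
    # that encoding in the declaration it writes.
    return reparsed.toprettyxml(indent='  ', encoding=encoding)


document = ET.Element('outer')
ET.SubElement(document, 'inner').text = 'text'
pretty = prettify_bytes(document)

# Bytes must be written in binary mode ('wb'), not text mode.
path = os.path.join(tempfile.mkdtemp(), 'output.xml')
with open(path, 'wb') as f:
    f.write(pretty)
```

Note that minidom writes its own declaration, so there is no need to add one before calling it.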

Including 'standalone' in the declaration

I didn't find any way to add the standalone attribute in the documentation, so I adapted the ET.tostring function to take the declaration as an argument.

from xml.etree import ElementTree as ET

# Sample
document = ET.Element('outer')
node = ET.SubElement(document, 'inner')

# Function that you need
def tostring(element, declaration, method=None):
    class dummy:
        pass
    # Start with the custom declaration, then collect the writer's output
    data = [declaration + "\n"]
    file = dummy()
    file.write = data.append
    # encoding="unicode" makes write() emit text (str) in Python 3 and
    # no declaration of its own
    ET.ElementTree(element).write(file, encoding="unicode", method=method)
    return "".join(data)

# Working example: the result is a str; encode it as UTF-8 when saving
xdec = """<?xml version="1.0" encoding="UTF-8" standalone="no" ?>"""
xml = tostring(document, declaration=xdec)
