Getting rid of the encoding in lxml

I am trying to print an XML file using lxml and Python.

Here is the code:

>>> from lxml import etree
>>> root = etree.Element('root')
>>> child = etree.SubElement(root, 'child')
>>> print etree.tostring(root, pretty_print = True, xml_declaration = True, encoding = None)

Output:

<?xml version='1.0' encoding='ASCII'?>
<root>
  <child/>
</root>

As you can see, I passed encoding = None, but the final output still shows encoding='ASCII', which I guess is expected. If I omit the encoding argument altogether, it still shows ASCII.

Is there any way I can just get the XML version tag and not the encoding part? I want the output to be like this:

<?xml version='1.0'?>

It shouldn't matter what lxml.etree outputs as long as it's valid XML. If you really want to, you can glue strings together:

b'<?xml version="1.0"?>\n' + etree.tostring(root, pretty_print=True, encoding='ASCII')
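As a fuller sketch of the gluing approach under Python 3: etree.tostring() returns bytes when given a byte encoding such as 'ASCII', so the hand-written declaration has to be bytes as well. This relies on lxml omitting the declaration by default for ASCII output; the variable names are only illustrative.

```python
from lxml import etree

root = etree.Element('root')
etree.SubElement(root, 'child')

# With encoding='ASCII' and no xml_declaration argument, lxml writes no
# declaration at all, so we can prepend our own.
body = etree.tostring(root, pretty_print=True, encoding='ASCII')

# The prefix must be bytes, because tostring() returned bytes.
xml = b'<?xml version="1.0"?>\n' + body

print(xml.decode('ascii'))
```

The result still parses with etree.fromstring(xml), since a declaration without an encoding is valid and the parser then assumes UTF-8 (of which ASCII is a subset).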

It's unclear why you want to remove it, since an XML parser ultimately needs to know which character encoding the document uses in order to make sense of it. The XML 1.0 spec describes how to detect the encoding, and seems to encourage the use of encoding declarations:

In the absence of [external information], it is a fatal error ... for an entity which begins with neither a Byte Order Mark nor an encoding declaration to use an encoding other than UTF-8.

...

Unless an encoding is determined by a higher-level protocol, it is also a fatal error if an XML entity contains no encoding declaration and its content is not legal UTF-8 or UTF-16.
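To illustrate the rule quoted above, here is a small sketch of how lxml treats byte input with and without an encoding declaration. The helper try_parse and the sample documents are made up for this example; the only assumption is that no external encoding information is supplied to the parser.

```python
from lxml import etree

def try_parse(data):
    """Return the root element's text, or None if the parser rejects the bytes."""
    try:
        return etree.fromstring(data).text
    except etree.XMLSyntaxError:
        return None

# No encoding declaration, content is valid UTF-8: accepted.
utf8_doc = b'<?xml version="1.0"?>\n<root>\xc3\xa9</root>'

# No encoding declaration, content is Latin-1: not legal UTF-8, so rejected.
latin1_doc = b'<?xml version="1.0"?>\n<root>\xe9</root>'

# Same Latin-1 bytes, but the declaration names the encoding: accepted.
latin1_declared = b'<?xml version="1.0" encoding="ISO-8859-1"?>\n<root>\xe9</root>'

# ascii() keeps the printed result readable on any terminal.
for doc in (utf8_doc, latin1_doc, latin1_declared):
    print(ascii(try_parse(doc)))
```

So dropping the encoding from the declaration is harmless for ASCII or UTF-8 output, but it would break documents serialized in any other encoding.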
