toprettyxml() : write() argument must be str, not bytes

My program takes an XML string and saves it to a file in prettified form. This does the trick:

from xml.dom.minidom import parseString
dom = parseString(strXML)
with open(file_name + ".xml", "w", encoding="utf8") as outfile:
    outfile.write(dom.toprettyxml())

However, I noticed that my XML header is missing an encoding parameter.

<?xml version="1.0" ?>

Since my data is likely to contain many Unicode characters, I must make sure UTF-8 is also specified in the XML encoding field.

Now, looking at the minidom documentation, I read that "an additional keyword argument encoding can be used to specify the encoding field of the XML header". So I try this:

from xml.dom.minidom import parseString
dom = parseString(strXML)
with open(file_name + ".xml", "w", encoding="utf8") as outfile:
    outfile.write(dom.toprettyxml(encoding="UTF-8"))

But then I get:

TypeError: write() argument must be str, not bytes

Why doesn't the first piece of code yield that error? And what am I doing wrong?

Thanks!

R.

From the documentation (emphasis mine):

With no argument, the XML header does not specify an encoding, and the result is Unicode string if the default encoding cannot represent all characters in the document. Encoding this string in an encoding other than UTF-8 is likely incorrect, since UTF-8 is the default encoding of XML.

With an explicit encoding argument, the result is a byte string in the specified encoding. It is recommended that this argument is always specified. To avoid UnicodeError exceptions in case of unrepresentable text data, the encoding argument should be specified as “utf-8”.

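So toprettyxml() returns a different object type depending on whether encoding is set: a str without it, bytes with it (which is rather confusing if you ask me). A file opened in text mode ("w") only accepts str, which is why the second snippet fails and the first does not.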
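The difference in return type is easy to check directly. Here is a minimal sketch, assuming a made-up sample document (the XML string below is not from the question):

```python
from xml.dom.minidom import parseString

# Hypothetical sample document, used only for illustration.
dom = parseString("<root><item>café</item></root>")

as_text = dom.toprettyxml()                   # no encoding argument
as_bytes = dom.toprettyxml(encoding="UTF-8")  # explicit encoding argument

print(type(as_text).__name__)    # str
print(type(as_bytes).__name__)   # bytes

# Only the bytes variant carries an encoding declaration in its header.
print(as_text.splitlines()[0])
print(as_bytes.splitlines()[0])
```

A text-mode file handle rejects the second value, and a binary-mode handle rejects the first.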

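So you can fix it by removing the encoding argument (the header then has no encoding declaration, which is fine because UTF-8 is the XML default):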

with open(file_name + ".xml", "w", encoding="utf8") as outfile:
    outfile.write(dom.toprettyxml())
    

Or open your file in binary mode, which accepts byte strings:

with open(file_name + ".xml", "wb") as outfile:
    outfile.write(dom.toprettyxml(encoding="utf8"))

You can solve the problem as follows:

with open(targetName, 'wb') as f:
    f.write(dom.toprettyxml(indent='\t', encoding='utf-8'))

I don't recommend using 'wb' mode, because binary mode does not perform line-ending conversion (for example, "\n" to "\r\n" on Windows). Instead, I decode the bytes back to a string and write it in text mode:

from xml.dom import minidom

dom = minidom.parseString(utf_8_xml_text)

out_byte = dom.toprettyxml(encoding="utf-8")
out_text = out_byte.decode("utf-8")

with open(filename, "w", encoding="utf-8") as f:
    f.write(out_text)
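To check this approach end to end, the file can be parsed back after writing. This is only a sketch: the sample XML, the temporary path, and the `item` tag name are assumptions made for the example, not part of the original code.

```python
import os
import tempfile
from xml.dom import minidom

# Hypothetical sample input, for illustration only.
utf_8_xml_text = "<root><item>café</item></root>"

dom = minidom.parseString(utf_8_xml_text)
# bytes with an encoding declaration in the header, decoded back to str
out_text = dom.toprettyxml(encoding="utf-8").decode("utf-8")

# Temporary path so the sketch does not touch real files.
filename = os.path.join(tempfile.mkdtemp(), "sample.xml")
with open(filename, "w", encoding="utf-8") as f:
    f.write(out_text)

# Re-parse the file to confirm the content survived the round trip.
reparsed = minidom.parse(filename)
item_text = reparsed.getElementsByTagName("item")[0].firstChild.data.strip()
print(out_text.splitlines()[0])
print(item_text)
```

The encoding passed to open() should match the one given to toprettyxml(); otherwise the header would declare an encoding the file does not actually use.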
