
Why isn't Python writing characters from Latin Extended-A (UnicodeEncodeError when writing to a file)?

Obligatory intro noting that I've done some research

This seems like it should be straightforward (I am happy to close as a duplicate if a suitable target question is found), but I'm not familiar enough with character encodings and how Python handles them to suss it out myself. At the risk of seeming lazy, I will note that the answer may very well be in one of the links below, but I haven't yet seen it in my reading.

I've referenced some of the docs: Unicode HOWTO, codecs.py docs

I've also looked at some old highly-voted SO questions: "Writing Unicode text to a text file?" and "Python, Unicode, and the Windows console"


Question

Here's an MCVE that demonstrates my problem:

with open('foo.txt', 'wt') as outfile:
    outfile.write('\u014d')

The traceback is as follows:

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "C:\Users\cashamerica\AppData\Local\Programs\Python\Python3\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u014d' in position 0: character maps to <undefined>

I'm confused because U+014D is 'ō', an assigned code point: LATIN SMALL LETTER O WITH MACRON (official Unicode source).

I can even print the character to the Windows console (but it renders as a normal 'o'):

>>> print('\u014d')
o

Answer

Your file is being opened with cp1252, the default encoding on your system (note the cp1252.py frame in the traceback), and cp1252 does not include ō.
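As a rough way to confirm this diagnosis, the sketch below prints the encoding that open() falls back to in text mode when none is given, then tries to encode the character with cp1252 and with UTF-8. The variable names are illustrative only, and the value printed for the default encoding depends on the machine and Python configuration.

```python
import locale

# Encoding that open() uses in text mode when no encoding is passed.
# On the asker's Windows machine this is presumably 'cp1252'; elsewhere it may differ.
default_encoding = locale.getpreferredencoding(False)
print(default_encoding)

char = '\u014d'  # LATIN SMALL LETTER O WITH MACRON

# cp1252 is a single-byte code page with no slot for U+014D, so encoding fails.
try:
    char.encode('cp1252')
    cp1252_ok = True
except UnicodeEncodeError:
    cp1252_ok = False

# UTF-8 can represent every Unicode code point.
utf8_bytes = char.encode('utf-8')

print(cp1252_ok, utf8_bytes)  # False b'\xc5\x8d'
```

So the code point being assigned in Unicode is not the issue; the target encoding simply has no byte for it.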

Write (and read) your file with an explicit encoding:

with open('foo.txt', 'wt', encoding='utf8') as outfile:
    outfile.write('\u014d')
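For the "and read" half, a minimal round-trip sketch follows. It writes to a temporary directory (an assumption made here so the example does not touch real files) and reads the file back with the same explicit encoding.

```python
import os
import tempfile

text = '\u014d'

# Scratch directory chosen for illustration; any writable path would do.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'foo.txt')

    # Write with an explicit encoding instead of the platform default.
    with open(path, 'wt', encoding='utf-8') as outfile:
        outfile.write(text)

    # Read back with the same encoding so the bytes decode to the same character.
    with open(path, 'rt', encoding='utf-8') as infile:
        round_tripped = infile.read()

    # Inspect the raw bytes that actually landed on disk.
    with open(path, 'rb') as rawfile:
        raw_bytes = rawfile.read()

print(round_tripped == text, raw_bytes)  # True b'\xc5\x8d'
```

Reading the file back without encoding='utf-8' on a cp1252 system would decode those two bytes as two unrelated characters, which is why the encoding should be given on both sides.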
