
UnicodeEncodeError when writing to a file

I am trying to write some strings to a file (the strings have been given to me by the HTML parser BeautifulSoup).

I can use "print" to display them, but when I use file.write() I get the following error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xa3' in position 6: ordinal not in range(128)

How can I parse this?

If I type 'python unicode' into Google, I get about 14 million results; the first is the official doc, which describes the whole situation in excruciating detail; and the fourth is a more practical overview that will pretty much spoon-feed you an answer and also make sure you understand what's going on.

You really do need to read and understand these sorts of overviews, however long they seem. There really isn't any getting around it. Text is hard. There is no such thing as "plain text", there hasn't been a reasonable facsimile for years, and there never really was, although we spent decades pretending there was. But Unicode is at least a standard.

You also should read http://www.joelonsoftware.com/articles/Unicode.html .

This error occurs when you pass a Unicode string containing non-ASCII characters (code points above 127) to something that expects an ASCII bytestring. The default encoding Python 2 uses when it implicitly converts to a bytestring is ASCII, "which handles exactly 128 (English) characters". This is why trying to convert Unicode characters above 127 produces the error.
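A minimal sketch that reproduces the failure from the question by encoding explicitly, assuming only that the text contains the pound sign u'\xa3' (the sample string itself is made up). It should run unchanged on Python 2.7 and Python 3.3+:

```python
# Hypothetical sample text; only the pound sign (U+00A3) matters here.
text = u'Price \xa3100'

try:
    # ASCII covers code points 0-127 only, so U+00A3 cannot be encoded.
    text.encode('ascii')
    failed = False
except UnicodeEncodeError as exc:
    failed = True
    bad_char = exc.object[exc.start:exc.end]  # the offending character
    position = exc.start                      # its index in the string

# UTF-8 can represent every Unicode code point, so this succeeds.
encoded = text.encode('utf-8')
```

In Python 2, file.write() on an ordinary file performs the same ASCII encoding implicitly, which is where the traceback in the question comes from.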

The unicode() constructor has the signature unicode(string[, encoding, errors]). All of its arguments should be 8-bit strings.

The first argument is converted to Unicode using the specified encoding; if you leave off the encoding argument, the ASCII encoding is used for the conversion, so characters greater than 127 will be treated as errors.
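The decoding direction can be sketched the same way. This example uses the bytes method decode() rather than unicode() so that it also runs on Python 3, where unicode() no longer exists; for a Python 2 byte string, unicode(raw, 'utf-8') gives the same result as raw.decode('utf-8'). The sample bytes are an assumption chosen to match the 'La Pe\xf1a' string used in this answer:

```python
# UTF-8 bytes for u'La Pe\xf1a' (the n-with-tilde is two bytes: C3 B1).
raw = b'La Pe\xc3\xb1a'

# Naming the encoding decodes correctly.
decoded = raw.decode('utf-8')

try:
    # Decoding as ASCII fails on the first byte above 127.
    raw.decode('ascii')
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# errors='replace' substitutes U+FFFD for each undecodable byte
# instead of raising.
lossy = raw.decode('ascii', 'replace')
```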

For example:

s = u'La Pe\xf1a' 
print s.encode('latin-1')

or

write(s.encode('latin-1'))

will encode using latin-1.
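As a fuller sketch of the same idea, the bytes produced by encode() can be written to a file opened in binary mode, so no implicit ASCII conversion happens. The temporary file is a throwaway path invented for the demonstration, and UTF-8 is used for the file because, unlike latin-1, it can represent any character:

```python
import os
import tempfile

s = u'La Pe\xf1a'

# Encode explicitly; the result is a byte string in both Python 2 and 3.
latin1_bytes = s.encode('latin-1')  # one byte per character here
utf8_bytes = s.encode('utf-8')      # the n-with-tilde takes two bytes

fd, path = tempfile.mkstemp()       # throwaway file for the demo
os.close(fd)
try:
    # Binary mode writes the bytes exactly as given.
    with open(path, 'wb') as f:
        f.write(utf8_bytes)
    # Read the bytes back and decode with the same encoding.
    with open(path, 'rb') as f:
        round_trip = f.read().decode('utf-8')
finally:
    os.remove(path)
```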

The answer to your question is "use codecs". The appended code also shows some gettext magic, FWIW. http://wiki.wxpython.org/Internationalization

import codecs
import gettext

import wx

localedir = './locale'
langid = wx.LANGUAGE_DEFAULT # use OS default; or use LANGUAGE_JAPANESE, etc.
domain = "MyApp"             
mylocale = wx.Locale(langid)
mylocale.AddCatalogLookupPathPrefix(localedir)
mylocale.AddCatalog(domain)

translater = gettext.translation(domain, localedir, 
                                 [mylocale.GetCanonicalName()], fallback = True)
translater.install(unicode = True)

# translater.install() installs the gettext _() translater function into our namespace...

msg = _("A message that gettext will translate, probably putting Unicode in here")

# use codecs.open() to convert Unicode strings to UTF8

Logfile = codecs.open(logfile_name, 'w', encoding='utf-8')

Logfile.write(msg + '\n')
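The file-writing step can be tried on its own, without wx or gettext. The sketch below swaps in io.open, which accepts the same encoding argument as codecs.open and is available in Python 2.6+ and Python 3 (where it is the built-in open). The message text and the temporary log file are placeholders, not part of the original code:

```python
import io
import os
import tempfile

msg = u'Cost: \xa3100'  # placeholder for the translated message

fd, logfile_name = tempfile.mkstemp(suffix='.log')  # throwaway path
os.close(fd)
try:
    # The file object encodes each unicode string to UTF-8 on write.
    with io.open(logfile_name, 'w', encoding='utf-8') as logfile:
        logfile.write(msg + u'\n')

    # Reading the raw bytes shows the two-byte UTF-8 form of U+00A3.
    with open(logfile_name, 'rb') as f:
        raw = f.read()
finally:
    os.remove(logfile_name)
```

Either way, the point is the same: name the encoding when the file is opened, and let the file object do the conversion on every write.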

Despite Google being full of hits on this problem, I found it rather hard to find this simple solution (it is actually in the Python docs about Unicode, but rather buried).

So ... HTH...

GaJ


 