Python Unicode: when written to a file, the text comes out in a different format

I am using Python 3.4 to write a Unicode string to a file. After the file is written, when I open it and look at it, it contains a completely different set of characters.

Code:

# -*- coding: utf-8 -*-

with open('test.txt', 'w', encoding='utf-8') as f:
    name = 'أبيض'
    name.encode("utf-8")
    f.write(name)
    f.close()    

f = open('test.txt','r')
for line in f.readlines():
    print(line) 

Output:

Ø£Ø¨ÙŠØ¶

Thanks in advance.

You need to specify the codec when reading the file as well:

f = open('test.txt','r', encoding='utf8')
for line in f.readlines():
    print(line) 

Otherwise your system default is used; see the open() function documentation:

encoding is the name of the encoding used to decode or encode the file. This should only be used in text mode. The default encoding is platform dependent (whatever locale.getpreferredencoding() returns), but any encoding supported by Python can be used.
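To see which codec that default actually is on a given machine, the function named in the documentation can be called directly. A short sketch; the printed value depends on the platform and locale, and on the asker's machine it would presumably be cp1252. The `codecs.lookup()` step is an addition of mine that only normalises the name.

```python
import codecs
import locale

# The codec open() falls back to in text mode when no encoding= is passed.
# Passing False skips a temporary setlocale() call while querying.
default_codec = locale.getpreferredencoding(False)

# codecs.lookup() maps aliases such as 'UTF8' or 'windows-1252'
# to Python's canonical codec name.
canonical = codecs.lookup(default_codec).name

print('open() default encoding:', default_codec, '(canonical: ' + canonical + ')')
```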

Judging by the output you got, your system is using Windows codepage 1252 as the default:

>>> 'أبيض'.encode('utf8').decode('cp1252')
'Ø£Ø¨ÙŠØ¶'

By using the wrong codec when reading, you produced what is called mojibake.
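The effect can be reproduced on any machine by naming the wrong codec explicitly instead of relying on the platform default. A minimal sketch, assuming cp1252 as the stand-in default (as inferred above) and a throwaway file in a temporary directory; the `tempfile` setup and the variable names are illustrative, not from the original. The file on disk is fine; only the codec used for reading changes the result.

```python
import os
import tempfile

name = '\u0623\u0628\u064a\u0636'  # the Arabic word 'أبيض', written as escapes

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'test.txt')  # illustrative location

    # Write exactly as in the question: text mode, explicit UTF-8.
    with open(path, 'w', encoding='utf-8') as f:
        f.write(name)

    # Decode with cp1252 explicitly, standing in for the assumed platform default.
    with open(path, 'r', encoding='cp1252') as f:
        wrong = f.read()

    # Same file, matching codec: the original text comes back.
    with open(path, 'r', encoding='utf-8') as f:
        right = f.read()

print(wrong)          # Ø£Ø¨ÙŠØ¶
print(right == name)  # True

# Every byte in this particular file has a cp1252 mapping, so the mix-up is
# reversible here: re-encode with the wrong codec, then decode with the right one.
repaired = wrong.encode('cp1252').decode('utf-8')
print(repaired == name)  # True
```

The last step is only a repair trick for this particular case: with bytes that the wrong codec cannot map, decoding raises UnicodeDecodeError instead. Reading with the correct codec in the first place is the actual fix.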

Note that the name.encode("utf-8") line in your writing example is entirely redundant: str.encode() returns a new bytes object, that return value is ignored, and it is the f.write(name) call that takes care of the actual encoding. The f.close() call is also redundant, since the with statement already closes the file. The following produces the correct output:

with open('test.txt', 'w', encoding='utf-8') as f:
    name = 'أبيض'
    f.write(name)

with open('test.txt', 'r', encoding='utf-8') as f:
    for line in f.readlines():
        print(line) 
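Both redundancy claims can be checked directly. A short sketch; the temporary-directory setup and the variable names are illustrative assumptions. It shows that `str.encode()` returns a separate `bytes` object without touching the string, that the bytes `f.write(name)` puts on disk are the same ones `name.encode('utf-8')` produces, and that the file is already closed once the `with` block ends.

```python
import os
import tempfile

name = '\u0623\u0628\u064a\u0636'  # the Arabic word 'أبيض', written as escapes

# str.encode() builds and returns a new bytes object; `name` is unchanged,
# so calling it without using the result has no effect.
encoded = name.encode('utf-8')
print(type(name).__name__, type(encoded).__name__)  # str bytes

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'test.txt')  # illustrative location

    with open(path, 'w', encoding='utf-8') as f:
        f.write(name)  # the text-mode file object performs the UTF-8 encoding
    print(f.closed)  # True: leaving the with block already closed the file

    # Binary mode reads the raw bytes back with no decoding at all.
    with open(path, 'rb') as f:
        on_disk = f.read()

print(on_disk == encoded)  # True
```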
