Python: Reading UTF-8 from raw_input() & Writing UTF-8 in file
So, I would like to make a program that does two things:
Then I make a new format that looks like this: "word,translation"
and I write it into a file.
So the test.txt file should contain "Hello,Γεια"
and in case I read again, the next line should go under this one.
word=raw_input("Word:\n") #The Word
translation=raw_input("Translation:\n").decode("utf-8") #The Translation in UTF-8
format=word+","+translation+"\n"
file=open("dict.txt","w")
file.write(format.encode("utf-8"))
file.close()
The error I get:
UnicodeDecodeError: 'utf8' codec can't decode byte 0x82 in position 0: invalid start byte
EDIT: This is Python 2
Although Python 2 supports Unicode, its input is not automatically decoded into unicode for you. raw_input returns a byte string, and if something other than ASCII is piped in, you get the encoded bytes. The trick is to figure out what that encoding is, and that depends on whatever is pumping data into the program. If it's a terminal, then sys.stdin.encoding should tell you what encoding to use. If it's piped in from, say, a file, then sys.stdin.encoding is None and you just have to know what it is.
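As an illustration of why the encoding matters, here is a small sketch. It assumes (this is a guess, not something stated in the question) that the console uses code page 737, a Greek DOS code page in which Γ is the single byte 0x82. Decoding those bytes as UTF-8 then fails with the same kind of error as above, while decoding with the encoding that produced them works.

```python
# -*- coding: utf-8 -*-
# Sketch only: cp737 is an assumed console encoding, picked because it
# maps the Greek capital gamma to the byte 0x82 seen in the error message.
text = u"\u0393\u03b5\u03b9\u03b1"   # the word "Γεια"
raw = text.encode("cp737")           # what raw_input() might hand back

error = None
try:
    raw.decode("utf-8")              # wrong guess: 0x82 cannot start a UTF-8 sequence
except UnicodeDecodeError as exc:
    error = exc
    print("utf-8 failed: %s" % exc)

decoded = raw.decode("cp737")        # right encoding: round-trips cleanly
print(decoded == text)
```

On a real terminal the value to pass to decode() would come from sys.stdin.encoding rather than being hard-coded.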
A solution to your problem follows. Note that even though your method of writing the file (encode, then write) works, the codecs module provides a file object that does the encoding for you.
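import sys
import codecs

# Use the terminal's encoding when there is one. When stdin is piped,
# sys.stdin.encoding is None, so fall back to a guess (UTF-8 here); a
# command line param may be useful if you want to get input from files.
_stdin_encoding = sys.stdin.encoding or 'utf-8'

def unicode_input(prompt):
    # raw_input() returns bytes in Python 2; decode them to unicode
    return raw_input(prompt).decode(_stdin_encoding)

word = unicode_input("Word:\n")  # The word
translation = unicode_input("Translation:\n")  # The translation
line = word + "," + translation + "\n"

# Pass an encoding so that codecs.open() encodes unicode on write
with codecs.open("dict.txt", "w", encoding="utf-8") as myfile:
    myfile.write(line)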
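Since the question also asks for each new entry to go on the line below the previous one, a variant that appends instead of overwriting might look like the sketch below. The helper names append_entry and read_entries are made up for illustration, and io.open is used in place of codecs.open because it takes the same encoding argument and behaves the same on Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile


def append_entry(path, word, translation):
    """Append one "word,translation" line to a UTF-8 file (hypothetical helper)."""
    # Mode "a" keeps earlier lines; io.open encodes the unicode text on write.
    with io.open(path, "a", encoding="utf-8") as handle:
        handle.write(u"%s,%s\n" % (word, translation))


def read_entries(path):
    """Return the stored lines as unicode strings, without trailing newlines."""
    with io.open(path, "r", encoding="utf-8") as handle:
        return [line.rstrip(u"\n") for line in handle]


# Demo in a temporary directory so that no real dict.txt is touched.
demo_path = os.path.join(tempfile.mkdtemp(), "dict.txt")
append_entry(demo_path, u"Hello", u"\u0393\u03b5\u03b9\u03b1")  # "Γεια"
append_entry(demo_path, u"Thanks", u"\u0395\u03c5\u03c7\u03b1\u03c1\u03b9\u03c3\u03c4\u03ce")  # "Ευχαριστώ"
entries = read_entries(demo_path)
print(len(entries))
```

In the interactive program, the two arguments would be the values returned by unicode_input() rather than literals.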