Python: Reading UTF-8 from raw_input() and writing UTF-8 to a file

So, I would like to make a program that does two things:

  1. Reads a word
  2. Reads its translation in Greek

Then I build a line in the format "word,translation" and write it to a file.

So the test.txt file should contain "Hello,Γεια", and if I run the program again, the next line should go under this one.

word=raw_input("Word:\n")  #The Word
translation=raw_input("Translation:\n").decode("utf-8") #The Translation in UTF-8
format=word+","+translation+"\n"
file=open("dict.txt","w")
file.write(format.encode("utf-8"))
file.close()

The error I get:

UnicodeDecodeError: 'utf8' codec can't decode byte 0x82 in position 0: invalid start byte

EDIT: This is Python 2

Although Python 2 supports Unicode, its input is not automatically decoded into unicode for you. raw_input returns a byte string, and if something other than ASCII is typed or piped in, you get the encoded bytes. The trick is to figure out what that encoding is, and that depends on whatever is feeding data into the program. If it's a terminal, then sys.stdin.encoding should tell you what encoding to use. If it's piped in from, say, a file, then sys.stdin.encoding is None and you just have to know what it is.
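To make the failure concrete, here is a small sketch of what happens when bytes from a non-UTF-8 terminal are decoded as UTF-8. It assumes the console uses the Greek code page cp737; that is a guess, since the question does not name the terminal, but 0x82 is the cp737 byte for Γ, which matches the error message. The helper name decode_input is hypothetical, and the bytes are simulated with encode() rather than read from raw_input().

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

# Simulate the bytes raw_input() would return on a cp737 terminal.
# Assumption: cp737 is only a plausible example of a terminal encoding.
typed = u"\u0393\u03b5\u03b9\u03b1"  # the Greek word from the question
raw_bytes = typed.encode("cp737")


def decode_input(raw, encoding):
    """Hypothetical helper: decode raw bytes, or return None on failure."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


wrong = decode_input(raw_bytes, "utf-8")  # codec does not match the bytes
right = decode_input(raw_bytes, "cp737")  # codec matches the bytes

print("decoded as utf-8:", wrong is not None)
print("decoded as cp737:", right == typed)
```

Decoding only succeeds when the codec matches the one that produced the bytes, which is why the hard-coded "utf-8" in the question fails on a terminal that uses something else.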

A solution to your problem follows. Note that although your method of writing the file (encode, then write) works, the codecs module provides a file object that does the encoding for you.

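import sys
import codecs

# Use the terminal's encoding when there is one. When input is piped in,
# sys.stdin.encoding is None, so fall back to a guess (UTF-8 here); a
# command-line parameter may be useful if you want to get input from files.
_stdin_encoding = sys.stdin.encoding or 'utf-8'

def unicode_input(prompt):
    # raw_input() returns bytes in Python 2; decode them into unicode
    return raw_input(prompt).decode(_stdin_encoding)

word = unicode_input("Word:\n")  # the word
translation = unicode_input("Translation:\n")  # the translation
entry = word + "," + translation + "\n"
# Pass the encoding explicitly: without it, codecs.open() returns a plain
# file object that does no encoding. Mode "a" appends, so each run adds
# its line below the previous ones.
with codecs.open("dict.txt", "a", encoding="utf-8") as myfile:
    myfile.write(entry)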
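The question also asks for each new entry to land on its own line below the previous one. A minimal sketch of that, using io.open (which accepts an encoding argument in both Python 2 and 3) in append mode, might look like the following. The function names append_entry and read_entries are hypothetical, the second word pair is made up for illustration, and the demo writes to a temporary directory rather than a real dict.txt.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile


def append_entry(path, word, translation):
    """Append one "word,translation" line to path, encoded as UTF-8."""
    with io.open(path, "a", encoding="utf-8") as f:
        f.write(word + u"," + translation + u"\n")


def read_entries(path):
    """Return the stored lines as a list of (word, translation) tuples."""
    with io.open(path, "r", encoding="utf-8") as f:
        return [tuple(line.rstrip(u"\n").split(u",", 1)) for line in f]


# Demo in a temporary directory so no existing file is touched.
demo_path = os.path.join(tempfile.mkdtemp(), "dict.txt")
append_entry(demo_path, u"Hello", u"\u0393\u03b5\u03b9\u03b1")
append_entry(demo_path, u"Yes", u"\u039d\u03b1\u03b9")
entries = read_entries(demo_path)
```

Both arguments must already be unicode objects here, so in the Python 2 program above they would come from unicode_input() rather than directly from raw_input().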
