python2.7 - reading a dictionary from a .txt file riddled with unicode

I enrolled in a Chinese Studies course some time ago, and I thought it would be a great exercise to write a flashcard program in Python. I'm storing the flashcard lists as a dictionary in a .txt file, so far without trouble. The real problems kick in when I try to load the file, encoded in UTF-8, into my program. An excerpt of my code:

import codecs

f = codecs.open(('list.txt'),'r','utf-8')
quiz_list = eval(f.read())

quizy = str(quiz_list).encode('utf-8')

print quizy

Now, if for example list.txt consists of:

{'character1':'男人'}

what is printed is actually

{'character1': '\xe7\x94\xb7\xe4\xba\xba'}

Obviously there are some serious encoding issues here, but I cannot for the life of me understand where they occur. I am working in a terminal that supports UTF-8 (not the standard cmd.exe), so that is not the problem. Reading a plain list.txt without the curly dict braces returns the Chinese characters without a problem, so my guess is that I'm not handling the dictionary part correctly. Any thoughts would be greatly appreciated!

There's nothing wrong with your encoding. Look at this:

>>> d = {1:'男人'}
>>> d[1]
'\xe7\x94\xb7\xe4\xba\xba'
>>> print d[1]
男人

Printing a string is one thing; printing its representation is another.
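To make that distinction concrete, here is a minimal sketch (not the asker's code) showing that str() on a dict formats each key and value with repr(), which is where the \x escapes come from. The name cards is made up for illustration, and the value is built as UTF-8 bytes on purpose so that the snippet behaves similarly on Python 2.7 and Python 3; on Python 3 the repr carries an extra b prefix.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

# UTF-8 bytes for the two characters, which is how Python 2 stores a plain '...' literal.
value = u'男人'.encode('utf-8')
cards = {'character1': value}  # hypothetical stand-in for quiz_list

# str() of a dict formats every key and value with repr(),
# so the non-ASCII bytes show up as \x escapes.
as_text = str(cards)
expected = '{' + repr('character1') + ': ' + repr(value) + '}'
print(as_text)
print(as_text == expected)

# The value itself is intact: decoding it gives back the readable text.
decoded = cards['character1'].decode('utf-8')
print(decoded)
```

The escaped form is only how the container chooses to display its contents; the stored bytes are unchanged.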

str(quiz_list) calls repr() on each value, for example repr(quiz_list['character1']), which produces an ASCII representation of the string value. If you just print quiz_list['character1'], you'll see the Chinese characters themselves: the string stored in the dictionary is intact.
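Putting this together, one possible way to load the file and print the entries one at a time instead of printing the whole dict is sketched below. This is only a sketch under a few assumptions: load_cards and sample_path are hypothetical names, a temporary file stands in for list.txt, and ast.literal_eval is swapped in for eval because it only accepts Python literals. The isinstance check is there because Python 2 yields UTF-8 byte strings for the values while Python 3 yields text directly.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

import ast
import io
import os
import tempfile


def load_cards(path):
    """Read a dict literal from a UTF-8 file and return it with text values."""
    with io.open(path, 'r', encoding='utf-8') as handle:
        # literal_eval only evaluates literals, unlike eval().
        raw = ast.literal_eval(handle.read())
    cards = {}
    for key, value in raw.items():
        # Python 2 gives UTF-8 byte strings here; Python 3 gives text already.
        if isinstance(value, bytes):
            value = value.decode('utf-8')
        cards[key] = value
    return cards


# Hypothetical sample file standing in for list.txt.
sample_path = os.path.join(tempfile.mkdtemp(), 'list.txt')
with io.open(sample_path, 'w', encoding='utf-8') as handle:
    handle.write(u"{'character1':'男人'}")

quiz_list = load_cards(sample_path)

# Print each entry on its own rather than the dict's repr.
for name, characters in quiz_list.items():
    print(name, characters)
```

Printing the individual values relies on the terminal accepting UTF-8 output, as in the question.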
