Comparing a Unicode input string with Unicode data read from a file

Question

string1=" म नेपाली  हुँ"
string1=string1.split()
string1[0]
'\xe0\xa4\xae'

import codecs

with codecs.open('nepaliwords.txt', 'r', 'utf-8') as f:
    for line in f:
        if string1[0] in line:
            print "matched string found in file"

Traceback (most recent call last):
  File "", line 3
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe0 in position 0: ordinal not in range(128)

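The text file contains a large amount of Nepali Unicode text.

Am I doing something wrong when comparing the two Unicode strings?

How can I print the matched Unicode string?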

Answer 1

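Your string1 is a byte string, encoded as UTF-8. It is not a Unicode string. However, you used codecs.open() so that Python decodes the file contents to unicode. When you then use your byte string in a containment test against that data, Python implicitly decodes the byte string to unicode so that both operands have the same type. That implicit decoding uses ASCII, so it fails on the first non-ASCII byte.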
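The failure can be reproduced without any file. The sketch below performs explicitly the ASCII decode that Python 2 attempts behind the scenes, using the bytes shown in the question for string1[0]. The variable names raw, error and text are only illustrative.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

# The UTF-8 bytes the question shows for string1[0].
raw = b'\xe0\xa4\xae'

error = None
try:
    # Python 2 does the equivalent of this implicitly when a byte
    # string is mixed with a unicode string in an `in` test.
    raw.decode('ascii')
except UnicodeDecodeError as exc:
    error = exc
    print(exc)

# Decoding with the encoding the bytes were actually written in works.
text = raw.decode('utf-8')
```

The message printed is the same one that appears in the traceback, which suggests the problem lies in the decoding step rather than in the file contents.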

Either decode string1 to unicode first:

string1 = " म नेपाली  हुँ"
string1 = string1.decode('utf8').split()[0]

Or use a Unicode string literal instead:

string1 = u" म नेपाली  हुँ"
string1 = string1.split()[0]

Note the u prefix at the start of the literal.
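To address the second part of the question (getting at the matched text), here is a minimal sketch that combines the explicit decode with the file search and collects the matching lines. The helper name find_matching_lines, the temporary file, and its three sample words are assumptions made for illustration; they stand in for nepaliwords.txt, whose real contents are not shown. io.open is used in place of codecs.open so that the same code also runs on Python 3.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

import io
import os
import tempfile


def find_matching_lines(path, needle_bytes, encoding="utf-8"):
    """Return the lines of `path` that contain the first word of `needle_bytes`."""
    # Decode explicitly so both sides of the `in` test are text, not bytes.
    needle = needle_bytes.decode(encoding).split()[0]
    matches = []
    with io.open(path, "r", encoding=encoding) as f:
        for line in f:
            if needle in line:
                matches.append(line.rstrip(u"\n"))
    return matches


# UTF-8 bytes, like string1 in the question.
sample_bytes = u" म नेपाली  हुँ".encode("utf-8")

# Hypothetical stand-in for nepaliwords.txt with made-up contents.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
with io.open(path, "w", encoding="utf-8") as f:
    f.write(u"नेपाली\nम\nहुँ\n")

matches = find_matching_lines(path, sample_bytes)
os.remove(path)

print(len(matches), "matching line(s)")
# `print(m)` for each m in matches displays the matched text itself,
# provided the terminal can render UTF-8.
```

Returning the matched lines as unicode objects, rather than printing inside the loop, keeps the encoding concerns at the edges: bytes are decoded once on the way in, and text is only encoded again when it is finally written to the terminal.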
