Python imaplib .search on an email subject containing Chinese raises an error
I want to use imaplib to search for particular emails whose subjects contain Chinese. I got an error like this:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)
So I used .encode to encode the string as 'UTF-8', and the search found nothing. The printed output is:
0
[]
The right answer should be 71, which is what I get when I run the same search on my inbox in my mail program. This is my code:
import imaplib,email
host = 'imap.263.net'
user = '***@***'
psw = '*****'
count = 0
con = imaplib.IMAP4(host,143)
con.login(user,psw)
con.select('INBOX',readonly =True)
eva = '日报'
# eva = eva.encode('utf-8')
resp,liujf = con.search('UTF-8','SUBJECT','%s'%eva, 'Since','01-Feb-2018')
items = liujf[0].split()
print(len(items))
print(items)
I guess it is a Unicode problem. How can I fix it?
You are passing in a raw Unicode string where you should be passing in the string as a sequence of UTF-8 bytes. You've even labelled it as UTF-8! This suggests you might want to read up on the difference.
Change
'%s'%eva
to
eva.encode('utf-8')
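Applied to the search call from the question, the change could look like the sketch below. It assumes Python 3, where imaplib accepts bytes arguments and sends them as-is; the helper name search_by_subject is made up for illustration, and con is assumed to be a logged-in imaplib.IMAP4 connection with a mailbox already selected.

```python
def search_by_subject(con, subject, since):
    """Return the message numbers whose subject contains `subject`.

    `con` is assumed to be a logged-in imaplib.IMAP4 connection with a
    mailbox selected. The helper name is hypothetical, not part of imaplib.
    """
    # Encode the search term so it matches the charset declared to the server.
    term = subject.encode('utf-8')
    resp, data = con.search('UTF-8', 'SUBJECT', term, 'SINCE', since)
    # data[0] is a space-separated bytes string of message numbers.
    return data[0].split()

# Usage with the connection from the question:
#   items = search_by_subject(con, '日报', '01-Feb-2018')
#   print(len(items))
```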
For more background, maybe read https://www.unicode.org/faq/utf_bom.html#UTF8 and/or https://nedbatchelder.com/text/unipain.html
The construct '%s'%string is just an ugly and unidiomatic way to say string, but here it's actually an error: '%s'%string.encode('utf-8') produces a byte string but then interpolates it into a Unicode string, which produces completely the wrong result. Observe:
>>> eva = '日报'
>>> eva.encode('utf-8') # correct
b'\xe6\x97\xa5\xe6\x8a\xa5'
>>> '%s'%eva.encode('utf-8') # incorrect
"b'\\xe6\\x97\\xa5\\xe6\\x8a\\xa5'"
>>> b'%s'%eva.encode('utf-8') # correct but terribly fugly
b'\xe6\x97\xa5\xe6\x8a\xa5'
Notice how '%s'%eva.encode('utf-8') takes the encoded byte string and converts it back into a Unicode representation. The commented-out line shows that you tried eva = eva.encode('utf-8') but then apparently ended up with the wrong result because of the unnecessary % interpolation into a Unicode string.
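To make the empty search result concrete, here is a small sketch (Python 3 assumed) comparing the bytes each variant would put on the wire. The variable names correct and wrong are only for illustration.

```python
eva = '日报'

# What the server needs: the six UTF-8 bytes of the two characters.
correct = eva.encode('utf-8')

# What '%s' % bytes produces: the text of the bytes repr, which is plain
# ASCII, so the server is asked to match that literal text instead.
wrong = ('%s' % eva.encode('utf-8')).encode('ascii')

print(len(correct))  # 6
print(len(wrong))    # 27
```

The second value is pure ASCII, which is why the call no longer raises UnicodeEncodeError yet matches no subjects.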
I think you should first decode and then encode the Chinese literals. If we interpret the string as latin-1 encoded, then you decode it first and then encode it. Example: eva.decode('latin-1').encode('utf-8')