
Python imaplib .search fails when the email subject contains Chinese

I want to use imaplib to search for particular emails whose subjects contain Chinese characters. I get an error like this:

UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)

So I used .encode to encode the string as 'UTF-8', and the search returned nothing. The printed output is:

0
[]

The correct count should be 71, which is what I get when I run the same search on my inbox through my mail client. This is my code:

import imaplib,email
host = 'imap.263.net'
user = '***@***'
psw = '*****'
count = 0
con = imaplib.IMAP4(host,143)
con.login(user,psw)
con.select('INBOX',readonly =True)
eva = '日报'
# eva = eva.encode('utf-8') 
resp,liujf = con.search('UTF-8','SUBJECT','%s'%eva, 'Since','01-Feb-2018')
items = liujf[0].split()
print(len(items))
print(items)

I guess it is a Unicode problem. How can I fix it?

You are passing in a raw Unicode string where you should be passing in the string as a sequence of UTF-8 bytes. You've even labelled it as UTF-8! This suggests you might want to read up on the difference.
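To make the distinction concrete, here is a small sketch of what appears to happen to a str argument. It assumes, going by the error message in the question, that imaplib converts str arguments to bytes with the ASCII codec before sending them, while bytes arguments need no conversion. The variable names are only for illustration.

```python
subject = '日报'

# A str holding non-ASCII characters cannot be encoded as ASCII;
# this is the same kind of failure reported in the question.
try:
    subject.encode('ascii')
    ascii_error = None
except UnicodeEncodeError as exc:
    ascii_error = str(exc)

print(ascii_error)

# Encoding explicitly as UTF-8 yields the bytes the server should receive.
utf8_subject = subject.encode('utf-8')
print(utf8_subject)
```

The first print shows the 'ascii' codec complaint; the second shows a bytes object, which is what the search criterion should be.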

Change 更改

'%s'%eva

to

eva.encode('utf-8')
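Applied to the code in the question, the search call might look like the sketch below. The helper name search_subject is made up for illustration, and con is assumed to be an already logged-in imaplib.IMAP4 connection with a mailbox selected. Whether a given server accepts raw UTF-8 bytes in a SEARCH command depends on the server, so treat this as a starting point rather than a guaranteed fix.

```python
def search_subject(con, subject, since):
    """Return message numbers whose subject contains `subject`.

    Hypothetical helper: `con` is assumed to be a logged-in
    imaplib.IMAP4 object with a mailbox already selected.
    """
    resp, data = con.search(
        'UTF-8',                              # charset declared to the server
        'SUBJECT', subject.encode('utf-8'),   # pass bytes, not str
        'SINCE', since,                       # e.g. '01-Feb-2018'
    )
    if resp != 'OK':
        raise RuntimeError('search failed: %r' % (data,))
    # data[0] is a space-separated bytes string of message numbers
    return data[0].split()
```

With the connection from the question, this would be called as items = search_subject(con, '日报', '01-Feb-2018').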

For more background, see https://www.unicode.org/faq/utf_bom.html#UTF8 and/or https://nedbatchelder.com/text/unipain.html

The construct '%s'%string is just an ugly and unidiomatic way to say string, but here it is actually an error: '%s'%string.encode('utf-8') produces a byte string but then interpolates it into a Unicode string, which produces completely the wrong result. Observe:

>>> eva = '日报'
>>> eva.encode('utf-8')              # correct
b'\xe6\x97\xa5\xe6\x8a\xa5'
>>> '%s'%eva.encode('utf-8')         # incorrect
"b'\\xe6\\x97\\xa5\\xe6\\x8a\\xa5'"
>>> b'%s'%eva.encode('utf-8')        # correct but terribly fugly
b'\xe6\x97\xa5\xe6\x8a\xa5'

Notice how '%s'%eva.encode('utf-8') takes the encoded byte string and converts it back into a Unicode string that holds the bytes' printable representation, b'...' prefix and backslashes included. The commented-out line shows that you tried eva = eva.encode('utf-8') but then apparently ended up with the wrong result because of the unnecessary % interpolation into a Unicode string.

I think you should first decode and then encode the Chinese literal. If we interpret it as latin-1 encoded, you decode it first and then encode it, for example eva.decode('latin-1').encode('utf-8'). (This assumes eva is a byte string, as a plain string literal is in Python 2; a Python 3 str has no decode method.)

