
python imaplib reading gmail

I am using imaplib to read Gmail messages in my Python command window. The only problem is that the emails come with newlines and carriage returns. Also, the text does not seem to be formatted correctly: instead of Amount: $36.49, it returns =2436.49. How can I go about cleaning up this text? Thanks!

Sample email content:

r\nItem name: Scanner\r\nItem=23: 130585100869\r\nPurchase Date: Oct 7, 2011\r\nUnit Price: =2436.49 USD\r\nQty: 1\r\nAmount: =2436.49USD\r\nSubtotal: =2436.49 USD\r\nShipping and handling: =240.00 USD\r\nInsurance - not offered

Code:

import imaplib
import re
import email

USER = 'email@gmail.com'
PASSWORD = 'password'

# connect to the Gmail IMAP server over SSL
imap_server = imaplib.IMAP4_SSL('imap.gmail.com', 993)
imap_server.login(USER, PASSWORD)
imap_server.select('Inbox')

# find messages whose subject contains "payment received"
typ, response = imap_server.search(None, '(SUBJECT "payment received")')

Data = []

for i in response[0].split():
    # fetch the complete raw message (headers and body)
    results, data = imap_server.fetch(i, "(RFC822)")
    Data.append(data)
    break  # stop after the first matching message

for i in Data:
    print i

The data is in quoted-printable encoding. Here is a little data massager that should get you what you want:

text = '''\r\nPurchase Date: Oct 7, 2011\r\nUnit Price: =2436.49 USD\r\nQty: 1\r\nAmount: =2436.49 USD\r\nSubtotal: =2436.49 USD\r\nShipping and handling: =240.00 USD\r\nInsurance - not offered : ----\r\n----------------------------------------------------------------------\r\nTax: --\r\nTotal: =2436.49 USD\r\nPayment: =2436.49 USD\r\nPayment sent to: emailaddress=40gmail.com\r\n----------------------------------------------------------------------\r\n\r\nSincerely,\r\nPayPal\r\n=20\r\n----------------------------------------------------------------------\r\nHelp Center:=20\r\nhttps://www.paypal.com/us/cgi-bin/helpweb?cmd=3D_help\r\nSecurity Center:=20\r\nhttps://www.paypal.com/us/security\r\n\r\nThis email was sent by an automated system, so if you reply, nobody will =\r\nsee it. To get in touch with us, log in to your account and click =\r\n=22Contact Us=22 at the bottom of any page.\r\n\r\n'''

raw_data = text.decode("quopri")  # Python 2 only: replace each =XX escape with the real character

data = [map(str.strip, l.split(":", 1)) for l in raw_data.splitlines() if ": " in l]  # split on the first colon only

print data
# [['Purchase Date', 'Oct 7, 2011'], ['Unit Price', '$36.49 USD'], ['Qty', '1'], ['Amount', '$36.49 USD'], ['Subtotal', '$36.49 USD'], ['Shipping and handling', '$0.00 USD'], ['Insurance - not offered', '----'], ['Tax', '--'], ['Total', '$36.49 USD'], ['Payment', '$36.49 USD'], ['Payment sent to', 'emailaddress@gmail.com'], ['Help Center', ''], ['Security Center', '']]

There you have your data in a much easier-to-process format. I hope it helps.

Edit: to make it even cuter:

>>> cooked = dict(data)
>>> print cooked["Unit Price"]
$36.49 USD
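In Python 3, str no longer has a decode method, so text.decode("quopri") fails there. A minimal sketch of the same massager for Python 3, assuming the message body is available as bytes (which is what imaplib returns in Python 3), uses quopri.decodestring instead. The helper name parse_fields and the UTF-8 default charset are assumptions made for illustration.

```python
import quopri


def parse_fields(raw: bytes, charset: str = "utf-8") -> dict:
    """Decode a quoted-printable body and collect 'Key: value' lines into a dict.

    The default charset is an assumption; real messages declare it in a header.
    """
    # quopri turns =XX escapes into bytes and joins "=\r\n" soft line breaks
    text = quopri.decodestring(raw).decode(charset)
    fields = {}
    for line in text.splitlines():
        if ": " in line:
            # split only on the first colon so values such as "12:30" survive
            key, value = line.split(":", 1)
            fields[key.strip()] = value.strip()
    return fields


sample = b"Unit Price: =2436.49 USD\r\nQty: 1\r\nPayment sent to: emailaddress=40gmail.com\r\n"
print(parse_fields(sample)["Unit Price"])  # $36.49 USD
```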

The \r\n issue

The \r\n problem is caused by printing the internal representation of your strings instead of the strings themselves. Try this to understand what I mean:

print ['test\n']
print 'test\n'
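The same distinction in Python 3, where print is a function: printing a list shows the repr() of each element, so control characters appear as escape sequences, while printing the string itself writes the real characters. A minimal sketch, using a made-up line from the sample email:

```python
line = "Qty: 1\r\n"

# Printing a container uses repr() for its elements, so \r\n shows as escapes
print([line])   # ['Qty: 1\r\n']

# Printing the string itself emits the actual carriage return and line feed
print(line)

# The escaped form only exists in the representation, not in the string
shown = str([line])
```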

The i that you print above is a list, not a string, so the first (repr) representation kicks in. Try this:

print(Data[0][0][1])

I identified this by inspecting the object. You should read the documentation of the libraries you are using to understand exactly what this object is composed of and why this particular field holds the message, or how to convert the Data object into something more... palatable.
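For orientation: in imaplib, fetching "(RFC822)" typically returns a data list whose first element is a tuple (response header, raw message), which is why Data[0][0][1] (the first fetched data list, its first tuple, the second item) is the message. The sketch below fakes such a response, so the literal values are made up, and turns the raw bytes into an email.message.Message, which is the more palatable object. It assumes Python 3, where imaplib returns bytes.

```python
import email

# A made-up response in the shape imaplib's fetch(num, "(RFC822)") returns:
# [(b'<num> (RFC822 {<size>}', <raw message bytes>), b')']
raw = b"Subject: payment received\r\n\r\nAmount: =2436.49 USD\r\n"
data = [(b"1 (RFC822 {51}", raw), b")"]

raw_message = data[0][1]  # first tuple, second item: the message bytes
msg = email.message_from_bytes(raw_message)
print(msg["Subject"])  # payment received
```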

The encoding issue

Try:

import quopri
print quopri.decodestring(Data[0][0][1])

If these are actually email messages, you can use the email module to get started. It does the proper quoted-printable decoding for you and gives you clean text.
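A minimal sketch of that approach in Python 3: Message.get_payload(decode=True) undoes the Content-Transfer-Encoding declared in the headers (quoted-printable here), and walk() visits every part of a multipart message. The sample message and the helper name plain_text_body are made up for illustration, and treating the first text/plain part as "the body" is an assumption.

```python
import email


def plain_text_body(raw: bytes) -> str:
    """Return the first text/plain part of a raw RFC 822 message, decoded."""
    msg = email.message_from_bytes(raw)
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            # decode=True reverses quoted-printable or base64 transfer encoding
            payload = part.get_payload(decode=True)
            charset = part.get_content_charset() or "utf-8"
            return payload.decode(charset, errors="replace")
    return ""


raw = (
    b"Subject: payment received\r\n"
    b"Content-Type: text/plain; charset=us-ascii\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"Amount: =2436.49 USD\r\n"
    b"Payment sent to: emailaddress=40gmail.com\r\n"
)
print(plain_text_body(raw))
```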

After that, though, you will need to write your own code to extract the parts you want. This is not a standard format for which parsers exist, so I would use regular expressions.
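For example, a regular expression with the re.MULTILINE flag can pull every dollar-amount field out of the decoded text. This is a minimal sketch: the pattern assumes the "Label: $12.34 USD" layout seen in the sample above and would need adjusting for other templates. Decimal is used rather than float because these are money values.

```python
import re
from decimal import Decimal

decoded = (
    "Unit Price: $36.49 USD\r\n"
    "Qty: 1\r\n"
    "Amount: $36.49 USD\r\n"
    "Shipping and handling: $0.00 USD\r\n"
)

# ^ matches at the start of every line because of re.MULTILINE;
# group 1 captures the label, group 2 the amount without the dollar sign
AMOUNT_RE = re.compile(r"^([^:\r\n]+):\s*\$(\d+\.\d{2}) USD", re.MULTILINE)

amounts = {label: Decimal(value) for label, value in AMOUNT_RE.findall(decoded)}
print(amounts)
```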

Note that \r\n is most likely just the carriage-return character followed by a line-feed character, not "backslash, r, backslash, n". When Python shows a value's representation (in the interactive interpreter, or when printing a list), it writes control and whitespace characters as escape sequences.
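A quick way to check which case you have, in Python 3: a real CR LF pair is two characters (code points 13 and 10), whereas the escaped form \r\n that gets displayed is four. Splitting on the four-character literal leaves real line endings untouched, while str.splitlines() handles them.

```python
real = "Qty: 1\r\nAmount: $36.49 USD"

print(len("\r\n"), [ord(c) for c in "\r\n"])  # 2 [13, 10]
print(len("\\r\\n"))  # 4: backslash, r, backslash, n

# Splitting on the 4-character literal finds nothing in real text...
print(real.split("\\r\\n"))  # ['Qty: 1\r\nAmount: $36.49 USD']
# ...while splitlines() splits on the actual CR LF
print(real.splitlines())  # ['Qty: 1', 'Amount: $36.49 USD']
```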

Just split the text into lines and then check whether each line matches what you're looking for.

You can pretty it up a bit, but this is a fairly simple way to handle it.

f = yourBlockOfText  # the decoded message body, as a string

text = f.splitlines()  # splits on the real \r\n line endings
for line in text:
    if line.startswith("Unit"):
        print line
    elif line.startswith("Payment sent to: "):
        print line
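The same filter as a small Python 3 function. str.startswith accepts a tuple of prefixes, which avoids one elif per field. The name pick_lines and the sample body are illustrative assumptions.

```python
def pick_lines(block: str, prefixes: tuple) -> list:
    """Return the lines of block that start with any of the given prefixes."""
    # splitlines() handles \r\n, \n and \r line endings alike
    return [line for line in block.splitlines() if line.startswith(prefixes)]


body = "Unit Price: $36.49 USD\r\nQty: 1\r\nPayment sent to: emailaddress@gmail.com\r\n"
for line in pick_lines(body, ("Unit", "Payment sent to: ")):
    print(line)
```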
