Asyncio imap fetch mails python3
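I'm experimenting with the asyncio module, but I need a hint or suggestion on how to fetch large emails asynchronously.
I have a list with the usernames and passwords of the mail accounts.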
data = [
    {'usern': 'foo@bar.de', 'passw': 'x'},
    {'usern': 'foo2@bar.de', 'passw': 'y'},
    {'usern': 'foo3@bar.de', 'passw': 'z'} (...)
]
I thought about:
loop = asyncio.get_event_loop()
loop.run_until_complete(asyncio.wait([get_attachment(d) for d in data]))
loop.close()
However, the slow part is downloading the email attachments.
The email code:
@asyncio.coroutine
def get_attachment(d):
    username = d['usern']
    password = d['passw']
    connection = imaplib.IMAP4_SSL('imap.bar.de')
    connection.login(username, password)
    connection.select()
    # list all available mails
    typ, data = connection.search(None, 'ALL')
    for num in data[0].split():
        # fetching each mail
        typ, data = connection.fetch(num, '(RFC822)')
        raw_string = data[0][1].decode('utf-8')
        msg = email.message_from_string(raw_string)
        for part in msg.walk():
            if part.get_content_maintype() == 'multipart':
                continue
            if part.get('Content-Disposition') is None:
                continue
            if part.get_filename():
                body = part.get_payload(decode=True)
                # do something with the body, async?
    connection.close()
    connection.logout()
How can I process all the mails (downloading the attachments) asynchronously?
If you don't have an asynchronous I/O-based imap library, you can use a concurrent.futures.ThreadPoolExecutor to do the I/O in threads. Python releases the GIL during the I/O, so you get true concurrency:
import asyncio
import email
import imaplib
import threading
from concurrent.futures import ThreadPoolExecutor

def init_connection(d):
    username = d['usern']
    password = d['passw']
    connection = imaplib.IMAP4_SSL('imap.bar.de')
    connection.login(username, password)
    connection.select()
    return connection

local = threading.local()  # We use this to get a different connection per thread

def do_fetch(num, d, rfc):
    # Note: the cached connection is per thread, not per account
    try:
        connection = local.connection
    except AttributeError:
        connection = local.connection = init_connection(d)
    return connection.fetch(num, rfc)

async def get_attachment(d, pool):
    connection = init_connection(d)
    # list all available mails
    typ, data = connection.search(None, 'ALL')
    # Kick off asynchronous tasks for all the fetches
    # (asyncio.ensure_future is the current name of the old asyncio.async)
    loop = asyncio.get_running_loop()
    futs = [asyncio.ensure_future(loop.run_in_executor(pool, do_fetch, num, d, '(RFC822)'))
            for num in data[0].split()]
    # Process each fetch as it completes
    for fut in asyncio.as_completed(futs):
        typ, data = await fut
        raw_string = data[0][1].decode('utf-8')
        msg = email.message_from_string(raw_string)
        for part in msg.walk():
            if part.get_content_maintype() == 'multipart':
                continue
            if part.get('Content-Disposition') is None:
                continue
            if part.get_filename():
                body = part.get_payload(decode=True)
                # do something with the body, async?
    connection.close()
    connection.logout()

async def main():
    # You can probably increase max_workers, because the threads are almost exclusively doing I/O.
    pool = ThreadPoolExecutor(max_workers=5)
    # data is the list of account dicts from the question
    await asyncio.gather(*[get_attachment(d, pool) for d in data])

asyncio.run(main())
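This isn't as good as a truly asynchronous I/O-based solution, because you still have the overhead of creating the threads, which limits scalability and costs extra memory. You also get some GIL slowdown from all the code that wraps the actual I/O calls. Still, if you're dealing with fewer than several thousand mails, it should perform fine.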
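We use run_in_executor to run the blocking calls on the ThreadPoolExecutor from within the asyncio event loop, asyncio.ensure_future (spelled asyncio.async in the original version of this answer, a name that no longer works now that async is a reserved keyword) to wrap the object it returns in an asyncio.Future, and as_completed to iterate over the futures in the order in which they complete.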
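To see the same pattern in isolation, here is a minimal sketch that needs no mail server. The hypothetical `slow_fetch` stands in for a blocking `imaplib` call by sleeping, and `fetch_all` is an illustrative name, not part of any library. It assumes Python 3.7+ for `asyncio.run`.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def slow_fetch(num, delay):
    """Hypothetical stand-in for a blocking call such as connection.fetch()."""
    time.sleep(delay)  # blocks the worker thread, not the event loop
    return num


async def fetch_all(jobs, max_workers=5):
    """Run every (num, delay) job in a thread pool; collect results as they finish."""
    loop = asyncio.get_running_loop()
    finished = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # run_in_executor hands each blocking call to the pool and returns a future
        futs = [loop.run_in_executor(pool, slow_fetch, num, delay)
                for num, delay in jobs]
        # as_completed yields awaitables in completion order, not submission order
        for fut in asyncio.as_completed(futs):
            finished.append(await fut)
    return finished


if __name__ == '__main__':
    # The slowest job is submitted first, so it should come back last
    print(asyncio.run(fetch_all([(1, 0.5), (2, 0.05), (3, 0.25)])))
```

Because the sleeps overlap in separate threads, the whole run should take roughly as long as the slowest job rather than the sum of all delays.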
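Edit:
It seems that imaplib is not thread-safe. I've edited my answer to use thread-local storage via threading.local, which lets us create one connection object per thread that can be reused for the entire life of the thread (meaning you create only num_workers connection objects, rather than a new connection for every fetch).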
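The effect of the thread-local cache can be checked without a server. In this sketch `FakeConnection`, `get_connection`, and `run` are hypothetical names made up for illustration; `FakeConnection` merely stands in for `imaplib.IMAP4_SSL` and opens no socket.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

local = threading.local()      # each thread sees its own attributes on this object
created = []                   # every "connection" ever created, for inspection
created_lock = threading.Lock()


class FakeConnection:
    """Hypothetical stand-in for imaplib.IMAP4_SSL; it opens no socket."""

    def fetch(self, num):
        # Report which thread served the request
        return (threading.get_ident(), num)


def get_connection():
    """Return this thread's connection, creating it on first use."""
    try:
        return local.connection
    except AttributeError:
        local.connection = FakeConnection()
        with created_lock:
            created.append(local.connection)
        return local.connection


def do_fetch(num):
    return get_connection().fetch(num)


def run(num_fetches=20, max_workers=3):
    """Perform num_fetches fetches on a pool of max_workers threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(do_fetch, range(num_fetches)))


if __name__ == '__main__':
    results = run()
    print('fetches: %d, connections created: %d' % (len(results), len(created)))
```

However many fetches are made, the number of connections created never exceeds max_workers, because each worker thread creates its connection once and then reuses it.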
I had the same need: fetching emails fully asynchronously with python 3. In case others here are interested, I pushed an asyncio IMAP library here: https://github.com/bamthomas/aioimaplib
You can use it like this:
import asyncio
import email

from aioimaplib import aioimaplib

async def wait_for_new_message(host, user, password):
    imap_client = aioimaplib.IMAP4(host=host)
    await imap_client.wait_hello_from_server()
    await imap_client.login(user, password)
    await imap_client.select()

    # Start IDLE in the background (ensure_future replaces the old asyncio.async)
    asyncio.ensure_future(imap_client.idle())
    id = 0
    while True:
        msg = await imap_client.wait_server_push()
        print('--> received from server: %s' % msg)
        if 'EXISTS' in msg:
            id = msg.split()[0]
            imap_client.idle_done()
            break

    result, data = await imap_client.fetch(id, '(RFC822)')
    email_message = email.message_from_bytes(data[0])

    attachments = []
    body = ''
    for part in email_message.walk():
        if part.get_content_maintype() == 'multipart':
            continue
        if part.get_content_maintype() == 'text' and 'attachment' not in part.get('Content-Disposition', ''):
            body = part.get_payload(decode=True).decode(part.get_param('charset', 'ascii')).strip()
        else:
            attachments.append(
                {'type': part.get_content_type(), 'filename': part.get_filename(), 'size': len(part.as_bytes())})

    print('attachments : %s' % attachments)
    print('body : %s' % body)
    await imap_client.logout()

if __name__ == '__main__':
    asyncio.run(wait_for_new_message('my.imap.server', 'user', 'pass'))
Large emails with attachments can also be downloaded with asyncio.
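For the attachment-handling step itself, which every snippet in this thread shares, here is a sketch that uses only the standard `email` package. It builds a small message in memory, so no server is needed; `build_sample` and `list_attachments` are illustrative names. Parsing with `email.message_from_bytes` avoids decoding the raw bytes as UTF-8 first, which could fail on mails that contain other encodings.

```python
import email
from email.message import EmailMessage


def build_sample():
    """Build raw message bytes, standing in for what a '(RFC822)' fetch returns."""
    msg = EmailMessage()
    msg['From'] = 'foo@bar.de'
    msg['To'] = 'foo2@bar.de'
    msg['Subject'] = 'report'
    msg.set_content('see attachment')
    msg.add_attachment(b'\x00\x01\x02\x03', maintype='application',
                       subtype='octet-stream', filename='data.bin')
    return msg.as_bytes()


def list_attachments(raw_bytes):
    """Return (filename, payload) for every part that carries a filename."""
    msg = email.message_from_bytes(raw_bytes)
    found = []
    for part in msg.walk():
        if part.get_content_maintype() == 'multipart':
            continue  # containers hold other parts, not data
        if part.get('Content-Disposition') is None:
            continue  # e.g. the plain-text body
        if part.get_filename():
            # decode=True undoes the transfer encoding (base64 here)
            found.append((part.get_filename(), part.get_payload(decode=True)))
    return found


if __name__ == '__main__':
    for filename, payload in list_attachments(build_sample()):
        print('%s: %d bytes' % (filename, len(payload)))
```

The returned payload is plain bytes, so writing it to disk is ordinary blocking file I/O that could be handed to a thread pool in the same way as the fetches.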