Asyncio imap fetch mails python3

I'm testing with the asyncio module, but I need a hint or suggestion on how to fetch large emails in an async way.

I have a list of usernames and passwords for the mail accounts.

data = [
    {'usern': 'foo@bar.de', 'passw': 'x'},
    {'usern': 'foo2@bar.de', 'passw': 'y'},
    {'usern': 'foo3@bar.de', 'passw': 'z'},  # (...)
]

I thought about:

loop = asyncio.get_event_loop()
loop.run_until_complete(asyncio.wait([get_attachment(d) for d in data]))
loop.close()

However, the time-consuming part is downloading the email attachments.

Email:

@asyncio.coroutine
def get_attachment(d):
    username = d['usern']
    password = d['passw']

    connection = imaplib.IMAP4_SSL('imap.bar.de')
    connection.login(username, password)
    connection.select()

    # list all available mails
    typ, data = connection.search(None, 'ALL')

    for num in data[0].split():
        # fetching each mail
        typ, data = connection.fetch(num, '(RFC822)')
        raw_string = data[0][1].decode('utf-8')
        msg = email.message_from_string(raw_string)
        for part in msg.walk():
            if part.get_content_maintype() == 'multipart':
                continue

            if part.get('Content-Disposition') is None:
                continue

            if part.get_filename():
                body = part.get_payload(decode=True)
                # do something with the body, async?

    connection.close()
    connection.logout()

How could I process all mails (downloading the attachments) in an async way?

If you don't have an asynchronous I/O-based IMAP library, you can just use a concurrent.futures.ThreadPoolExecutor to do the I/O in threads. Python will release the GIL during the I/O, so you'll get true concurrency:

import asyncio
import email
import imaplib
import threading
from concurrent.futures import ThreadPoolExecutor

def init_connection(d):
    username = d['usern']
    password = d['passw']

    connection = imaplib.IMAP4_SSL('imap.bar.de')
    connection.login(username, password)
    connection.select()
    return connection

local = threading.local() # We use this to get a different connection per thread
def do_fetch(num, d, rfc):
    try:
        connection = local.connection
    except AttributeError:
        connection = local.connection = init_connection(d)
    return connection.fetch(num, rfc)

async def get_attachment(d, pool):
    connection = init_connection(d)
    # list all available mails
    typ, data = connection.search(None, 'ALL')

    # Kick off asynchronous tasks for all the fetches.
    # run_in_executor already returns a future, so no extra wrapping is needed.
    loop = asyncio.get_running_loop()
    futs = [loop.run_in_executor(pool, do_fetch, num, d, '(RFC822)')
                for num in data[0].split()]

    # Process each fetch as it completes
    for fut in asyncio.as_completed(futs):
        typ, data = await fut
        # Parse the raw bytes directly; decoding as UTF-8 first can fail on 8-bit mail
        msg = email.message_from_bytes(data[0][1])
        for part in msg.walk():
            if part.get_content_maintype() == 'multipart':
                continue

            if part.get('Content-Disposition') is None:
                continue

            if part.get_filename():
                body = part.get_payload(decode=True)
                # do something with the body, async?

    connection.close()
    connection.logout()


async def main():
    # You can probably increase max_workers, because the threads are almost exclusively doing I/O.
    with ThreadPoolExecutor(max_workers=5) as pool:
        await asyncio.gather(*(get_attachment(d, pool) for d in data))

asyncio.run(main())

This isn't quite as nice as a truly asynchronous I/O-based solution, because you still have the overhead of creating the threads, which limits scalability and adds memory overhead. You also get some GIL slowdown because of all the code wrapping the actual I/O calls. Still, if you're dealing with fewer than a few thousand mails, it should perform OK.

We use run_in_executor to run the blocking fetch calls on the ThreadPoolExecutor from within the asyncio event loop (it returns an asyncio.Future for each call), and as_completed to iterate through the futures in the order they complete.
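As a self-contained illustration of how these two pieces fit together, here is a minimal sketch in which a hypothetical blocking_fetch function (it only sleeps) stands in for the imaplib call. The function names, job list, and delays are made up for the demo and are not part of the answer above.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_fetch(num, delay):
    """Hypothetical stand-in for a blocking imaplib fetch call."""
    time.sleep(delay)  # simulates waiting on the network
    return num


async def fetch_all(jobs, pool):
    loop = asyncio.get_running_loop()
    # run_in_executor schedules each call on the pool and returns a future
    futs = [loop.run_in_executor(pool, blocking_fetch, num, delay)
            for num, delay in jobs]
    results = []
    # as_completed yields awaitables in completion order, not submission order
    for fut in asyncio.as_completed(futs):
        results.append(await fut)
    return results


def demo():
    # (message number, simulated download time in seconds)
    jobs = [(1, 0.5), (2, 0.05), (3, 0.25)]
    with ThreadPoolExecutor(max_workers=3) as pool:
        return asyncio.run(fetch_all(jobs, pool))


if __name__ == "__main__":
    print(demo())
```

Because the three sleeps run in separate threads, the slow "download" does not hold up the fast ones, and the results normally come back shortest delay first.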

Edit:

It seems imaplib is not thread-safe. I've edited my answer to use thread-local storage via threading.local, which allows us to create one connection object per thread that can be reused for the entire life of the thread (meaning you create only max_workers connection objects, rather than a new connection for every fetch).
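One thing to watch if a single pool serves several accounts: a thread-local connection is created with whichever account a thread sees first, so later fetches for another account would reuse the wrong mailbox. A possible variation, sketched here under the assumption that the username identifies the account, keeps one connection per thread and per account. The get_connection helper and its connect parameter are hypothetical names; connect could be the init_connection function from the answer.

```python
import threading

_local = threading.local()


def get_connection(d, connect):
    """Return this thread's connection for account d, creating it on first use.

    connect is a hypothetical factory that takes the account dict and
    returns a logged-in connection (for example init_connection).
    """
    cache = getattr(_local, 'connections', None)
    if cache is None:
        # First call in this thread: start an empty per-thread cache
        cache = _local.connections = {}
    key = d['usern']
    if key not in cache:
        cache[key] = connect(d)
    return cache[key]
```

The trade-off is that the number of open connections can grow to max_workers times the number of accounts, instead of max_workers.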

I had the same need: fetching emails with Python 3 fully asynchronously. If others here are interested, I pushed an asyncio IMAP lib here: https://github.com/bamthomas/aioimaplib

You can use it like this:

import asyncio
import email

from aioimaplib import aioimaplib

async def wait_for_new_message(host, user, password):
    imap_client = aioimaplib.IMAP4(host=host)
    await imap_client.wait_hello_from_server()

    await imap_client.login(user, password)
    await imap_client.select()

    # Start IDLE in the background (asyncio.async was renamed to asyncio.ensure_future)
    asyncio.ensure_future(imap_client.idle())
    id = 0
    while True:
        msg = await imap_client.wait_server_push()
        print('--> received from server: %s' % msg)
        if 'EXISTS' in msg:
            id = msg.split()[0]
            imap_client.idle_done()
            break

    result, data = await imap_client.fetch(id, '(RFC822)')
    email_message = email.message_from_bytes(data[0])

    attachments = []
    body = ''
    for part in email_message.walk():
        if part.get_content_maintype() == 'multipart':
            continue
        if part.get_content_maintype() == 'text' and 'attachment' not in part.get('Content-Disposition', ''):
            body = part.get_payload(decode=True).decode(part.get_param('charset', 'ascii')).strip()
        else:
            attachments.append(
                {'type': part.get_content_type(), 'filename': part.get_filename(), 'size': len(part.as_bytes())})

    print('attachments : %s' % attachments)
    print('body : %s' % body)
    await imap_client.logout()


if __name__ == '__main__':
    asyncio.run(wait_for_new_message('my.imap.server', 'user', 'pass'))

Large emails with attachments are also downloaded with asyncio.
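Whichever way the raw message arrives, the "do something with the body" step from the question can be sketched with the standard library alone. The example below is only an illustration: extract_attachments, save_attachments, and build_sample are hypothetical helper names, and the sample message is made-up data. It applies the same filters as the question's loop and hands the blocking disk writes to the default executor so the event loop stays responsive.

```python
import asyncio
import email
from email.message import EmailMessage
from pathlib import Path


def extract_attachments(raw_bytes):
    """Return (filename, payload_bytes) pairs for the parts that carry a filename."""
    msg = email.message_from_bytes(raw_bytes)
    found = []
    for part in msg.walk():
        if part.get_content_maintype() == 'multipart':
            continue
        if part.get('Content-Disposition') is None:
            continue
        filename = part.get_filename()
        if filename:
            found.append((filename, part.get_payload(decode=True)))
    return found


async def save_attachments(raw_bytes, directory):
    """Write every attachment into directory without blocking the event loop."""
    loop = asyncio.get_running_loop()
    saved = []
    for filename, payload in extract_attachments(raw_bytes):
        # Keep only the base name so a crafted filename cannot leave the directory
        target = Path(directory) / Path(filename).name
        # File writes are blocking, so run them in the default thread pool
        await loop.run_in_executor(None, target.write_bytes, payload)
        saved.append(target)
    return saved


def build_sample():
    """Build a small message with one binary attachment (illustrative data)."""
    msg = EmailMessage()
    msg['Subject'] = 'report'
    msg['From'] = 'foo@bar.de'
    msg['To'] = 'foo2@bar.de'
    msg.set_content('see attachment')
    msg.add_attachment(b'\x00\x01\x02', maintype='application',
                       subtype='octet-stream', filename='data.bin')
    return msg.as_bytes()
```

With either answer, the raw RFC822 bytes returned by the fetch would take the place of build_sample().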
