
With imap_tools (or imaplib), how do I sync IMAP changes instead of polling by repeatedly fetching the entire IMAP database?

Original 2021-02-03 10:13:28 3 2 python/ imaplib

Since there are several similar-sounding questions around, I want to be very precise.

Edit: Let's focus specifically on reacting dynamically to any email message being moved from one folder to another.

A typical IMAP client app fetches only the changes in the IMAP database since the last sync. If your email client had to fetch every email each time you ran it, that would take a long time.

Unfortunately, my imap_tools app has to fetch the entire IMAP database (headers only) every time I run it. In order to detect changes dynamically, I would have to poll the entire set of messages repeatedly. Obviously, this is not a reasonable design.

Does imap_tools (or the underlying imaplib) provide a mechanism for syncing?

Using the "seen" flag is not it. That flag indicates whether a human has read the message, and it is shared by all clients rather than being specific to mine.

Relying on the UID is not quite it either, because I want to detect whether the user has deleted a message or moved it from one folder to another.

2 answers

You can:

Use search args to limit the data set: date_gte, date_lt, new...
Rely on the message-id from the headers if you store something
Use mailbox.move as a reliable way to "mark" a msg instead of flags
Calculate a msg hash

It all depends on your task.
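The message-id and hash ideas above can be combined into one stable key per message. What follows is only a minimal sketch using the standard library: the function name `fingerprint` and the choice of fallback headers (Date, From, Subject) are assumptions made for illustration, not part of imap_tools or imaplib.

```python
import hashlib
from email import message_from_bytes


def fingerprint(raw_headers: bytes) -> str:
    """Return a stable key for a message, built from its raw header block."""
    msg = message_from_bytes(raw_headers)
    message_id = str(msg.get("Message-ID") or "").strip()
    if message_id:
        # Prefer the Message-ID header when the message has one.
        return "mid:" + message_id
    # Fallback (assumption): hash a few headers that normally never change.
    basis = "\n".join(str(msg.get(name, "")) for name in ("Date", "From", "Subject"))
    digest = hashlib.sha256(basis.encode("utf-8", errors="replace")).hexdigest()
    return "sha256:" + digest
```

Storing this key next to the folder name and UID gives you something to compare on the next run, however the headers were fetched. Message-ID is not guaranteed to be unique or present, so treat the key as a hint rather than an identity.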

As far as I know, there is no "sync" in IMAP. There is IDLE, but imap_tools cannot do it (at the time of writing).

IMAP, at its core, is an old and not terribly efficient protocol, as the design was not focused on syncing. Kundrát calls it a Cache Filing Protocol: the server is the one source of truth, and it is the client's job to display this to the user, and usually to cache as much of it as possible.

In Baseline IMAP, this generally means connecting to the server, then interrogating and caching as much information as the client cares to show: the number of messages, headers, flags, possibly bodies, and maybe attachments.

It also assumes the client has a mostly stable network connection while it is in use, which was true of most desktop-mode clients. Once you have all your data synced, the server can send you unsolicited responses: EXISTS when a new message comes in, FETCH (carrying the new FLAGS) when flags are updated, and EXPUNGE when a message is deleted. A server will not normally send these except in response to a permitted client command. Older clients often used NOOP, or perhaps CHECK, for this.
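To make the unsolicited responses concrete, here is a small sketch of classifying untagged server lines. It assumes RFC 3501 framing, where flag changes arrive as untagged FETCH responses carrying FLAGS; the function name `parse_untagged` and the tuple it returns are hypothetical, chosen only for this example.

```python
import re

UNTAGGED = re.compile(rb"^\* (\d+) (EXISTS|EXPUNGE|FETCH)\b(.*)$", re.DOTALL)
FLAGS = re.compile(rb"FLAGS \(([^)]*)\)")


def parse_untagged(line: bytes):
    """Classify one untagged line as (kind, number, flags), or None if unrelated."""
    m = UNTAGGED.match(line.strip())
    if m is None:
        return None
    number, kind, rest = int(m.group(1)), m.group(2), m.group(3)
    if kind == b"EXISTS":
        return ("exists", number, None)   # number is the new message count
    if kind == b"EXPUNGE":
        return ("expunge", number, None)  # number is the removed sequence number
    found = FLAGS.search(rest)
    flags = frozenset(found.group(1).decode().split()) if found else frozenset()
    return ("flags", number, flags)       # number is the message sequence number
```

Note that the numbers in these responses are message sequence numbers, not UIDs, so a cache keyed by UID needs its own sequence-to-UID mapping to apply them.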

If you lose your connection, clients will reconnect and refresh their cache. Since the only mutable things about messages are their existence and their flags, this is usually fairly quick: the client will usually request all the flags for all messages. From there it can quickly update its cache: apply the flags, fetch headers for any new UIDs it discovered, and remove the cached versions of UIDs it didn't receive.
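The refresh step described above can be sketched as a diff between two UID-to-flags maps. This assumes the server state came from something like `conn.uid("FETCH", "1:*", "(FLAGS)")` on an `imaplib.IMAP4_SSL` connection; `parse_flag_lines` and `diff_flags` are hypothetical helper names, and `imaplib.ParseFlags` is the only library call used.

```python
import imaplib
import re

UID_RE = re.compile(rb"UID (\d+)")


def parse_flag_lines(lines):
    """Turn FETCH response items into {uid: frozenset of flags (as bytes)}."""
    state = {}
    for line in lines:
        if not isinstance(line, bytes):
            continue  # imaplib returns [None] when the mailbox is empty
        m = UID_RE.search(line)
        if m:
            state[int(m.group(1))] = frozenset(imaplib.ParseFlags(line))
    return state


def diff_flags(cached, server):
    """Compare cached and server state; both map uid -> frozenset of flags."""
    new_uids = sorted(server.keys() - cached.keys())      # fetch headers for these
    removed_uids = sorted(cached.keys() - server.keys())  # drop these from the cache
    changed = {uid: server[uid]
               for uid in server.keys() & cached.keys()
               if server[uid] != cached[uid]}             # apply these flag sets
    return new_uids, removed_uids, changed
```

The diff is only valid while the folder's UIDVALIDITY is unchanged; if it changes, the cached UIDs no longer mean anything and the cache for that folder has to be rebuilt.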

This does start to break down when a folder has many tens of thousands of messages. At that point you will find that clients start to have very slow startup and syncing speeds on some servers, and start to use rather a lot of data.

IMAP as a protocol cannot track messages across folders. The state per folder is completely separate. If a message is moved, that is equivalent to a removal from one folder and an add to another. Desktop clients often maintain a pool of connections to watch more than one folder at a time. You could apply heuristics to your cached messages to try to detect folder moves (e.g., comparing a selection of headers and metadata), but it can't be perfect.
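One possible form of that heuristic is to pair each removal in one folder with an add in another folder that has the same message key. This is only a sketch: `detect_moves` is a hypothetical name, and the keys are assumed to be something like a Message-ID or a header hash that you stored yourself.

```python
from collections import defaultdict


def detect_moves(removed, added):
    """Pair removals with adds that share a key in a different folder.

    Both arguments map (folder, uid) -> message key.
    Returns a list of (key, source_location, destination_location).
    """
    by_key = defaultdict(list)
    for location, key in added.items():
        by_key[key].append(location)

    moves = []
    for src, key in removed.items():
        candidates = [dst for dst in by_key.get(key, []) if dst[0] != src[0]]
        if len(candidates) == 1:  # skip missing or ambiguous matches
            moves.append((key, src, candidates[0]))
            by_key[key].remove(candidates[0])
    return moves
```

Ambiguous cases (the same key added in two folders) are deliberately left unmatched here, which is one of the reasons the result cannot be perfect.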

As you can see, a lot of this is terribly inefficient once your mailbox grows past a few hundred messages, so there are a lot of extensions to make caching more efficient.

UIDPLUS (RFC 4315) is almost everywhere. It requires the server to support UIDs in more commands, and is almost a requirement for any cache-mode client, as message sequence numbers are unreliable when deletions are involved.

IDLE (RFC 2177) is fairly common, but not everywhere. The client can issue an IDLE command, which tells the server it is ready for those unsolicited updates at any time. This means the client doesn't have to poll every few minutes with the NOOP command.

CONDSTORE (RFC 4551) is on most unix-type servers, and some commercial servers. Among other things, it associates a serial number with flag changes. This allows the flag resync step to get only the changes since the most recent serial number the client knows about. It does not, however, help with detecting deleted messages, and a UID SEARCH ALL would still be necessary to find those after a disconnection.
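As a rough sketch of the CONDSTORE resync step, the helpers below build the arguments for a `UID FETCH ... (CHANGEDSINCE n)` command and pick the highest MODSEQ out of the response, so it can be saved for next time. The helper names are hypothetical, and the sketch assumes the server advertises CONDSTORE (with imaplib, that would show up in `conn.capabilities`).

```python
import re

MODSEQ_RE = re.compile(rb"MODSEQ \((\d+)\)")


def changedsince_args(last_modseq: int):
    """Arguments for imaplib's conn.uid(...) asking only for newer flag changes."""
    return ("FETCH", "1:*", f"(FLAGS) (CHANGEDSINCE {last_modseq})")


def highest_modseq(lines, floor=0):
    """Return the largest MODSEQ seen in FETCH response items, or floor if none."""
    best = floor
    for line in lines:
        if not isinstance(line, bytes):
            continue  # imaplib returns [None] when nothing changed
        for value in MODSEQ_RE.findall(line):
            best = max(best, int(value))
    return best


# Intended use on a selected mailbox (not executed here):
#   typ, data = conn.uid(*changedsince_args(saved_modseq))
#   saved_modseq = highest_modseq(data, floor=saved_modseq)
```

Only messages whose flags changed after the saved serial number come back, which is what keeps this cheap on large folders. Deleted messages still have to be found separately, as noted above.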

QRESYNC (RFC 5162) provides resynchronization data for deleted messages. Unfortunately, this is quite a rare extension, and it is almost nonexistent on large commercial servers.

NOTIFY (RFC 5465) is almost nowhere. It's supposed to be like a super-IDLE that can monitor multiple mailboxes at the same time.
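Since support for these extensions varies so much, a client usually has to choose its approach per server from the advertised capabilities. The following is a hedged sketch of that decision: the function name `plan_sync` and the strategy labels are made up for illustration, and the input is assumed to be a capability list such as imaplib's `conn.capabilities`.

```python
def plan_sync(capabilities):
    """Pick a resync and a watch strategy from the server's capability list."""
    caps = {str(c).upper() for c in capabilities}

    if "QRESYNC" in caps:
        resync = "qresync"        # flag changes and removals since last sync
    elif "CONDSTORE" in caps:
        resync = "condstore"      # changed flags only; removals via UID SEARCH ALL
    else:
        resync = "full"           # fetch all flags for all messages

    if "NOTIFY" in caps:
        watch = "notify"          # several mailboxes on one connection
    elif "IDLE" in caps:
        watch = "idle"            # one selected mailbox per connection
    else:
        watch = "poll"            # periodic NOOP
    return {"resync": resync, "watch": watch}
```

For a server that advertises IDLE and CONDSTORE but not QRESYNC or NOTIFY, this picks the CONDSTORE resync and IDLE for watching, which matches what the descriptions above suggest is the common case.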

Gmail Extensions are, of course, Gmail specific. Among other things, they associate a permanent identifier with each message (X-GM-MSGID), which DOES allow it to be reliably tracked across folders. Gmail also provides the "All Mail" folder and Labels, which means you could sync the whole account by just syncing the All Mail folder. As with other servers, this does start to get bandwidth inefficient when you hit tens of thousands of messages.
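On Gmail, the permanent identifier can be requested as a FETCH item, for example with `conn.uid("FETCH", "1:*", "(X-GM-MSGID)")` through imaplib. The sketch below only parses such responses into a lookup table; `parse_gm_msgids` is a hypothetical helper name, and the response shape is assumed to follow Gmail's documented `X-GM-MSGID <number>` form.

```python
import re

GM_MSGID_RE = re.compile(rb"X-GM-MSGID (\d+)")
UID_RE = re.compile(rb"\bUID (\d+)")


def parse_gm_msgids(lines):
    """Turn FETCH response items into {x_gm_msgid: uid} for one folder."""
    table = {}
    for line in lines:
        if not isinstance(line, bytes):
            continue  # imaplib returns [None] for an empty folder
        msgid = GM_MSGID_RE.search(line)
        uid = UID_RE.search(line)
        if msgid and uid:
            table[int(msgid.group(1))] = int(uid.group(1))
    return table
```

With one such table per folder, a message that disappears from one table and appears in another under the same X-GM-MSGID can be treated as moved, without any header heuristics.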

From my experience of participating in the development of several mobile email clients that emphasized bandwidth efficiency and responsiveness, a client can appear very responsive even while dealing with all the problems of IMAP. IDLE can be used to try to keep the INBOX in sync. If you can't do that, you can hide a lot of jank by keeping only the most recent week's messages in total sync, and syncing the rest less frequently (UID SEARCH SINCE is helpful here). The user is usually only looking at the end of their inbox, and generally only cares about new messages coming in.
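The "most recent week" window can be expressed as a SINCE search criterion. A minimal sketch follows; the helper names `imap_date` and `since_criteria` are assumptions, and the month names are spelled out by hand so that the result does not depend on the process locale.

```python
from datetime import date, timedelta

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def imap_date(d: date) -> str:
    """Format a date the way IMAP SEARCH expects it, e.g. 27-Jan-2021."""
    return f"{d.day:02d}-{MONTHS[d.month - 1]}-{d.year}"


def since_criteria(days: int, today: date = None):
    """Return SEARCH criteria matching messages from the last `days` days."""
    today = today or date.today()
    return ("SINCE", imap_date(today - timedelta(days=days)))


# Intended use on a selected mailbox (not executed here):
#   typ, data = conn.uid("SEARCH", None, *since_criteria(7))
```

SINCE compares against the server's internal date and ignores the time of day, so the window is accurate to the day rather than to the hour.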

And in general, when a client mirrors the move of a message, it has actually just detected a delete and an add. Internet connections and servers are simply fast enough that something taking a couple of hundred milliseconds can look instant to a user. If any optimization is occurring, it's heuristic. I think Thunderbird has a protocol log you can turn on. If you're really curious what it's doing, turn it on, move a message, and see what it does.