Undoing “marked as read” status of emails fetched with imaplib

I wrote a Python script to fetch all of my Gmail. I have hundreds of thousands of old emails, of which about 10,000 were unread.

After successfully fetching all of my email, I find that Gmail has marked all the fetched emails as "read". This is disastrous for me, since I need to check only the unread emails.

How can I recover the information about which emails were unread? I dumped each mail object into files; the core of my code is shown below:

import email
import imaplib

m = imaplib.IMAP4_SSL("imap.gmail.com")
m.login(user, pwd)
m.select("[Gmail]/All Mail")
resp, items = m.uid('search', None, 'ALL')
uids = items[0].split()
for uid in uids:
    # Fetching (RFC822) is the step after which Gmail showed the message as read
    resp, data = m.uid('fetch', uid, "(RFC822)")
    email_body = data[0][1]
    mail = email.message_from_string(email_body)
    dumpobj(uid, mail)  # writes the parsed message object to a file

I am hoping there is either an option to undo this in Gmail, or a member inside the stored mail objects reflecting the seen-state information.

For anyone looking to prevent this headache, consider this answer here. This does not work for me, however, since the damage has already been done.
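
As a rough illustration of the prevention idea (not necessarily what the linked answer proposes), the sketch below opens the mailbox read-only and fetches with BODY.PEEK[] instead of RFC822. The function names `fetch_without_marking` and `fetch_all` are made up for this example, and the code assumes Python 3, where imaplib returns bytes.

```python
import email
import imaplib


def fetch_without_marking(conn, uid):
    """Fetch one message by UID using BODY.PEEK[], which does not set \\Seen."""
    typ, data = conn.uid('fetch', uid, '(BODY.PEEK[])')
    if typ != 'OK' or not data or not isinstance(data[0], tuple):
        raise RuntimeError('fetch failed for UID {0!r}: {1!r}'.format(uid, data))
    # data[0] is a (header, literal) pair; the literal holds the raw message bytes.
    return email.message_from_bytes(data[0][1])


def fetch_all(host, user, password, mailbox='"[Gmail]/All Mail"'):
    """Yield (uid, message) pairs for every message in the mailbox."""
    conn = imaplib.IMAP4_SSL(host)
    try:
        conn.login(user, password)
        # readonly=True issues EXAMINE instead of SELECT, so this session
        # should not be able to change flags. The mailbox name is quoted by
        # hand because it contains a space.
        conn.select(mailbox, readonly=True)
        typ, items = conn.uid('search', None, 'ALL')
        for uid in items[0].split():
            yield uid, fetch_without_marking(conn, uid)
    finally:
        conn.logout()
```

Either measure alone (read-only select or PEEK) should be enough on a compliant server; using both is just a belt-and-braces choice.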

Edit: I have written the following function to recursively "grep" all strings in an object, and applied it to a dumped email object using the following keywords:

regex = "(?i)((marked)|(seen)|(unread)|(read)|(flag)|(delivered)|(status)|(state))"

So far, no results (only an unrelated "Delivered-To"). Which other keywords could I try?

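import re

def grep_object(obj, regex, cycle=None, matched=None):
    # Use fresh sets per top-level call; mutable defaults are shared between calls
    if cycle is None:
        cycle = set()
    if matched is None:
        matched = set()
    if id(obj) in cycle:  # already visited, avoid infinite recursion
        return matched
    cycle.add(id(obj))
    if isinstance(obj, basestring):  # Python 2; use str on Python 3
        if re.search(regex, obj):
            matched.add(obj)

    def grep_dict(adict):
        # Search both keys and values of dict-like objects
        try:
            for key, value in adict.iteritems():  # .items() on Python 3
                grep_object(key, regex, cycle, matched)
                grep_object(value, regex, cycle, matched)
        except Exception:
            pass  # not a dict-like object

    grep_dict(obj)
    try:
        grep_dict(obj.__dict__)
    except Exception:
        pass  # object has no __dict__
    try:
        for elm in obj:
            grep_object(elm, regex, cycle, matched)
    except Exception:
        pass  # not iterable
    return matched

grep_object(mail_object, regex)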

I'm having a similar problem (not with Gmail), and the biggest difficulty for me was making a reproducible test case; I finally managed to produce one (see below).

In terms of the \Seen flag, I now gather it goes like this:

  • If a message is new/unseen, an IMAP fetch of its FLAGS will not include \Seen (i.e. the flag is simply absent for that message).
  • If you do an IMAP select on a mailbox (INBOX), the response includes an UNSEEN response code. Per RFC 3501 this is the sequence number of the first message in that folder without the \Seen flag, not a list; to list all new messages, use SEARCH UNSEEN.
  • In my test case, if you fetch, say, the headers of a message with BODY.PEEK, then \Seen is not set on the message; if you fetch them with BODY, then \Seen is set.
  • In my test case, fetching (RFC822) also doesn't set \Seen (unlike your case with Gmail).
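
The flag checks described in the list above can be sketched roughly as follows. This is only an illustration for Python 3: `is_seen` and `unseen_uids` are hypothetical helper names, while `imaplib.ParseFlags` is the standard-library helper that splits a FLAGS response into individual flags.

```python
import imaplib


def is_seen(flags_response):
    """Return True if a FETCH (FLAGS) response line contains the \\Seen flag."""
    flags = imaplib.ParseFlags(flags_response)  # tuple of bytes, one per flag
    # Compare case-insensitively, since IMAP flag names are case-insensitive.
    return any(flag.lower() == b'\\seen' for flag in flags)


def unseen_uids(conn):
    """Return the UIDs of all messages in the selected mailbox that lack \\Seen."""
    typ, data = conn.uid('search', None, 'UNSEEN')
    if typ != 'OK':
        raise RuntimeError('UID SEARCH UNSEEN failed: {0!r}'.format(data))
    # An empty result comes back as an empty (or missing) first element.
    return data[0].split() if data and data[0] else []
```

Saving the result of `unseen_uids` before any bulk fetch gives a list that can be used later to put the flags back.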

In the test case, I run pprint.pprint(inspect.getmembers(mail)) (in lieu of your dumpobj(uid, mail)), but only after I'm certain \Seen has been set. The output I get is posted in mail_object_inspect.txt, and as far as I can see, there is no mention of 'new/read/seen' etc. in any of the readable fields; furthermore, mail.as_string() prints:

'From: jesse@example.com\nTo: user@example.com\nSubject: This is a test message!\n\nHello. I am executive assistant to the director of\nBear Stearns, a failed investment Bank.  I have\naccess to USD6,000,000. ...\n'

Even worse, there is no mention of "field" anywhere in the imaplib code (grep -L prints the names of files that do not contain a case-insensitive match for "field"):

$ grep -L -i field /usr/lib/python{2.7,3.2}/imaplib.py
/usr/lib/python2.7/imaplib.py
/usr/lib/python3.2/imaplib.py

... so I guess that information was not saved with your dumps.
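
This does not recover anything from existing dumps, but for future dumps the flags could be requested in the same FETCH and stored next to each message, since the parsed mail object itself carries no flag state. The following is a minimal sketch for Python 3; `fetch_with_flags` is a hypothetical name, and the code assumes the server returns the message as a single literal.

```python
import email
import imaplib


def fetch_with_flags(conn, uid):
    """Fetch a message and its flags in one command, without setting \\Seen."""
    typ, data = conn.uid('fetch', uid, '(FLAGS BODY.PEEK[])')
    if typ != 'OK':
        raise RuntimeError('fetch failed for UID {0!r}: {1!r}'.format(uid, data))
    raw = None
    header = b''
    for part in data:
        if isinstance(part, tuple):
            # (text before the literal, literal bytes)
            header += part[0]
            raw = part[1]
        elif isinstance(part, bytes):
            # Text after the literal; FLAGS may appear here on some servers.
            header += part
    if raw is None:
        raise RuntimeError('no message body returned for UID {0!r}'.format(uid))
    flags = [flag.decode('ascii') for flag in imaplib.ParseFlags(header)]
    return flags, email.message_from_bytes(raw)
```

The returned flag list (for example `['\\Seen']` or `[]`) can then be written to the dump file together with the message.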


Here is a bit on reconstructing the test case. The hardest part was finding a small IMAP server that can be run quickly with some arbitrary users and emails, without having to install a ton of stuff on your system. Finally I found one: trivial-server.pl, the example file of Perl's Net::IMAP::Server; tested on Ubuntu 11.04.

The test case is pasted in this gist, with two files (with many comments) that I'll try to post abridged:

  • trivial-serverB.pl - Perl (v5.10.1) Net::IMAP::Server server (has a terminal output paste at the end of the file, with a telnet client session)
  • testimap.py - Python 2.7/3.2 imaplib client (has a terminal output paste at the end of the file, of itself operating with the server)

trivial-serverB.pl

First, make sure you have Net::IMAP::Server. Note that it has many dependencies, so the command below may take a while to install:

sudo perl -MCPAN -e 'install Net::IMAP::Server'

Then, in the directory where you got trivial-serverB.pl, create a subdirectory with SSL certificates:

mkdir certs
openssl req \
  -x509 -nodes -days 365 \
  -subj '/C=US/ST=Oregon/L=Portland/CN=localhost' \
  -newkey rsa:1024 -keyout certs/server-key.pem -out certs/server-cert.pem

Finally, run the server with administrative privileges:

sudo perl trivial-serverB.pl

Note that trivial-serverB.pl has a hack which lets a client connect without SSL. Here is trivial-serverB.pl:

#!/usr/bin/perl

use v5.10.1;
use feature qw(say);
use Net::IMAP::Server;

package Demo::IMAP::Hack;
$INC{'Demo/IMAP/Hack.pm'} = 1;

sub capabilityb {
  my $self = shift;
  print STDERR "Capabilitin'\n";
  my $base = $self->server->capability;
  my @words = split " ", $base;
  @words = grep {$_ ne "STARTTLS"} @words
    if $self->is_encrypted;
  unless ($self->auth) {
    my $auth = $self->auth || $self->server->auth_class->new;
    my @auth = $auth->sasl_provides;
    # hack:
    #unless ($self->is_encrypted) {
    #  # Lack of encryption makes us turn off all plaintext auth
    #  push @words, "LOGINDISABLED";
    #  @auth = grep {$_ ne "PLAIN"} @auth;
    #}
    push @words, map {"AUTH=$_"} @auth;
  }
  return join(" ", @words);
}

package Demo::IMAP::Auth;
$INC{'Demo/IMAP/Auth.pm'} = 1;
use base 'Net::IMAP::Server::DefaultAuth';
sub auth_plain {
    my ( $self, $user, $pass ) = @_;
    # XXX DO AUTH CHECK
    $self->user($user);
    return 1;
}

package Demo::IMAP::Model;
$INC{'Demo/IMAP/Model.pm'} = 1;
use base 'Net::IMAP::Server::DefaultModel';
sub init {
    my $self = shift;
    $self->root( Demo::IMAP::Mailbox->new() );
    $self->root->add_child( name => "INBOX" );
}

###########################################
package Demo::IMAP::Mailbox;
use base qw/Net::IMAP::Server::Mailbox/;
use Data::Dumper;

my $data = <<'EOF';
From: jesse@example.com
To: user@example.com
Subject: This is a test message!

Hello. I am executive assistant to the director of
Bear Stearns, a failed investment Bank.  I have
access to USD6,000,000. ...
EOF
my $msg = Net::IMAP::Server::Message->new($data);
sub load_data {
    my $self = shift;
    $self->add_message($msg);
}
my %ports = ( port => 143, ssl_port => 993 );
$ports{$_} *= 10 for grep {$> > 0} keys %ports;

$myserv = Net::IMAP::Server->new(
    auth_class  => "Demo::IMAP::Auth",
    model_class => "Demo::IMAP::Model",
    user        => 'nobody',
    log_level   => 3, # at least 3 to output 'CONNECT TCP Peer: ...' message; 4 to output IMAP commands too
    %ports,
);

# apparently, this overload MUST be after the new?! here:
{
no strict 'refs';
*Net::IMAP::Server::Connection::capability = \&Demo::IMAP::Hack::capabilityb;
}

# https://stackoverflow.com/questions/27206371/printing-addresses-of-perl-object-methods
say " -", $myserv->can('validate'), " -", $myserv->can('capability'), " -", \&Net::IMAP::Server::Connection::capability, " -", \&Demo::IMAP::Hack::capabilityb;

$myserv->run();

testimap.py

With the server above running in one terminal, in another terminal you can just do:

python testimap.py

The code simply reads fields and content from the one (and only) message the server above presents, and finally restores (removes) the \Seen flag.

import sys
if sys.version_info[0] < 3: # python 2.7
  def uttc(x):
    return x
else:                       # python 3+
  def uttc(x):
    return x.decode("utf-8")
import imaplib
import email
import pprint,inspect

imap_user = 'nobody'
imap_password = 'whatever'
imap_server = 'localhost'
conn = imaplib.IMAP4(imap_server)
conn.debug = 3

try:
  (retcode, capabilities) = conn.login(imap_user, imap_password)
except:
  print(sys.exc_info()[1])
  sys.exit(1)

# not conn.select(readonly=1), else we cannot modify the \Seen flag later
conn.select() # Select inbox or default namespace
(retcode, messages) = conn.search(None, '(UNSEEN)')
if retcode == 'OK':
  for num in uttc(messages[0]).split(' '):
    if not(num):
      print("No messages available: num is `{0}`!".format(num))
      break
    print('Processing message: {0}'.format(num))

    typ, data = conn.fetch(num,'(FLAGS)')
    isSeen = ( "Seen" in uttc(data[0]) )
    print('Got flags: {2}: {0} .. {1}'.format(typ,data, # NEW: OK .. ['1 (FLAGS ())']
            "Seen" if isSeen else "NEW"))

    print('Peeking headers, message: {0} '.format(num))
    typ, data = conn.fetch(num,'(BODY.PEEK[HEADER])')
    pprint.pprint(data)

    typ, data = conn.fetch(num,'(FLAGS)')
    isSeen = ( "Seen" in uttc(data[0]) )
    print('Got flags: {2}: {0} .. {1}'.format(typ,data, # NEW: OK .. ['1 (FLAGS ())']
            "Seen" if isSeen else "NEW"))

    print('Get RFC822 body, message: {0} '.format(num))
    typ, data = conn.fetch(num,'(RFC822)')
    mail = email.message_from_string(uttc(data[0][1]))
    #pprint.pprint(inspect.getmembers(mail))

    typ, data = conn.fetch(num,'(FLAGS)')
    isSeen = ( "Seen" in uttc(data[0]) )
    print('Got flags: {2}: {0} .. {1}'.format(typ,data, # NEW: OK .. ['1 (FLAGS ())']
            "Seen" if isSeen else "NEW"))

    print('Get headers, message: {0} '.format(num))
    typ, data = conn.fetch(num,'(BODY[HEADER])') # note, FLAGS (\\Seen) is now in data, even if not explicitly requested!
    pprint.pprint(data)

    print('Get RFC822 body, message: {0} '.format(num))
    typ, data = conn.fetch(num,'(RFC822)')
    mail = email.message_from_string(uttc(data[0][1]))
    pprint.pprint(inspect.getmembers(mail)) # this is in mail_object_inspect.txt
    pprint.pprint(mail.as_string())

    typ, data = conn.fetch(num,'(FLAGS)')
    isSeen = ( "Seen" in uttc(data[0]) )
    print('Got flags: {2}: {0} .. {1}'.format(typ,data, # Seen: OK .. ['1 (FLAGS (\\Seen))']
            "Seen" if isSeen else "NEW"))

    conn.select() # select again, to see flags server side
    # * OK [UNSEEN 0] # no more unseen messages (if there was only one msg in folder)

    print('Restoring flag to unseen/new, message: {0} '.format(num))
    ret, data = conn.store(num,'-FLAGS','\\Seen')
    if ret == 'OK':
      print("Set back to unseen; Got OK: {0}{1}{2}".format(data,'\n',30*'-'))
      print(mail)

      typ, data = conn.fetch(num,'(FLAGS)')
      isSeen = ( "Seen" in uttc(data[0]) )
      print('Got flags: {2}: {0} .. {1}'.format(typ,data, # NEW: OK .. [b'1 (FLAGS ())']
              "Seen" if isSeen else "NEW"))

conn.close()
conn.logout()
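
The restore step at the end of the script removes \Seen from a single message by sequence number. If the UIDs of the originally unread messages are known (for example from a list saved before fetching), the same STORE can be applied in batches. This is only a sketch: `mark_unseen` and `batch_size` are names chosen for this example, and the mailbox must be selected read-write.

```python
def mark_unseen(conn, uids, batch_size=500):
    """Remove \\Seen from the given UIDs in the currently selected mailbox."""
    # Accept UIDs as bytes (as imaplib returns them), str, or int.
    uids = [u.decode('ascii') if isinstance(u, bytes) else str(u) for u in uids]
    restored = 0
    for start in range(0, len(uids), batch_size):
        batch = uids[start:start + batch_size]
        # -FLAGS.SILENT removes the flag without asking the server to echo
        # the updated flags back for every message.
        typ, data = conn.uid('store', ','.join(batch), '-FLAGS.SILENT', '(\\Seen)')
        if typ != 'OK':
            raise RuntimeError('UID STORE failed: {0!r}'.format(data))
        restored += len(batch)
    return restored
```

Batching keeps each command line to a bounded length, which matters when thousands of UIDs are involved.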
