
Incoming poplib refactoring using Windows Python 2.3

Hi guys, could you please help me refactor this so that it is sensibly Pythonic?

import sys
import poplib
import string
import StringIO, rfc822
import datetime
import logging

def _dump_pop_emails(self):
    self.logger.info("open pop account %s with username: %s" % (self.account[0], self.account[1]))
    self.popinstance = poplib.POP3(self.account[0])
    self.logger.info(self.popinstance.getwelcome()) 
    self.popinstance.user(self.account[1])
    self.popinstance.pass_(self.account[2])
    try:
        (numMsgs, totalSize) = self.popinstance.stat()
        for thisNum in range(1, numMsgs+1):
            (server_msg, body, octets) = self.popinstance.retr(thisNum)
            text = string.join(body, '\n')
            mesg = StringIO.StringIO(text)                               
            msg = rfc822.Message(mesg)
            name, email = msg.getaddr("From")
            emailpath = str(self._emailpath + self._inboxfolder + "\\" + email + "_" + msg.getheader("Subject") + ".eml")
            emailpath = self._replace_whitespace(emailpath)
            file = open(emailpath,"wb")
            file.write(text)
            file.close()
            self.popinstance.dele(thisNum)
    finally:
        self.logger.info(self.popinstance.quit())

def _replace_whitespace(self,name):
    name = str(name)
    return name.replace(" ", "_")   

Also, in the _replace_whitespace method I would like to have some kind of cleaning routine which takes out all illegal characters that could cause processing problems.

Basically I want to write the email to the inbox directory in a standard way.

Am I doing something wrong here?

I don't see anything significantly wrong with that code. Is it behaving incorrectly, or are you just looking for general style guidelines?

A few notes:

  1. Instead of logger.info("foo %s %s" % (bar, baz)), use logger.info("foo %s %s", bar, baz). This avoids the overhead of string formatting if the message won't be printed.
  2. Put a try...finally around the file you open at emailpath, so it is closed even if the write fails.
  3. Use '\n'.join(body) instead of string.join(body, '\n').
  4. Instead of msg.getaddr("From"), just msg.From.
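
To illustrate note 1, here is a minimal sketch of why passing the arguments separately can be cheaper. It is written for a current Python interpreter, and the CountingStr class and the logger name are hypothetical helpers made up for the demonstration, not part of the original code.

```python
import logging

class CountingStr(object):
    """Hypothetical helper that counts how often it is turned into a string."""
    def __init__(self):
        self.calls = 0

    def __str__(self):
        self.calls += 1
        return "pop.example.com"

logger = logging.getLogger("pop_dump_demo")
logger.setLevel(logging.WARNING)  # INFO records are discarded at this level

host = CountingStr()

# Eager: the % operator builds the string before logger.info is even called.
logger.info("open pop account %s" % host)
eager_calls = host.calls

# Lazy: the logger sees INFO is disabled and never formats the message.
logger.info("open pop account %s", host)
lazy_calls = host.calls - eager_calls
```

Here eager_calls ends up as 1 and lazy_calls as 0, so the saving only matters when the message would be dropped anyway.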

This isn't refactoring (it doesn't need refactoring as far as I can see), but some suggestions:

You should use the email package rather than rfc822. Replace rfc822.Message with email.Message, and use email.Utils.parseaddr(msg["From"]) to get the name and email address, and msg["Subject"] to get the subject.
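
A minimal sketch of that suggestion, assuming a current Python interpreter where the helper lives in email.utils (on Python 2.3 the module is spelled email.Utils). The sample message is made up for illustration.

```python
import email
from email.utils import parseaddr

# Made-up raw message, joined the same way the POP3 lines would be.
raw = "\n".join([
    "From: Jane Doe <jane@example.com>",
    "Subject: Weekly report",
    "",
    "Body text.",
])

msg = email.message_from_string(raw)

# parseaddr splits a header value into (display name, address).
name, address = parseaddr(msg["From"])
subject = msg["Subject"]
```

Note that msg["Subject"] returns None when the header is missing, so real code would want a fallback before building a filename from it.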

Use os.path.join to create the path. This:

emailpath = str(self._emailpath + self._inboxfolder + "\\" + email + "_" + msg.getheader("Subject") + ".eml")

Becomes:

emailpath = os.path.join(self._emailpath + self._inboxfolder, email + "_" + msg.getheader("Subject") + ".eml")

(If self._emailpath ends with a slash, you could also replace the first + with a comma. If it is self._inboxfolder that starts with a slash, keep the +, because os.path.join discards the earlier components when a later one starts with a slash.)
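
A small sketch of how os.path.join treats its components. It uses ntpath, the Windows flavour of os.path, so the result is the same on any operating system. The directory and file names are hypothetical stand-ins for self._emailpath and self._inboxfolder.

```python
import ntpath  # the Windows implementation behind os.path; importable anywhere

base = "C:\\mail"    # hypothetical value of self._emailpath
inbox = "inbox"      # hypothetical value of self._inboxfolder
filename = "jane@example.com_Weekly_report.eml"

# Each component becomes one path segment; separators are added as needed.
path = ntpath.join(base, inbox, filename)

# A component that starts with a slash restarts the path, dropping "mail".
reset = ntpath.join(base, "\\inbox", filename)
```

With these values, path is C:\mail\inbox\jane@example.com_Weekly_report.eml, while reset no longer contains the mail directory at all.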

It doesn't really hurt anything, but you should probably not use "file" as a variable name, since it shadows a built-in type (checkers like pylint or pychecker would warn you about that).

If you're not using self.popinstance outside of this function (seems unlikely given that you connect and quit within the function), then there's no point making it an attribute of self. Just use "popinstance" by itself.

Use xrange instead of range.

Instead of just importing StringIO, do this:

try:
    import cStringIO as StringIO
except ImportError:
    import StringIO

If this is a POP mailbox that can be accessed by more than one client at a time, you might want to put a try/except around the RETR call so that you can continue if one message can't be retrieved.
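
One way this could look, as a rough sketch. FakePop is a hypothetical stand-in for a logged-in poplib.POP3 connection so the example runs without a server, and fetch_all is an invented name. The stub returns text lines for brevity, whereas a real Python 3 connection returns bytes.

```python
import poplib

class FakePop(object):
    """Hypothetical stand-in for a logged-in poplib.POP3 connection."""
    def stat(self):
        return (3, 300)

    def retr(self, num):
        if num == 2:
            # Simulate a message that another client has already removed.
            raise poplib.error_proto("-ERR no such message")
        return ("+OK", ["Subject: message %d" % num, "", "body"], 100)

def fetch_all(pop):
    """Return (fetched, failed): raw texts by number, and numbers that failed."""
    fetched, failed = {}, []
    num_msgs, total_size = pop.stat()
    for num in range(1, num_msgs + 1):
        try:
            server_msg, body, octets = pop.retr(num)
        except poplib.error_proto:
            failed.append(num)
            continue  # skip this message and carry on with the rest
        fetched[num] = "\n".join(body)
    return fetched, failed

fetched, failed = fetch_all(FakePop())
```

Catching only poplib.error_proto keeps genuine bugs, such as a typo in the loop body, from being silently swallowed.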

As John said, use "\n".join rather than string.join, use try/finally to only close the file if it is opened, and pass the logging parameters separately.
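
The try/finally part might be sketched like this. The write_message name and the temporary target path are assumptions for the demonstration, and the bytes literal assumes a current Python interpreter rather than 2.3.

```python
import os
import tempfile

def write_message(path, data):
    """Write raw message bytes to path, closing the file even if write fails."""
    out = open(path, "wb")  # if open() itself raises, there is nothing to close
    try:
        out.write(data)
    finally:
        out.close()

# Hypothetical target inside a fresh temporary directory.
target = os.path.join(tempfile.mkdtemp(), "example.eml")
write_message(target, b"Subject: hi\n\nbody\n")
```

Keeping open() outside the try block is what makes the close happen only when the file was actually opened.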

The one refactoring issue I could think of would be that you don't really need to parse the whole message, since you're just dumping a copy of the raw bytes, and all you want is the From and Subject headers. You could instead use popinstance.top(thisNum, 0) to get the headers, create the message (blank body) from that, and use that for the headers. Then do a full RETR to get the bytes. This would only be worth doing if your messages were large (and so parsing them took a long time). I would definitely measure before I made this optimisation.
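
A rough sketch of the header-only idea. POP3.top takes the message number and the number of body lines to return, so zero asks for headers alone. FakePop and read_headers are hypothetical names, and the stub returns text lines where a real Python 3 connection would return bytes that need decoding first.

```python
import email

class FakePop(object):
    """Hypothetical stand-in for a logged-in poplib.POP3 connection."""
    def top(self, which, howmuch):
        # A real server returns the headers plus `howmuch` lines of the body.
        return ("+OK", ["From: Jane Doe <jane@example.com>",
                        "Subject: Weekly report", ""], 60)

def read_headers(pop, num):
    """Parse only the headers of message `num` using TOP with zero body lines."""
    server_msg, lines, octets = pop.top(num, 0)
    return email.message_from_string("\n".join(lines))

headers = read_headers(FakePop(), 1)
```

The returned object answers headers["From"] and headers["Subject"] like a full message, just with an empty body.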

For your function to sanitise the names, it depends how nice you want the names to be, and how certain you are that the email and subject make the filename unique (seems fairly unlikely). You could do something like:

emailpath = "".join([c for c in emailpath if c in (string.letters + string.digits + "_ ")])

And you'd end up with just alphanumeric characters plus the underscore and space, which seems like a readable set. Given that your filesystem (with Windows) is probably case insensitive, you could lowercase that also (add .lower() to the end). You could use emailpath.translate if you want something more complex.
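
As a sketch, the whitelist can be wrapped in a small function. The sanitize_filename name is made up, and string.ascii_letters is used here as an assumption in place of string.letters, which is locale dependent and no longer exists in Python 3.

```python
import string

# Characters allowed to survive: letters, digits, underscore and space.
ALLOWED = string.ascii_letters + string.digits + "_ "

def sanitize_filename(name):
    """Keep only whitelisted characters, then lowercase the result."""
    kept = "".join([c for c in name if c in ALLOWED])
    return kept.lower()

clean = sanitize_filename("jane@example.com Re: status/update")
```

Because the whitelist also strips "\", ":" and ".", it is safer to apply it to the generated filename part only and add the directory and the ".eml" extension afterwards.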

Further to my comment on John's answer:

I found out what the issue was: there were illegal characters in the name field and Subject field, which caused Python to get the hiccups, as it tried to write the email as a directory after seeing ":" and "/".

John's point number 4 doesn't work, so I left it as before. Also, is point no. 1 correct? Have I implemented your suggestion correctly?

def _dump_pop_emails(self):
    self.logger.info("open pop account %s with username: %s", self.account[0], self.account[1])
    self.popinstance = poplib.POP3(self.account[0])
    self.logger.info(self.popinstance.getwelcome()) 
    self.popinstance.user(self.account[1])
    self.popinstance.pass_(self.account[2])
    try:
        (numMsgs, totalSize) = self.popinstance.stat()
        for thisNum in range(1, numMsgs+1):
            (server_msg, body, octets) = self.popinstance.retr(thisNum)
            text = '\n'.join(body)
            mesg = StringIO.StringIO(text)                               
            msg = rfc822.Message(mesg)
            name, email = msg.getaddr("From")
            emailpath = str(self._emailpath + self._inboxfolder + "\\" + self._sanitize_string(email + " " + msg.getheader("Subject") + ".eml"))
            emailpath = self._replace_whitespace(emailpath)
            print emailpath
            file = open(emailpath,"wb")
            file.write(text)
            file.close()
            self.popinstance.dele(thisNum)
    finally:
        self.logger.info(self.popinstance.quit())

def _replace_whitespace(self,name):
    name = str(name)
    return name.replace(" ", "_")   

def _sanitize_string(self,name):
    illegal_chars = ":", "/", "\\"
    name = str(name)
    for item in illegal_chars:
        name = name.replace(item, "_")
    return name
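
The same replacement loop could also be written with a regular expression that covers the other characters Windows rejects in file names. This is only a sketch: the module-level sanitize_string function and the sample input are assumptions, not the code from the post.

```python
import re

# Characters Windows does not allow in file names, plus ASCII control characters.
_ILLEGAL = re.compile(r'[<>:"/\\|?*\x00-\x1f]')

def sanitize_string(name):
    """Replace each illegal character with an underscore."""
    return _ILLEGAL.sub("_", name)

cleaned = sanitize_string('jane@example.com Re: a/b\\c?.eml')
```

With this input, cleaned is jane@example.com Re_ a_b_c_.eml; the space is left for _replace_whitespace to handle.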
