
Parsing Mbox from an open file-like object in Python?

This works:

import mailbox

x = mailbox.mbox('filename.mbox')  # works

But what if I only have an open file handle instead of a filename?

fp = open('filename.mbox', mode='rb')  # for example; there are many ways to get a file-like object
x = mailbox.mbox(fp)  # doesn't work

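Question: what is the best (cleanest, fastest) way to open an Mbox from a byte stream, that is, from an open binary handle, without first copying the bytes into a named file?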
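
For comparison, here is a minimal sketch of the baseline the question wants to avoid: copying the stream into a temporary named file and handing that path to mailbox.mbox. The helper name mbox_via_temp_copy and the sample message are assumptions made for illustration; the caller is responsible for closing the mailbox and deleting the temporary file.

```python
import io
import mailbox
import os
import shutil
import tempfile


def mbox_via_temp_copy(flo):
    """Copy `flo` into a temporary file and open it (hypothetical helper)."""
    fd, path = tempfile.mkstemp(suffix='.mbox')
    with os.fdopen(fd, 'wb') as tmp:
        shutil.copyfileobj(flo, tmp)  # this is the extra copy of every byte
    return mailbox.mbox(path, create=False), path


# Made-up single-message mbox content, used only for the demonstration.
raw = b"From sender@example.com Thu Jan  1 00:00:00 2015\nSubject: hello\n\nbody\n"
box, path = mbox_via_temp_copy(io.BytesIO(raw))
try:
    subjects = [msg['Subject'] for msg in box]
finally:
    box.close()
    os.remove(path)
print(subjects)
```

This costs one full pass over the data plus disk space, which is exactly the overhead the approaches below try to remove.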

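mailbox.mbox() has to call the builtin function open() at some point, so a hacky solution is to intercept that call and return the pre-existing file-like object. A draft solution follows: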

import builtins

# FLO stands for file-like object

class MboxFromFLO:

    def __init__(self, flo):
        original_open = builtins.open

        fake_path = '/tmp/MboxFromFLO'
        self.fake_path = fake_path
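        def open_proxy(*args, **kwargs):
            print('open_proxy{} was called:'.format(args))
            # The path may arrive positionally or as the `file` keyword.
            path = args[0] if args else kwargs.get('file')
            if path == fake_path:
                print('Call to open() was intercepted')
                return flo
            else:
                print('Call to open() was let through')
                return original_open(*args, **kwargs)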

        self.original_open = original_open
        builtins.open = open_proxy
        print('Instrumenting open()')

    def __enter__(self):
        return mailbox.mbox(self.fake_path)

    def __exit__(self, exc_type, exc_value, traceback):
        print('Restoring open()')
        builtins.open = self.original_open



# Demonstration
import mailbox

# Create an mbox file so that we can use it later
b = mailbox.mbox('test.mbox')
key = b.add('This is a MboxFromFLO test message')
b.flush()  # write the buffered message to disk before the file is reopened

f = open('test.mbox', 'rb')
with MboxFromFLO(f) as b:
    print('Msg#{}:'.format(key), b.get(key))

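Some caveats with regard to possible future changes in the implementation of mailbox.mbox:

  1. mailbox.mbox may open extra files besides the one passed to its constructor. Even if it doesn't, the monkey-patched open() will be used by any other Python code executed while the patch is in effect (i.e. as long as the context managed by MboxFromFLO is active). You have to make sure that the fake path you generate (so that you can later recognize the correct call to open() if there is more than one such call) doesn't conflict with any such files.

  2. mailbox.mbox may decide to check the specified path in some way before opening it (e.g. using os.path.exists(), os.path.isfile(), etc.) and will fail if that path doesn't exist.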
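
One way to soften the first caveat is to limit the patch to the mailbox module's own namespace instead of replacing builtins.open for the whole process. The following is only a sketch under assumptions: the helper name mbox_from_file_object and the fake path are made up, and it relies on the private detail that the constructor looks up open() as a plain global name. The patch covers the constructor only, so it suits read-only use; a later flush() with pending changes would try to reopen the fake path.

```python
import io
import mailbox
import os
from unittest import mock


def mbox_from_file_object(flo, fake_path='/nonexistent/mbox-from-flo'):
    """Return a mailbox.mbox that reads from `flo` (hypothetical helper)."""
    # Mailbox.__init__ normalises the path this way before open() sees it.
    expected = os.path.abspath(os.path.expanduser(fake_path))
    real_open = open

    def open_proxy(file, *args, **kwargs):
        if file == expected:
            return flo
        return real_open(file, *args, **kwargs)

    # create=True adds a module-level name `open` to mailbox for the duration
    # of the with block; code outside the mailbox module is not affected.
    with mock.patch.object(mailbox, 'open', open_proxy, create=True):
        return mailbox.mbox(fake_path, create=False)


# Made-up single-message mbox content, used only for the demonstration.
raw = b"From sender@example.com Thu Jan  1 00:00:00 2015\nSubject: hello\n\nbody\n"
box = mbox_from_file_object(io.BytesIO(raw))
subjects = [msg['Subject'] for msg in box]
print(subjects)
```

The second caveat still applies: if a future mailbox.mbox checked the path on disk before opening it, this sketch would fail just like the builtins-based version.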

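You could subclass mailbox.mbox. The source code for the standard library can be found on GitHub.

Most of the logic seems to live in the superclass _singlefileMailbox: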

class _singlefileMailbox(Mailbox):
    """A single-file mailbox."""

    def __init__(self, path, factory=None, create=True):
        """Initialize a single-file mailbox."""
        Mailbox.__init__(self, path, factory, create)
        try:
            f = open(self._path, 'rb+')
        except OSError as e:
            if e.errno == errno.ENOENT:
                if create:
                    f = open(self._path, 'wb+')
                else:
                    raise NoSuchMailboxError(self._path)
            elif e.errno in (errno.EACCES, errno.EROFS):
                f = open(self._path, 'rb')
            else:
                raise
        self._file = f
        self._toc = None
        self._next_key = 0
        self._pending = False       # No changes require rewriting the file.
        self._pending_sync = False  # No need to sync the file
        self._locked = False
        self._file_length = None    # Used to record mailbox size

So we can try to get rid of the open() logic and replicate the initialization code from mbox and its other superclasses.

class CustomMbox(mailbox.mbox):
    """A custom mbox mailbox from a file like object."""

    def __init__(self, fp, factory=None, create=True):
        """Initialize mbox mailbox from a file-like object."""

        # from `mailbox.mbox`
        self._message_factory = mailbox.mboxMessage

        # from `mailbox._singlefileMailbox`
        self._file = fp
        self._toc = None
        self._next_key = 0
        self._pending = False       # No changes require rewriting the file.
        self._pending_sync = False  # No need to sync the file
        self._locked = False
        self._file_length = None    # Used to record mailbox size

        # from `mailbox.Mailbox`
        self._factory = factory

    @property
    def _path(self):
        # If we try to use some functionality that relies on knowing 
        # the original path, raise an error.
        raise NotImplementedError('This class does not have a file path')

    def flush(self):
        """Write any pending changes to disk."""
        # _singlefileMailbox has quite a complicated flush method.
        # Hopefully this will work fine.
        self._file.flush()

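This could be a start. But you would probably have to define additional methods to get the full functionality of the other mailbox classes.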
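
As an illustration of what "additional methods" might mean, here is a hedged, self-contained sketch that repeats the subclass idea in compact form and also overrides lock() and unlock(), since an in-memory object such as io.BytesIO has no file descriptor to lock. The class name FileObjectMbox and the sample messages are assumptions; the private attributes mirror the standard library source quoted above and may change between Python versions.

```python
import io
import mailbox


class FileObjectMbox(mailbox.mbox):
    """Hypothetical mbox over an already-open binary file-like object."""

    def __init__(self, fp, factory=None):
        # Same private attributes the stdlib constructors set, minus open().
        self._message_factory = mailbox.mboxMessage
        self._factory = factory
        self._file = fp
        self._toc = None
        self._next_key = 0
        self._pending = False
        self._pending_sync = False
        self._locked = False
        self._file_length = None

    def flush(self):
        # Simplified: flush the underlying object, never rewrite a file.
        self._file.flush()

    def lock(self):
        # No-op lock: there is no path or descriptor to lock.
        self._locked = True

    def unlock(self):
        self._locked = False


# Made-up two-message mbox content, used only for the demonstration.
raw = (b"From a@example.com Thu Jan  1 00:00:00 2015\nSubject: first\n\none\n\n"
       b"From b@example.com Thu Jan  1 00:00:01 2015\nSubject: second\n\ntwo\n")
box = FileObjectMbox(io.BytesIO(raw))
box.lock()
subjects = [msg['Subject'] for msg in box]
count = len(box)
box.unlock()
box.close()
print(count, subjects)
```

Reading, iteration and len() go through the inherited table-of-contents logic unchanged; methods that rewrite the mailbox (such as removing messages and then flushing) are not covered by this sketch.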

