
Parsing Mbox from an open file-like object in Python?

This works:

import mailbox

x = mailbox.mbox('filename.mbox')  # works

But what if I only have an open handle to the file instead of a filename?

fp = open('filename.mbox', mode='rb')  # for example; there are many ways to get a file-like object
x = mailbox.mbox(fp)  # doesn't work

Question: What is the best (cleanest, fastest) way to open an Mbox from a byte stream, i.e. an open binary handle, without first copying the bytes into a named file?
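For reference, the failing call can be reproduced without any file on disk. This is a minimal sketch, assuming a recent CPython 3 where the constructor passes its argument to os.path functions; the variable name error is only for illustration.

```python
import io
import mailbox

# Any binary file-like object will do; BytesIO avoids touching the disk.
fp = io.BytesIO(b'')

try:
    mailbox.mbox(fp)
    error = None
except TypeError as exc:
    # The constructor normalises its argument as a path, and the os.path
    # functions reject anything that is not str, bytes or os.PathLike.
    error = exc

print(type(error).__name__, error)
```

So the constructor does not merely ignore the handle; it refuses it before any file is opened.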

mailbox.mbox() has to call the builtin function open() at some point. Thus a hacky solution is to intercept that call and return the pre-existing file-like object. A draft solution follows:

import builtins

# FLO stands for file-like object

class MboxFromFLO:

    def __init__(self, flo):
        original_open = builtins.open

        fake_path = '/tmp/MboxFromFLO'
        self.fake_path = fake_path
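        def open_proxy(*args, **kwargs):
            # Accept keyword arguments too: while the patch is active,
            # unrelated code may call open(path, mode='r', encoding=...).
            print('open_proxy{} was called:'.format(args))
            if args and args[0] == fake_path:
                print('Call to open() was intercepted')
                return flo
            else:
                print('Call to open() was let through')
                return original_open(*args, **kwargs)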

        self.original_open = original_open
        builtins.open = open_proxy
        print('Instrumenting open()')

    def __enter__(self):
        return mailbox.mbox(self.fake_path)

    def __exit__(self, exc_type, exc_value, traceback):
        print('Restoring open()')
        builtins.open = self.original_open



# Demonstration
import mailbox

# Create an mbox file so that we can use it later
b = mailbox.mbox('test.mbox')
key = b.add('This is a MboxFromFLO test message')
b.close()  # make sure the message is written to disk before reopening the file

f = open('test.mbox', 'rb')
with MboxFromFLO(f) as b:
    print('Msg#{}:'.format(key), b.get(key))

Some caveats with regard to possible future changes in the implementation of mailbox.mbox:

  1. mailbox.mbox may also open extra files besides the one passed to its constructor. Even if it doesn't, the monkey-patched open() will be used by any other Python code executed while the patch is in effect (i.e. as long as the context managed by MboxFromFLO is active). You must ensure that the fake path you generate (so that you can later recognize the correct call to open() if there is more than one such call) doesn't conflict with any such files.

  2. mailbox.mbox may decide to somehow check the specified path before opening it (e.g. using os.path.exists(), os.path.isfile(), etc.) and will fail if that path doesn't exist.
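One way to narrow the first caveat is to patch the name open only inside the mailbox module, and only while the constructor runs, rather than replacing builtins.open globally. The following is a sketch under assumptions, not a vetted implementation: mbox_from_flo and the placeholder file name are hypothetical, the sample message uses "\n" line endings, and the approach still relies on the constructor calling open() with the path as its first positional argument.

```python
import contextlib
import io
import mailbox
import os
import tempfile
from unittest import mock


@contextlib.contextmanager
def mbox_from_flo(flo):
    """Yield a mailbox.mbox backed by the file-like object `flo` (hypothetical helper)."""
    # Placeholder path that is never created on disk (assumption: nothing
    # else opens a file with this exact name during construction).
    fake_path = os.path.abspath(
        os.path.join(tempfile.gettempdir(), 'mbox-from-flo-placeholder'))
    real_open = open

    def open_proxy(file, *args, **kwargs):
        if file == fake_path:
            return flo          # hand back the existing handle
        return real_open(file, *args, **kwargs)

    # Only lookups of `open` inside the mailbox module see the proxy;
    # the builtin itself is left untouched.
    with mock.patch('mailbox.open', open_proxy, create=True):
        box = mailbox.mbox(fake_path, create=False)
    yield box


# Usage: a one-message mbox held entirely in memory.
data = (b'From sender@example.com Thu Jan  1 00:00:00 1970\n'
        b'Subject: hello\n'
        b'\n'
        b'body text\n')
with mbox_from_flo(io.BytesIO(data)) as box:
    for key, msg in box.items():
        print(key, msg['Subject'])
```

Because the patch is lifted as soon as the constructor returns, this is only suitable for reading: operations that rewrite the mailbox file would try to reopen the placeholder path.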

You could subclass mailbox.mbox. The source code for the standard library can be found on GitHub.

The logic seems to be mostly implemented in the superclass _singlefileMailbox.

class _singlefileMailbox(Mailbox):
    """A single-file mailbox."""

    def __init__(self, path, factory=None, create=True):
        """Initialize a single-file mailbox."""
        Mailbox.__init__(self, path, factory, create)
        try:
            f = open(self._path, 'rb+')
        except OSError as e:
            if e.errno == errno.ENOENT:
                if create:
                    f = open(self._path, 'wb+')
                else:
                    raise NoSuchMailboxError(self._path)
            elif e.errno in (errno.EACCES, errno.EROFS):
                f = open(self._path, 'rb')
            else:
                raise
        self._file = f
        self._toc = None
        self._next_key = 0
        self._pending = False       # No changes require rewriting the file.
        self._pending_sync = False  # No need to sync the file
        self._locked = False
        self._file_length = None    # Used to record mailbox size

So we can try to get rid of the open() logic, and replace the init code of mbox and its superclasses with our own.

class CustomMbox(mailbox.mbox):
    """A custom mbox mailbox from a file like object."""

    def __init__(self, fp, factory=None, create=True):
        """Initialize mbox mailbox from a file-like object."""

        # from `mailbox.mbox`
        self._message_factory = mailbox.mboxMessage

        # from `mailbox._singlefileMailbox`
        self._file = fp
        self._toc = None
        self._next_key = 0
        self._pending = False       # No changes require rewriting the file.
        self._pending_sync = False  # No need to sync the file
        self._locked = False
        self._file_length = None    # Used to record mailbox size

        # from `mailbox.Mailbox`
        self._factory = factory

    @property
    def _path(self):
        # If we try to use some functionality that relies on knowing 
        # the original path, raise an error.
        raise NotImplementedError('This class does not have a file path')

    def flush(self):
        """Write any pending changes to disk."""
        # _singlefileMailbox has a quite complicated flush method.
        # Hopefully this will work fine.
        self._file.flush()

This could be a start. But you would probably have to define additional methods to get the full functionality of the other mailbox classes.
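As a quick check that the subclassing idea can at least read messages, here is a condensed, read-only variant exercised against an in-memory stream. It is a sketch that assumes the private attribute names shown in the standard library source above stay as they are; ReadOnlyMbox and the sample data are hypothetical.

```python
import io
import mailbox


class ReadOnlyMbox(mailbox.mbox):
    """Hypothetical read-only mbox built on an already open binary stream."""

    def __init__(self, fp, factory=None):
        # Mirrors the attributes set by mbox, _singlefileMailbox and Mailbox,
        # skipping every step that needs a path. These are private names
        # and may change between Python versions.
        self._message_factory = mailbox.mboxMessage
        self._factory = factory
        self._file = fp
        self._toc = None
        self._next_key = 0
        self._pending = False
        self._pending_sync = False
        self._locked = False
        self._file_length = None

    def flush(self):
        # Nothing to rewrite on disk; just flush the underlying stream.
        self._file.flush()


data = (b'From sender@example.com Thu Jan  1 00:00:00 1970\n'
        b'Subject: hello\n'
        b'\n'
        b'body text\n')

box = ReadOnlyMbox(io.BytesIO(data))
for key in box.keys():
    message = box[key]
    print(key, message.get_from(), message['Subject'])
```

Reading works because the table of contents is built lazily from self._file; methods that modify the mailbox are not covered by this sketch.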
