Parsing Mbox from an open file-like object in Python?

This works:

import mailbox

x = mailbox.mbox('filename.mbox')  # works

but what if I only have an open handle to the file, instead of a filename?

fp = open('filename.mbox', mode='rb')  # for example; there are many ways to get a file-like object
x = mailbox.mbox(fp)  # doesn't work

Question: What's the best (cleanest, fastest) way to open an mbox from a byte stream, i.e. an already open binary handle, without first copying the bytes into a named file?
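For context, here is a minimal sketch of the failure, using io.BytesIO as a stand-in for any open binary handle (the variable names are only illustrative). In the CPython implementation, the constructor normalizes its argument with os.path functions before it ever calls open(), so a file object is rejected with a TypeError:

```python
import io
import mailbox

# Stand-in for any open binary handle (assumption for illustration).
stream = io.BytesIO(b"")

try:
    mailbox.mbox(stream)
except TypeError as exc:
    # The path normalization step refuses anything that is not
    # str, bytes or os.PathLike.
    error = exc
else:
    error = None

print(type(error).__name__, error)
```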

mailbox.mbox() has to call the built-in function open() at some point. A hacky solution is therefore to intercept that call and return the pre-existing file-like object. A draft solution follows:

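import builtins
import mailbox
import os

# FLO stands for file-like object

class MboxFromFLO:

    def __init__(self, flo):
        self.flo = flo
        # mailbox.mbox() normalizes its path with os.path.abspath(), so
        # compare against the normalized form of the fake path.
        self.fake_path = os.path.abspath('/tmp/MboxFromFLO')
        self.original_open = None

    def open_proxy(self, *args, **kwargs):
        print('open_proxy{} was called:'.format(args))
        if args and args[0] == self.fake_path:
            print('Call to open() was intercepted')
            return self.flo
        else:
            print('Call to open() was let through')
            # Forward keyword arguments too, so that unrelated callers
            # such as open(path, encoding=...) keep working.
            return self.original_open(*args, **kwargs)

    def __enter__(self):
        # Patch only when the context is entered, so that an instance
        # that is never used in a `with` block leaves open() untouched.
        print('Instrumenting open()')
        self.original_open = builtins.open
        builtins.open = self.open_proxy
        try:
            return mailbox.mbox(self.fake_path)
        except BaseException:
            builtins.open = self.original_open
            raise

    def __exit__(self, exc_type, exc_value, traceback):
        print('Restoring open()')
        builtins.open = self.original_open



# Demonstration

# Create an mbox file so that we can use it later
box = mailbox.mbox('test.mbox')
key = box.add('This is a MboxFromFLO test message')
box.close()

with open('test.mbox', 'rb') as f:
    with MboxFromFLO(f) as b:
        print('Msg#{}:'.format(key), b.get(key))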

Some caveats regarding possible future changes in the implementation of mailbox.mbox:

  1. mailbox.mbox may also open extra files besides the one passed to its constructor. Even if it doesn't, the monkey-patched open() will be used by any other Python code executed while the patch is in effect (i.e. as long as the context managed by MboxFromFLO is active). You must ensure that the fake path you generate (so that you can recognize the correct call to open() if there is more than one such call) doesn't conflict with any such files.

  2. mailbox.mbox may decide to check the specified path before opening it (e.g. using os.path.exists(), os.path.isfile(), etc.) and will fail if that path doesn't exist.
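One way to narrow caveat 1 is to leave builtins alone and instead give the mailbox module its own module-level open, which only code inside that module will see. The following is a minimal sketch under that assumption, not a tested recipe. The helper name mbox_from_flo, the sentinel path and the sample message are hypothetical, and the approach still depends on the implementation detail that mailbox.mbox calls open() on the normalized path in its constructor (so caveat 2 applies unchanged).

```python
import io
import mailbox
import os
from unittest import mock


def mbox_from_flo(flo, sentinel="/nonexistent/mbox-from-flo"):
    """Return a mailbox.mbox backed by `flo` (hypothetical helper)."""
    # mailbox normalizes the path it is given, so compare against the
    # same normalized form.
    sentinel = os.path.abspath(sentinel)
    real_open = open

    def fake_open(path, *args, **kwargs):
        if path == sentinel:
            return flo
        return real_open(path, *args, **kwargs)

    # create=True adds a module-level `open` to the mailbox module that
    # shadows the built-in for that module only; mock removes it again
    # when the `with` block exits.
    with mock.patch("mailbox.open", fake_open, create=True):
        return mailbox.mbox(sentinel, create=False)


# Sample data: a single message in mbox format (made up for the demo).
raw = (b"From sender@example.com Thu Jan  1 00:00:00 1970\n"
       b"Subject: hello\n\nbody text\n")
box = mbox_from_flo(io.BytesIO(raw))
first_subject = box[0]["Subject"]
print(first_subject)
```

The returned mailbox still believes it lives at the sentinel path, so anything that needs a real file on disk (locking, or a flush after deleting messages) should be expected to fail. It is safest to treat the result as read-only.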

You could subclass mailbox.mbox. The source code for the standard library can be found on GitHub.

The logic seems to be mostly implemented in the base class _singlefileMailbox, from which mbox derives via _mboxMMDF.

class _singlefileMailbox(Mailbox):
    """A single-file mailbox."""

    def __init__(self, path, factory=None, create=True):
        """Initialize a single-file mailbox."""
        Mailbox.__init__(self, path, factory, create)
        try:
            f = open(self._path, 'rb+')
        except OSError as e:
            if e.errno == errno.ENOENT:
                if create:
                    f = open(self._path, 'wb+')
                else:
                    raise NoSuchMailboxError(self._path)
            elif e.errno in (errno.EACCES, errno.EROFS):
                f = open(self._path, 'rb')
            else:
                raise
        self._file = f
        self._toc = None
        self._next_key = 0
        self._pending = False       # No changes require rewriting the file.
        self._pending_sync = False  # No need to sync the file
        self._locked = False
        self._file_length = None    # Used to record mailbox size

So we can try to get rid of the open() logic by overriding __init__ and reproducing the initialization that mbox and its superclasses would otherwise perform.

import mailbox

class CustomMbox(mailbox.mbox):
    """A custom mbox mailbox from a file-like object."""

    def __init__(self, fp, factory=None, create=True):
        """Initialize mbox mailbox from a file-like object.

        `create` is accepted only for signature compatibility; it is
        unused because there is no path at which to create a file.
        """

        # from `mailbox.mbox`
        self._message_factory = mailbox.mboxMessage

        # from `mailbox._singlefileMailbox`
        self._file = fp
        self._toc = None
        self._next_key = 0
        self._pending = False       # No changes require rewriting the file.
        self._pending_sync = False  # No need to sync the file
        self._locked = False
        self._file_length = None    # Used to record mailbox size

        # from `mailbox.Mailbox`
        self._factory = factory

    @property
    def _path(self):
        # If we try to use some functionality that relies on knowing
        # the original path, raise an error.
        raise NotImplementedError('This class does not have a file path')

    def flush(self):
        """Write any pending changes to disk."""
        # _singlefileMailbox has a quite complicated flush method that
        # rewrites the file via self._path. This override only flushes
        # the underlying stream, so deletions are not written back.
        self._file.flush()

This could be a start, but you would probably have to override additional methods to get the full functionality of the other mailbox classes.
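As an illustration of what such additional methods might look like, here is a minimal sketch that runs on its own, assuming a seekable binary stream such as io.BytesIO. The class name StreamMbox and the sample messages are hypothetical. The private attribute names mirror CPython's _singlefileMailbox internals and may change between versions. lock() and unlock() are overridden as no-ops because the stock versions operate on a real file on disk.

```python
import io
import mailbox


class StreamMbox(mailbox.mbox):
    """Hypothetical mbox over a seekable binary stream."""

    def __init__(self, fp, factory=None):
        # Deliberately skip the parent __init__, which would call
        # open(path). These private attributes are an implementation
        # detail of CPython's mailbox module (assumption).
        self._message_factory = mailbox.mboxMessage
        self._factory = factory
        self._file = fp
        self._toc = None
        self._next_key = 0
        self._pending = False
        self._pending_sync = False
        self._locked = False
        self._file_length = None

    def flush(self):
        # Only flush the stream; deleted messages are not compacted out.
        self._file.flush()

    def lock(self):
        # No-op: there is no file on disk to lock.
        pass

    def unlock(self):
        # No-op counterpart of lock().
        pass


# Sample data: a single message in mbox format (made up for the demo).
raw = (b"From sender@example.com Thu Jan  1 00:00:00 1970\n"
       b"Subject: hello\n\nbody text\n")
box = StreamMbox(io.BytesIO(raw))

# Reading: iterate over the messages already in the stream.
subjects = [msg["Subject"] for msg in box]

# Appending: the new message is written to the end of the stream.
new_key = box.add("Subject: second\n\nanother body\n")
print(subjects, new_key, len(box))
```

Reading and appending only need seek(), tell(), read(), readline() and write() on the stream, which is why an in-memory buffer suffices here. A non-seekable source such as a pipe or socket would first have to be read into such a buffer.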
