
Does the dill python module handle importing modules when sys.path differs?

I'm evaluating dill and I want to know if this scenario is handled. I have a case where I successfully import a module in a python process. Can I use dill to serialize and then load that module in a different process that has a different sys.path which doesn't include that module? Right now I get import failures but maybe I'm doing something wrong.

Here's an example. I run this script where the foo.py module's path is in my sys.path:

% cat dill_dump.py 
import dill
import foo
myFile = "./foo.pkl"
with open(myFile, 'wb') as fh:
    dill.dump(foo, fh)

Now, I run this script where I do not have foo.py's directory in my PYTHONPATH:

% cat dill_load.py 
import dill
myFile = "./foo.pkl"
with open(myFile, 'rb') as fh:
    foo = dill.load(fh)
print foo

It fails with this stack trace:

Traceback (most recent call last):
  File "dill_load.py", line 4, in <module>
    foo = dill.load(fh)
  File "/home/b/lib/python/dill-0.2.4-py2.6.egg/dill/dill.py", line 199, in load
    obj = pik.load()
  File "/rel/lang/python/2.6.4-8/lib/python2.6/pickle.py", line 858, in load
    dispatch[key](self)
  File "/rel/lang/python/2.6.4-8/lib/python2.6/pickle.py", line 1133, in load_reduce
    value = func(*args)
  File "/home/b/lib/python/dill-0.2.4-py2.6.egg/dill/dill.py", line 678, in _import_module
    return __import__(import_name)
ImportError: No module named foo

So, if I need to have the same python path between the two processes, then what's the point of serializing a python module? Or in other words, is there any advantage to loading foo via dill over just having an "import foo" call?

That's an interesting failure. Notice that if you do dill.dumps(foo), you get the contents of the module foo. The part that fails is the use of Python's built-in import hook (__import__), which dill relies on for little more than registering the module in sys.modules. It should be possible to work around that and modify dill so that the module can be loaded even when it is not found on the PYTHONPATH. However, I do think it's proper that a module has to be found on the PYTHONPATH; that is what is expected of a module, so I'm not sure it's a good idea. But it might be...

As noted above, for a file foo.py with the contents hello = "hello world, I am foo":

>>> import dill
>>> import foo
>>> dill.dumps(foo)
'\x80\x02cdill.dill\n_import_module\nq\x00U\x03fooq\x01\x85q\x02Rq\x03}q\x04(U\x08__name__q\x05h\x01U\x08__file__q\x06U\x06foo.pyq\x07U\x05helloq\x08U\x15hello world, I am fooq\tU\x07__doc__q\nNU\x0b__package__q\x0bNub.'

You can see that the contents of the module (its attributes, such as hello) are preserved in the pickle.
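To see why loading that pickle still needs foo to be importable, it can help to disassemble the bytes with the standard library's pickletools. This is only an illustrative sketch: the bytes are the ones quoted above, rewritten as a Python 3 bytes literal, and the variable names data and ops are my own.

```python
import pickletools

# The pickle bytes quoted above, written as a Python 3 bytes literal.
data = (b'\x80\x02cdill.dill\n_import_module\nq\x00U\x03fooq\x01\x85q\x02Rq\x03'
        b'}q\x04(U\x08__name__q\x05h\x01U\x08__file__q\x06U\x06foo.pyq\x07'
        b'U\x05helloq\x08U\x15hello world, I am fooq\tU\x07__doc__q\nN'
        b'U\x0b__package__q\x0bNub.')

# genops only parses the opcode stream; it never imports dill or foo.
ops = [(op.name, arg) for op, arg, pos in pickletools.genops(data)]

# Print the opcodes that carry an argument (names, strings, memo slots).
for name, arg in ops:
    if arg is not None:
        print(name, repr(arg))
```

The stream starts with a GLOBAL reference to dill.dill _import_module, followed by the string 'foo' and a REDUCE, meaning "call _import_module('foo')". Only after that call succeeds are the saved attributes applied via BUILD, which matches the traceback in the question: the import fails before the stored contents are ever used.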

The primary reason to use dill with modules is that dill can record dynamic modifications to modules. For example, adding a function or other object:

>>> import foo 
>>> import dill
>>> foo.a = 100
>>> with open('foo.pkl', 'wb') as f:
...   dill.dump(foo, f)
... 
>>> 

Then, after restarting the interpreter (with foo on the PYTHONPATH):

Python 2.7.10 (default, May 25 2015, 13:16:30) 
[GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.40)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> with open('foo.pkl', 'rb') as f:
...   foo = dill.load(f)
... 
>>> foo.hello
'hello world, I am foo'
>>> foo.a
100
>>> 

I've added this as a bug report / feature request: https://github.com/uqfoundation/dill/issues/123
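In the meantime, if the goal is to move a module to a process that cannot import it, one possible workaround outside of dill is to ship the module's source text and rebuild the module on the other side. The sketch below is not dill's API and not how dill works internally; it uses only the standard library, is written for Python 3, and the helper names dump_module_source and load_module_source are hypothetical. It also assumes the module's source file is readable when dumping.

```python
import importlib
import inspect
import os
import pickle
import sys
import tempfile
import types


def dump_module_source(module):
    """Pickle a module's name and source text instead of a reference to it."""
    return pickle.dumps({"name": module.__name__,
                         "source": inspect.getsource(module)})


def load_module_source(payload):
    """Rebuild a module from pickled source; no sys.path lookup is involved."""
    data = pickle.loads(payload)
    module = types.ModuleType(data["name"])
    exec(data["source"], module.__dict__)  # only do this with trusted data
    sys.modules[data["name"]] = module
    return module


# Demo: create a throwaway foo.py and import it while it is on sys.path.
tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, "foo.py"), "w") as f:
    f.write('hello = "hello world, I am foo"\n')

sys.path.insert(0, tmpdir)
importlib.invalidate_caches()
foo = importlib.import_module("foo")
payload = dump_module_source(foo)

# Simulate the second process: foo is no longer importable.
sys.path.remove(tmpdir)
del sys.modules["foo"]
importlib.invalidate_caches()

restored = load_module_source(payload)
print(restored.hello)
```

Note the trade-off: this re-executes the original source, so dynamic modifications such as foo.a = 100 are not carried over, which is exactly the part dill does capture when the module is importable.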
