Getting “ImportError: No module named” with parallel python and methods in a package

I'm trying to use Parallel Python (pp) to do some distributed benchmarking (essentially, coordinating and running some code on a set of machines from a central server). The code I had was working perfectly fine until I moved the functionality into a separate package. Since then, I keep getting ImportError: No module named some.module.pp_test.

My question is actually two-fold: has anyone ever come across this problem with pp, and if so, how can it be solved? I tried using dill (import dill), but it didn't help. Also, is there a good replacement for Parallel Python that doesn't require any additional infrastructure?

The exact error I get is:

RUNNING TEST
Waiting for hosts to finish booting....A fatal error has occured during the function execution
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/ppworker.py", line 86, in run
    __args = pickle.loads(__sargs)
ImportError: No module named some.module.pp_test
Caught exception in the run phase 'NoneType' object is not iterable
Traceback (most recent call last):
  File "test.py", line 5, in <module>
    p.ping_pong()
  File "/home/ubuntu/workspace/pp-test/some/module/pp_test.py", line 5, in ping_pong
    a_test.run()
  File "/home/ubuntu/workspace/pp-test/some/module/pp_test.py", line 27, in run
    pong, hostname = ping()
TypeError: 'NoneType' object is not iterable

The code is structured this way:

pp-test/
       test.py
       some/
            __init__.py
            module/
                   __init__.py
                   pp_test.py

The test.py is implemented as:

from some.module.pp_test import MWE

p = MWE()
p.ping_pong()

While pp_test.py is:

class MWE():
  def ping_pong(self):
    print "RUNNING TEST "
    a_test = PPTester()
    a_test.run()

import pp
import time
from sys import stdout, exit

class PPTester(object):
  def run(self):
    try:
        ppservers = ('10.10.10.10', )
        time.sleep(5)
        job_server = pp.Server(0, ppservers=ppservers)
        stdout.write("Waiting for hosts to finish booting...")
        while len(job_server.get_active_nodes()) - 1 < len(ppservers):
            stdout.write(".")
            stdout.flush()
            time.sleep(1)

        ppmodules = ()
        pings = [(server, job_server.submit(self.run_pong, modules=ppmodules)) for server in ppservers]
        for server, ping in pings:
            pong, hostname = ping()
            print "Host ", hostname, " is alive!"

        print "All servers booted up, starting benchmarks..."
        job_server.print_stats()
    except Exception as e:
        print "Caught exception in the run phase", e
        raise
    pass

  def run_pong(self):
    import subprocess
    p = subprocess.Popen("hostname", stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True)
    (output, err) = p.communicate()
    p_status = p.wait()

    return "pong ", output

dill won't work with pp out of the box, because pp doesn't serialize Python objects. Instead, pp extracts the object's source code (much like the inspect module in the standard library does) and ships that to the worker.
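The traceback in the question fails inside pickle.loads on the worker, which suggests that the job's arguments (presumably the instance behind the bound method self.run_pong) are still pickled. pickle stores an instance as a reference to its module path and class name, so the worker must be able to import that module. The following is a minimal sketch of that behaviour using only the standard library and Python 3; it does not use pp at all, and the names some_pkg, worker_mod, Tester and demo are hypothetical stand-ins.

```python
import os
import pickle
import subprocess
import sys
import tempfile

# Code run in the simulated "worker": a fresh interpreter that cannot
# import the package the pickled object came from.
WORKER_CODE = """
import pickle, sys
try:
    with open(sys.argv[1], "rb") as f:
        pickle.load(f)
except ImportError as exc:
    print("ImportError:", exc)
"""


def demo():
    """Pickle an instance on the 'client', unpickle it in a bare 'worker'."""
    with tempfile.TemporaryDirectory() as root:
        # Build a throwaway package: some_pkg/worker_mod.py (hypothetical names).
        pkg_dir = os.path.join(root, "some_pkg")
        os.makedirs(pkg_dir)
        open(os.path.join(pkg_dir, "__init__.py"), "w").close()
        with open(os.path.join(pkg_dir, "worker_mod.py"), "w") as f:
            f.write("class Tester(object):\n    pass\n")

        # Client side: the package is importable, so pickling succeeds.
        sys.path.insert(0, root)
        try:
            from some_pkg.worker_mod import Tester
            payload = pickle.dumps(Tester())
        finally:
            sys.path.remove(root)

        payload_path = os.path.join(root, "payload.pkl")
        with open(payload_path, "wb") as f:
            f.write(payload)

        # Worker side: the package directory is not on the child's sys.path.
        result = subprocess.run(
            [sys.executable, "-c", WORKER_CODE, payload_path],
            capture_output=True,
            text=True,
        )
        return payload, result.stdout.strip()


if __name__ == "__main__":
    payload, worker_output = demo()
    # The payload carries only the dotted module path, not the class source.
    print(b"some_pkg.worker_mod" in payload)
    print(worker_output)
```

The pickled bytes contain the dotted module path rather than the class definition, which is why the unpickling side raises an ImportError when that path cannot be imported there.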

To enable pp to use dill (actually dill.source, which is inspect augmented by dill), you have to use a fork of pp called ppft. ppft installs as pp (i.e., you import it with import pp), but it has much stronger source inspection, so it can "serialize" most Python objects and track down their dependencies automatically.

Get ppft here: https://github.com/uqfoundation

ppft is also pip-installable and compatible with Python 3.x.
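Because pp ships source code, one possible workaround (an assumption on my part, not something tested against the setup in the question) is to submit a plain module-level function instead of a bound method, so that no instance of a class from some.module.pp_test has to be unpickled on the worker. The sketch below is written for Python 3; ping_hosts is a hypothetical helper, the module-level run_pong uses socket.gethostname() in place of the question's subprocess call, and pp is imported lazily so that run_pong can be tried without pp installed.

```python
def run_pong():
    """Module-level job: depends on no instance or package state."""
    # Imported inside the function so the worker does not rely on the
    # caller's imports.
    import socket
    return "pong", socket.gethostname()


def ping_hosts(ppservers):
    """Submit run_pong once per listed server and collect the replies.

    Hypothetical helper; assumes pp (or ppft installed as pp) is available
    and that the listed servers are already running ppserver.
    """
    import pp  # lazy import: only needed when jobs are actually submitted

    # 0 local workers, remote servers only, as in the question.
    job_server = pp.Server(0, ppservers=ppservers)
    # submit(func, args, depfuncs, modules): no arguments are passed, so no
    # package objects need to be pickled; "socket" is imported on the worker.
    jobs = [
        (server, job_server.submit(run_pong, (), (), ("socket",)))
        for server in ppservers
    ]
    # Calling a job blocks until its result is ready. As the traceback in
    # the question shows, a failed job yields None, so check before unpacking.
    return [(server, job()) for server, job in jobs]
```

With the address from the question, ping_hosts(('10.10.10.10',)) would return a list of (server, result) pairs, where each result is either a ("pong", hostname) tuple or None for a failed job.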
