
Python Pickling and multiprocessing

I'm trying to use multiprocessing to get my memory issues under control, but I can't get a function to pickle, and I have no idea why. My main code starts with:

from multiprocessing import Process, Queue
import logging

def main():
    print "starting main"
    q = Queue()
    p = Process(target=file_unpacking,args=("hellow world",q))
    p.start()
    p.join()
    if p.is_alive():
        p.terminate()
    print "The results are in"
    Chan1 = q.get()
    Chan2 = q.get()
    Start_Header = q.get()
    Date = q.get()
    Time = q.get()
    return Chan1, Chan2, Start_Header, Date, Time

def file_unpacking(args, q):
    print "starting unpacking"
    fileName1 = "050913-00012"
    Data_Sets1, Data_Sets2, Headers = [], [], []
    temp_1, temp_2, temp_3 = [], [], []
    unpacker = UnpackingClass()
    for fileNumber in range(0,44):
        fileName = fileName1 + str(fileNumber) + fileName3
        header, data1, data2 = UnpackingClass.unpackFile(path,fileName)

        if header is None:
            logging.warning("corrupted file found at " + fileName)
            Data_Sets1.append(temp_1)
            Data_Sets2.append(temp_2)
            Headers.append(temp_3)
            temp_1 = []
            temp_2 = []
            temp_3 = []
            #for i in range(0,10000):
            #    Chan1.append(0)
            #    Chan2.append(0)

        else:
            logging.info(fileName + " is good!")
            temp_3.append(header)
            for i in range(0,10000):
                temp_1.append(data1[i])
                temp_2.append(data2[i])

    Data_Sets1.append(temp_1)
    Data_Sets2.append(temp_2)
    Headers.append(temp_3)
    temp_1 = []
    temp_2 = []
    temp_3 = []

    lengths = []
    for i in range(len(Data_Sets1)):
        lengths.append(len(Data_Sets1[i]))
    index = lengths.index(max(lengths))

    Chan1 = Data_Sets1[index]
    Chan2 = Data_Sets2[index]
    Start_Header = Headers[index]
    Date = Start_Header[index][0]
    Time = Start_Header[index][1]
    print "done unpacking"
    q.put(Chan1)
    q.put(Chan2)
    q.put(Start_Header)
    q.put(Date)
    q.put(Time)

Currently I have the unpacking method in a separate Python file that imports struct and os. It reads a part-text, part-binary file, structures the contents, and then closes the file. This is mostly legwork, so I won't post it all yet; if it helps, I will. Here is the start:

class UnpackingClass:
    def __init__(self):
        print "Unpacking Class"

    @staticmethod
    def unpackFile(path, fileName):
        import struct
        import os
    .......

Then I simply call main() to get the party started, and I get nothing but an infinite loop of pickle errors.

Long story short, I don't have any clue how to pickle a function. Everything is defined at the top of my files, so I'm at a loss.

Here is the error message:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\Casey\AppData\Local\Enthought\Canopy\App\appdata\canopy-1.1.0.1371.win-x86_64\lib\multiprocessing\forking.py", line 373, in main
    prepare(preparation_data)
  File "C:\Users\Casey\AppData\Local\Enthought\Canopy\App\appdata\canopy-1.1.0.1371.win-x86_64\lib\multiprocessing\forking.py", line 488, in prepare
    '__parents_main__', file, path_name, etc
  File "A:\598\TestCode\test1.py", line 142, in <module>
    Chan1, Chan2, Start_Header, Date, Time = main()
  File "A:\598\TestCode\test1.py", line 43, in main
    p.start()
  File "C:\Users\Casey\AppData\Local\Enthought\Canopy\App\appdata\canopy-1.1.0.1371.win-x86_64\lib\multiprocessing\process.py", line 130, in start
    self._popen = Popen(self)
  File "C:\Users\Casey\AppData\Local\Enthought\Canopy\App\appdata\canopy-1.1.0.1371.win-x86_64\lib\multiprocessing\forking.py", line 271, in __init__
    dump(process_obj, to_child, HIGHEST_PROTOCOL)
  File "C:\Users\Casey\AppData\Local\Enthought\Canopy\App\appdata\canopy-1.1.0.1371.win-x86_64\lib\multiprocessing\forking.py", line 193, in dump
    ForkingPickler(file, protocol).dump(obj)
  File "C:\Users\Casey\AppData\Local\Enthought\Canopy\App\appdata\canopy-1.1.0.1371.win-x86_64\lib\pickle.py", line 224, in dump
    self.save(obj)
  File "C:\Users\Casey\AppData\Local\Enthought\Canopy\App\appdata\canopy-1.1.0.1371.win-x86_64\lib\pickle.py", line 331, in save
    self.save_reduce(obj=obj, *rv)
  File "C:\Users\Casey\AppData\Local\Enthought\Canopy\App\appdata\canopy-1.1.0.1371.win-x86_64\lib\pickle.py", line 419, in save_reduce
    save(state)
  File "C:\Users\Casey\AppData\Local\Enthought\Canopy\App\appdata\canopy-1.1.0.1371.win-x86_64\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Users\Casey\AppData\Local\Enthought\Canopy\App\appdata\canopy-1.1.0.1371.win-x86_64\lib\pickle.py", line 649, in save_dict
    self._batch_setitems(obj.iteritems())
  File "C:\Users\Casey\AppData\Local\Enthought\Canopy\App\appdata\canopy-1.1.0.1371.win-x86_64\lib\pickle.py", line 681, in _batch_setitems
    save(v)
  File "C:\Users\Casey\AppData\Local\Enthought\Canopy\App\appdata\canopy-1.1.0.1371.win-x86_64\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Users\Casey\AppData\Local\Enthought\Canopy\App\appdata\canopy-1.1.0.1371.win-x86_64\lib\pickle.py", line 748, in save_global
    (obj, module, name))
pickle.PicklingError: Can't pickle <function file_unpacking at 0x0000000007E1F048>: it's not found as __main__.file_unpacking

Pickling a function is a very relevant thing to do if you want to do any parallel computing. Python's pickle and multiprocessing are pretty broken for parallel computing, so if you aren't averse to going outside the standard library, I'd suggest dill for serialization and pathos.multiprocessing as a multiprocessing replacement. dill can serialize almost anything in Python, and pathos.multiprocessing uses dill to provide more robust parallel CPU use. For more information, see:

What can multiprocessing and dill do together?

or this simple example:

Python 2.7.6 (default, Nov 12 2013, 13:26:39) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> from pathos.multiprocessing import ProcessingPool
>>> 
>>> def squared(x):
...   return x**2
... 
>>> pool = ProcessingPool(4)
>>> pool.map(squared, range(10))
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
>>> res = pool.amap(squared, range(10))
>>> res.get()
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
>>> res = pool.imap(squared, range(10))
>>> list(res)
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
>>> 
>>> def add(x,y):
...   return x+y
... 
>>> pool.map(add, range(10), range(10))
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
>>> res = pool.amap(add, range(10), range(10))
>>> res.get()
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
>>> res = pool.imap(add, range(10), range(10))
>>> list(res)
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

Both dill and pathos are available here: https://github.com/uqfoundation
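
If you'd rather stay inside the standard library, the usual workaround for this exact error is to keep the target function at module level and guard the entry point, because on Windows multiprocessing re-imports your script in every child process. A minimal sketch (the function name here is just an illustration):

```python
from multiprocessing import Pool

def squared(x):
    # must be defined at module level so child processes can
    # locate it by name when unpickling the task
    return x ** 2

if __name__ == '__main__':
    # without this guard, Windows' re-import of the script would
    # start children recursively -- the "infinite loop" of errors
    pool = Pool(4)
    print(pool.map(squared, range(10)))  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
    pool.close()
    pool.join()
```

The same guard applies to the question's code: calling main() only under `if __name__ == '__main__':` is what lets the child process re-import the file and find `file_unpacking` by name.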

You can technically pickle a function. But it's only a name reference that gets saved. When you unpickle, you must set up the environment so that the name reference makes sense to Python. Make sure to read What can be pickled and unpickled carefully.

If this doesn't answer your question, you'll need to provide us with the exact error messages. Also, please explain the purpose of pickling a function. Since you can only pickle a name reference and not the function itself, why can't you simply import and call the corresponding code?
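
To see the name-reference behavior concretely, here is a minimal stdlib sketch (the `worker` function is purely illustrative):

```python
import pickle

def worker():
    return 42

# pickle records the qualified name (e.g. '__main__.worker'),
# not the function's code
payload = pickle.dumps(worker)

# unpickling looks that name up in the current process, so you
# get back the very same function object
restored = pickle.loads(payload)
assert restored is worker

# a function without an importable name cannot be pickled at all
try:
    pickle.dumps(lambda x: x)
    failed = False
except Exception:
    failed = True  # PicklingError (or similar, depending on version)
```

This is why the error above says "not found as __main__.file_unpacking": the child process could not resolve the name it was handed.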
