
How to use Python multiprocessing properly?

I am running the following code. I expected the prints to interleave randomly between the two processes, i.e. to see context switches between them. Instead I get a deterministic result: on every run the first process finishes its entire loop, and only then does the second process start its loop. There are no context switches at all; one process runs to completion before the other begins.

What am I missing?

import multiprocessing
import time
import os

lock = multiprocessing.Lock()


def func(_lock):
    for _ in range(0, 3):
        with _lock:
            print("sleeping in pid " + str(os.getpid()))
            time.sleep(1)
            print("finished sleeping in pid " + str(os.getpid()))


if __name__ == "__main__":  # required on platforms that spawn rather than fork
    process1 = multiprocessing.Process(target=func, args=(lock,))
    process2 = multiprocessing.Process(target=func, args=(lock,))
    process1.start()
    process2.start()
    process1.join()
    process2.join()

=============================================================

the output is:

sleeping in pid 2322
finished sleeping in pid 2322
sleeping in pid 2322
finished sleeping in pid 2322
sleeping in pid 2322
finished sleeping in pid 2322
sleeping in pid 2323
finished sleeping in pid 2323
sleeping in pid 2323
finished sleeping in pid 2323
sleeping in pid 2323
finished sleeping in pid 2323

Process finished with exit code 0

Here is an example using ThreadPoolExecutor. If you need processes instead, just change it to ProcessPoolExecutor. To know which to use (threads or processes), you need to understand the difference between CPU-bound and I/O-bound work.

First, you create the dictionary class_holder and store a YourClass object in it for each name. You also put each name into the queue with queue.put(i). Then, each time you take a name off the queue with queue.get(), you submit a task with executor.submit(...), which calls the my_print method with a random number of seconds to sleep.

I hope this approach helps you; I find it scales well in larger projects.

import os
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from multiprocessing import Queue
import random


class YourClass:
    def __init__(self):
        self.string = None
        self.sleep_time = None

    def my_print(self, string, sleep_t):
        self.string = string
        self.sleep_time = sleep_t
        time.sleep(self.sleep_time)
        print(self.string + str(threading.current_thread().ident) + " process id: " + str(os.getpid()))


lock = threading.Lock()
queue = Queue()

class_holder = dict()
for i in ['a', 'b', 'c', 'd', 'e', 'f']:
    class_holder[i] = YourClass()
    queue.put(i)

thread_limit = 3
with ThreadPoolExecutor(max_workers=thread_limit) as executor:
    while not queue.empty():  # the original "while True" would block forever on get() once the queue is drained
        _i = queue.get()
        if _i in class_holder:
            executor.submit(class_holder[_i].my_print,
                            string=f"sleeping {_i} in thread id: ",
                            sleep_t=random.randint(1, 4))

If you need lock = threading.Lock(), you can use it inside YourClass methods to isolate things such as editing files:

class YourClass:

    ...

    def my_extra_method(self):
        with lock:
            os.system(r"sed -i 's|ARG_IN_FILE|NEW_ARG|g' some_file")

I recommend using a Queue; it helps organize the work and already handles locking internally. Here are two examples: a Queue implemented by hand with condition variables, and the standard library queue.Queue.

class Queue(object):
    def __init__(self, size=5):
        self._size = size
        self._queue = []
        self._mutex = threading.RLock()
        self._empty = threading.Condition(self._mutex)
        self._full = threading.Condition(self._mutex)

    def put(self, val):
        with self._full:
            while len(self._queue) >= self._size:
                self._full.wait()
            self._queue.append(val)
            self._empty.notify()

    def get(self):
        with self._empty:
            while len(self._queue) == 0:
                self._empty.wait()
            ret = self._queue.pop(0)
            self._full.notify()
            return ret
And here is the standard library queue.Queue, using a None sentinel to stop the workers:

from queue import Queue
from threading import Thread

def worker(q, n):
    while True:
        item = q.get()
        if item is None:
            break
        print("process data:", n, item)

q = Queue(5)
th1 = Thread(target=worker, args=(q, 1))
th2 = Thread(target=worker, args=(q, 2))
th1.start(); th2.start()
for i in range(50):
    q.put(i)
q.put(None); q.put(None)
th1.join(); th2.join()

Your process acquires the lock, "does its thing" while the other process is blocked, then releases the lock and immediately loops back to re-acquire the same lock it has just released. Since the releasing process is already running and still dispatchable, it succeeds: releasing a lock does not force a process to stop running, so it beats the other process in the race to acquire the lock. Change the code to the following and you will get what I think you expected to see:

def func(_lock):
    for _ in range(0, 3):
        with _lock:
            print("sleeping in pid " + str(os.getpid()))
        time.sleep(1) # this gives the other process a chance to acquire the lock
        with _lock:
            print("finished sleeping in pid " + str(os.getpid()))

Locks should only be held for the shortest possible time. Try to come up with logic that allows that.
