
Child process hangs, preventing main process from terminating

Good afternoon,

I am trying to parallelize a linear programming solving scheme; the code is partially reproduced below. The solving method makes use of the PuLP library, which uses subprocesses to run solver commands.

from collections import OrderedDict
from time import time
from multiprocessing import Queue, Process
from queue import Empty
from os import getpid, path, mkdir
import sys
import pulp as plp  # the code below refers to PuLP through the plp alias

SOLVER = None
NUMBER_OF_PROCESSES = 12
# other parameters

def choose_solver():
    """Choose an initial solver"""
    if SOLVER == "CHOCO":
        solver = plp.PULP_CHOCO_CMD()
    elif SOLVER == "GLPK":
        solver = plp.GLPK_CMD(msg=0)
    elif SOLVER == "GUROBI":
        solver = plp.GUROBI_CMD(msg=0)
    else:
        solver = plp.PULP_CBC_CMD(msg=0)
    return solver
# other functions that are not multiprocess relevant

def is_infeasible(status):
    """Wrapper around PulP infeasible status"""
    return status in (plp.LpStatusInfeasible, plp.LpStatusUndefined)

def feasible_problems(input_var, output_var, initial_problem, solver):
    """Perform LP solving on a initial
    problem, return the feasible ones"""
    input_gt = input_var - TOL >= 0
    input_lt = input_var + TOL <= 0
    output_eq_input = (output_var - input_var == 0)
    output_eq_zero = (output_var == 0)
    problem_a = initial_problem.deepcopy()
    problem_a += input_gt
    problem_a += output_eq_input
    problem_b = initial_problem.deepcopy()
    problem_b += input_lt
    problem_b += output_eq_zero
    problem_a.solve(solver)
    problem_b.solve(solver)
    status_act = problem_a.status
    status_inact = problem_b.status
    if is_infeasible(status_act):
        return (problem_b,)
    else:
        if is_infeasible(status_inact):
            return (problem_a,)
        else:
            return (problem_a, problem_b)

def worker(q, r, start_problem, start_idx, to_check):
    """Worker spawned in a new process.
    Iterates over the neuron expression list.
    Sends a new job to the tasks queue if two activations are available.
    """
    problem = start_problem
    solver = choose_solver()
    for idx in range(start_idx, len(to_check) + 1):
        if idx == len(to_check):
            r.put_nowait(problem)
        else:
            output_var, input_var = to_check[idx]
            pbs = feasible_problems(input_var, output_var, problem, solver)
            if len(pbs) == 1:
                problem = pbs[0]
            elif len(pbs) == 2:
                q.put_nowait((idx+1, pbs[0]))
                problem = pbs[1]

def overseer(init_prob, neuron_exprs):
    """Running in the initial process,
    this function creates the tasks and results queues,
    maintains the number of currently running processes
    and spawns new processes when there are enough resources
    for them to run.
    """
    tasks = Queue()
    results = Queue()
    working_processes = {}
    init_p = Process(target=worker,
            args=(tasks, results, init_prob, 0, neuron_exprs))
    init_p.start()
    working_processes[init_p.pid] = init_p
    res_list = []
    while len(working_processes) > 0:
        if len(working_processes) <= NUMBER_OF_PROCESSES:
            # if there is enough room in the working queue,
            # spawn a new process and add it
            try:
                (idx, problem) = tasks.get(timeout=1)
            except Empty:
                break
            proc = Process(target=worker, args=(tasks,
                results, problem, idx, neuron_exprs))
            proc.start()
            working_processes[proc.pid] = proc
        to_del = []
        for pid in working_processes:
            pwork = working_processes[pid]
            pwork.join(timeout=0)
            if pwork.exitcode is not None:
                to_del.append(pid)
        for pid in to_del:
            #deleting working process
            del working_processes[pid]
    results.join_thread()
    for i in range(results.qsize()):
        elt = results.get()
        res_list.append(elt)
    return res_list

def test_multi(init_prob, neuron_exprs):
    print("Testing multi process mode")
    now = time()
    init_prob, exprs = ...  # some function that calculates those
    res = overseer(init_prob, exprs)
    print("Time spent: {:.4f}s".format(time()-now))
    for idx, problem in enumerate(res):
        if not path.exists("results"):
            mkdir("results")
        problem.writeLP("results/"+str(idx))

if __name__ == '__main__':
    torch_model = read_model(MODEL_PATH)
    print("Number of neurons: ", count_neurons(torch_model))
    print("Expected number of facets: ",
            theoretical_number(torch_model, DIM_INPUT))
    prob, to_check, hp, var_dict = init_problem(torch_model)
    test_multi(prob, to_check)

In my worker, I perform some costly calculations that may result in two different problems; if that happens, I send one problem to the tasks queue and keep the other in the current worker process. My overseer takes a task from the queue and launches a new process when it can.

to_check is a list of PuLP expressions.

What I want to do is fill the working_processes dictionary with processes that are actually running, then check for their results at each iteration and remove those that have finished. The expected behaviour would be to keep spawning new processes as old ones terminate, but that does not seem to happen. Instead, I hang indefinitely: the tasks are taken from the queue successfully, but the program hangs as soon as more than NUMBER_OF_PROCESSES processes have been spawned.

I'm quite new to multiprocessing, so maybe there is something wrong with how I'm doing it. Does anyone have any idea?

Take a look at ProcessPoolExecutor from concurrent.futures.

Executor objects let you specify a pool of workers with a capped size. You can submit all your jobs at once and the executor works through them, picking up new jobs as old ones complete.
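Something along these lines might work; it is only a rough, untested sketch. The worker is rewritten to return its result instead of writing to queues, the names solve_branch and run_all are made up here, and choose_solver, feasible_problems and NUMBER_OF_PROCESSES are taken from your code. When a solve produces two feasible problems, both branches are simply resubmitted to the pool, which caps concurrency at max_workers for you.

from concurrent.futures import ProcessPoolExecutor, FIRST_COMPLETED, wait

def solve_branch(problem, start_idx, to_check):
    """Walk the expression list; either finish the problem or
    report a split so the caller can schedule the new branch."""
    solver = choose_solver()
    idx = start_idx
    while idx < len(to_check):
        output_var, input_var = to_check[idx]
        pbs = feasible_problems(input_var, output_var, problem, solver)
        if len(pbs) == 2:
            # both sub-problems continue from the next expression
            return ("split", idx + 1, pbs)
        problem = pbs[0]
        idx += 1
    return ("done", idx, (problem,))

def run_all(init_prob, neuron_exprs):
    res_list = []
    with ProcessPoolExecutor(max_workers=NUMBER_OF_PROCESSES) as pool:
        pending = {pool.submit(solve_branch, init_prob, 0, neuron_exprs)}
        while pending:
            # wait for at least one branch to finish, then reschedule
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for fut in done:
                kind, idx, probs = fut.result()
                if kind == "done":
                    res_list.extend(probs)
                else:
                    for p in probs:
                        pending.add(pool.submit(solve_branch, p, idx, neuron_exprs))
    return res_list

The parent never has to track PIDs or poll exit codes; the pool reuses a fixed set of worker processes, and results come back through futures rather than a shared results queue, which also sidesteps the usual pitfall of a child not being able to exit until everything it has put on a Queue has been consumed.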
