
How to call a pool with sleep between executions within a multiprocessing process in Python?

In the main function, I am starting a process to run the imp_workload() method in parallel for each DP_WORKLOAD:

#!/usr/bin/env python

import multiprocessing
import subprocess

if __name__ == "__main__":
    for DP_WORKLOAD in DP_WORKLOAD_NAME:
         p1 = multiprocessing.Process(target=imp_workload, args=(DP_WORKLOAD, DP_DURATION_SECONDS, DP_CONCURRENCY, ))
         p1.start()

However, inside this imp_workload() method, I need the import_command_run() method to run as a number of processes (the number is given by the variable DP_CONCURRENCY), but with a sleep of 60 seconds before each new execution. This is the sample code I have written:

def imp_workload(DP_WORKLOAD, DP_DURATION_SECONDS, DP_CONCURRENCY):
     while DP_DURATION_SECONDS > 0:
           pool = multiprocessing.Pool(processes = DP_CONCURRENCY)

           for j in range(DP_CONCURRENCY):
                pool.apply_async(import_command_run, args=(DP_WORKLOAD, dp_workload_cmd, j,)
                # Sleep for 1 minute
                time.sleep(60)

           pool.close()

           # Clean the schemas after import is completed
           clean_schema(DP_WORKLOAD)

           # Sleep for 1 minute
           time.sleep(60)

def import_command_run(DP_WORKLOAD):
    abccmd = 'impdp admin/DP_PDB_ADMIN_PASSWORD@DP_PDB_FULL_NAME SCHEMAS=ABC'
    defcmd = 'impdp admin/DP_PDB_ADMIN_PASSWORD@DP_PDB_FULL_NAME SCHEMAS=DEF'
    
    # any of the above commands
    run_imp_cmd(eval(dp_workload_cmd))

def run_imp_cmd(cmd):
    output = subprocess.Popen([cmd], shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True)
    stdout,stderr = output.communicate()
    return stdout

When I tried running it in this format, I got the following error:

    time.sleep(60)
       ^
SyntaxError: invalid syntax

So, how can I kick off the 'abccmd' job DP_CONCURRENCY times in parallel, with a sleep of 1 minute between each job, while each of these pools runs in its own process?

I am working on Python 2.7.5 (due to restrictions I can't use Python 3.x, so I would appreciate answers specific to Python 2.x).

PS: This is a very large and complex script, so I have posted only the relevant excerpts. Please ask for more details if necessary (or if something is unclear from the above).

First, the SyntaxError you are seeing is caused by the missing closing parenthesis on your pool.apply_async(...) line; the parser only gives up at the following time.sleep(60) statement, which is why the error points there. With that fixed, let me offer two possibilities for the scheduling:

Possibility 1

Here is an example of how you would kick off a worker function in parallel with DP_CONCURRENCY == 4 possible arguments (0, 1, 2 and 3), cycling over and over for up to DP_DURATION_SECONDS seconds with a pool size of DP_CONCURRENCY. As soon as a job completes it is restarted, with the guarantee that at least TIME_BETWEEN_SUBMITS == 60 seconds has elapsed between successive restarts of that job.

from __future__ import print_function
from multiprocessing import Pool
import time
# queue.SimpleQueue needs Python 3.7+; fall back to Queue.Queue on Python 2
try:
    from queue import SimpleQueue
except ImportError:
    from Queue import Queue as SimpleQueue

TIME_BETWEEN_SUBMITS = 60

def worker(i):
    print(i, 'started at', time.time())
    time.sleep(40)
    print(i, 'ended at', time.time())
    return i # the argument

def main():
    q = SimpleQueue()

    def callback(result):
        # every time a job finishes, put result (the argument) on the queue
        q.put(result)

    DP_CONCURRENCY = 4
    DP_DURATION_SECONDS = TIME_BETWEEN_SUBMITS * 10
    pool = Pool(DP_CONCURRENCY)
    t = time.time()
    expiration = t + DP_DURATION_SECONDS
    # kick off initial tasks:
    start_times = [None] * DP_CONCURRENCY
    for i in range(DP_CONCURRENCY):
        pool.apply_async(worker, args=(i,), callback=callback)
        start_times[i] = time.time()
    while True:
        i = q.get() # wait for a job to complete
        t = time.time()
        if t >= expiration:
            break
        time_to_wait = TIME_BETWEEN_SUBMITS - (t - start_times[i])
        if time_to_wait > 0:
            time.sleep(time_to_wait)
        pool.apply_async(worker, args=(i,), callback=callback)
        start_times[i] = time.time()
    # wait for all jobs to complete:
    pool.close()
    pool.join()

# required by Windows:
if __name__ == '__main__':
    main()
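
To tie this back to your excerpt, the worker would be your import_command_run and the resubmission loop would live inside imp_workload. Below is a minimal sketch under some assumptions of mine: the command string is passed in explicitly instead of being recovered with eval(), 'echo test' stands in for your real impdp command, and the clean_schema call is left as a comment because its definition isn't shown.

from multiprocessing import Pool
import subprocess
import time
# queue.SimpleQueue needs Python 3.7+; fall back to Queue.Queue on Python 2
try:
    from queue import SimpleQueue
except ImportError:
    from Queue import Queue as SimpleQueue

TIME_BETWEEN_SUBMITS = 60

def run_imp_cmd(cmd):
    # Same helper as in the question: run one shell command, return its stdout
    proc = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE, universal_newlines=True)
    stdout, stderr = proc.communicate()
    return stdout

def import_command_run(j, dp_workload_cmd):
    # j identifies the pool slot; dp_workload_cmd is the already-built impdp
    # command for this workload (no eval() needed)
    run_imp_cmd(dp_workload_cmd)
    return j  # the return value is delivered to the callback in the parent

def imp_workload(DP_WORKLOAD, DP_DURATION_SECONDS, DP_CONCURRENCY, dp_workload_cmd):
    q = SimpleQueue()

    def callback(result):
        # runs in the parent process; result is the slot index j
        q.put(result)

    pool = Pool(DP_CONCURRENCY)
    expiration = time.time() + DP_DURATION_SECONDS
    start_times = [None] * DP_CONCURRENCY
    for j in range(DP_CONCURRENCY):
        pool.apply_async(import_command_run, args=(j, dp_workload_cmd),
                         callback=callback)
        start_times[j] = time.time()
    while True:
        j = q.get()                      # wait for any slot to finish
        if time.time() >= expiration:
            break
        time_to_wait = TIME_BETWEEN_SUBMITS - (time.time() - start_times[j])
        if time_to_wait > 0:
            time.sleep(time_to_wait)     # enforce 60 s between restarts of slot j
        pool.apply_async(import_command_run, args=(j, dp_workload_cmd),
                         callback=callback)
        start_times[j] = time.time()
    pool.close()
    pool.join()
    # clean_schema(DP_WORKLOAD) would go here, once every job has finished

if __name__ == '__main__':
    # 'echo test' is a stand-in for your real impdp command string
    imp_workload('ABC', 5 * TIME_BETWEEN_SUBMITS, 4, 'echo test')
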

Possibility 2

This is closer to what you had, in that TIME_BETWEEN_SUBMITS == 60 seconds of sleeping is done between the submission of any two successive jobs. But to me this doesn't make as much sense. If, for example, the worker function only took 50 seconds to complete, you would not be doing any parallel processing at all. In fact, each job would need to take at least 180 seconds (i.e. (DP_CONCURRENCY - 1) * TIME_BETWEEN_SUBMITS) to complete in order to have all 4 processes in the pool busy running jobs at the same time.

from __future__ import print_function
from multiprocessing import Pool
import time
# queue.SimpleQueue needs Python 3.7+; fall back to Queue.Queue on Python 2
try:
    from queue import SimpleQueue
except ImportError:
    from Queue import Queue as SimpleQueue

TIME_BETWEEN_SUBMITS = 60

def worker(i):
    print(i, 'started at', time.time())
    # A task must take at least 180 seconds to run to have 4 tasks running in parallel if
    # you wait 60 seconds between starting each successive task:
    # take 182 seconds to run
    time.sleep(3 * TIME_BETWEEN_SUBMITS + 2)
    print(i, 'ended at', time.time())
    return i # the argument

def main():
    q = SimpleQueue()

    def callback(result):
        # every time a job finishes, put result (the argument) on the queue
        q.put(result)

    # at most 4 tasks at a time but only if worker takes at least 3 * TIME_BETWEEN_SUBMITS
    DP_CONCURRENCY = 4
    DP_DURATION_SECONDS = TIME_BETWEEN_SUBMITS * 10
    pool = Pool(DP_CONCURRENCY)
    t = time.time()
    expiration = t + DP_DURATION_SECONDS
    # kick off initial tasks:
    for i in range(DP_CONCURRENCY):
        if i != 0:
            time.sleep(TIME_BETWEEN_SUBMITS)
        pool.apply_async(worker, args=(i,), callback=callback)
    time_last_job_submitted = time.time()
    while True:
        i = q.get() # wait for a job to complete
        t = time.time()
        if t >= expiration:
            break
        time_to_wait = TIME_BETWEEN_SUBMITS - (t - time_last_job_submitted)
        if time_to_wait > 0:
            time.sleep(time_to_wait)
        pool.apply_async(worker, args=(i,), callback=callback)
        time_last_job_submitted = time.time()
    # wait for all jobs to complete:
    pool.close()
    pool.join()

# required by Windows:
if __name__ == '__main__':
    main()
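
One last note on the excerpt itself: import_command_run is defined with a single DP_WORKLOAD parameter but submitted with three arguments, and eval(dp_workload_cmd) will fail for anything other than a known variable name. Here is a small sketch of a safer lookup, reusing your run_imp_cmd helper and assuming dp_workload_cmd holds either 'abccmd' or 'defcmd' (that assumption is mine):

def import_command_run(DP_WORKLOAD, dp_workload_cmd, j):
    # Look the command up by name instead of eval()-ing a string
    commands = {
        'abccmd': 'impdp admin/DP_PDB_ADMIN_PASSWORD@DP_PDB_FULL_NAME SCHEMAS=ABC',
        'defcmd': 'impdp admin/DP_PDB_ADMIN_PASSWORD@DP_PDB_FULL_NAME SCHEMAS=DEF',
    }
    return run_imp_cmd(commands[dp_workload_cmd])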
