A simple way to run a piece of python code in parallel?

I have this very simple python code:

import random
import time

Test = 1

def para():
    while True:
        if Test > 10:
            print("Test is bigger than ten")
        time.sleep(1)

para()  # I want this to start in parallel, so that the code below keeps executing without waiting for this function to finish

while True:
    Test = random.randint(1, 42)
    time.sleep(1)

    if Test == 42:
        break

# ...stop the parallel execution of para() here (kill it)

# ...some other code here

Basically, I want to run the function para() in parallel to the other code, so that the code below it doesn't have to wait for para() to end. However, I want to be able to access the current value of the Test variable inside para() while it is running in parallel (as seen in the code example above). Later, when I decide that I am done with para() running in parallel, I would like to know how to kill it both from the main thread and from within the running para() itself (self-terminate).

I have read some tutorials on threading, but almost every tutorial approaches it differently, and I had trouble understanding some of them, so I would like to know what the easiest way is to run a piece of code in parallel.

Thank you.

Okay, first, here is an answer to your question in the simplest possible way, staying as close to your code as possible. After that, we answer a little more fully with two examples that show two ways to do this and to share data between the main and parallel code.

import random
import time
from threading import Thread

Test = 1
stop = False

def para():
    while not stop:
        if Test > 10:
            print("Test is bigger than ten")
        time.sleep(1)

# I want this to start in parallel, so that the code below keeps executing without waiting for this function to finish

thread = Thread(target=para)
thread.start()

while True:
    Test = random.randint(1, 42)
    time.sleep(1)

    if Test == 42:
        break

# Stop para(): Python threads cannot be killed from outside, so set the
# flag and wait for para() to notice it and return
stop = True
thread.join()

# ...some other code here
print('we have stopped')
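The code above stops para() from the main thread, but the question also asked how para() can stop itself. Here is a minimal sketch of both, assuming you are free to restructure a little: it swaps the plain stop flag for a threading.Event (not used in the original answer), and the names test_value and stop_event are hypothetical. Python has no supported way to kill a thread from outside, so stopping is always cooperative: the thread checks something and returns.

```python
import random
import threading
import time

# Hypothetical names adapted from the question's Test variable and stop flag
test_value = 1
stop_event = threading.Event()  # the main thread sets this to ask para() to stop


def para():
    # wait() returns True as soon as the event is set, so a stop request
    # is noticed immediately instead of only after a full sleep(1)
    while not stop_event.wait(timeout=1):
        if test_value > 10:
            print("Test is bigger than ten, stopping myself")
            return  # self-terminate: simply return from the thread function


thread = threading.Thread(target=para)
thread.start()

while thread.is_alive():
    test_value = random.randint(1, 42)
    if test_value == 42:
        stop_event.set()  # stop para() from the main thread
        break
    time.sleep(0.1)

thread.join()
print("we have stopped")
```

Whichever side ends it, thread.join() returns once para() has returned.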

And now, the more complete answer:

In the following, we show two code examples that demonstrate (a) parallel execution using the threading interface and (b) parallel execution using the multiprocessing interface. Which of these you choose depends on what you are trying to do. Threading can be a good choice when the purpose of the second thread is to wait for I/O, and multiprocessing can be a good choice when the parallel code does CPU-intensive calculations. (In CPython, the global interpreter lock lets only one thread execute Python bytecode at a time, so CPU-bound work generally does not speed up with threads, while separate processes can run on separate cores.)

In your example, the main code changed a variable and the parallel code only read it. Things are different if you want to change a variable from both sides, for example to reset a shared counter, so we show how to do that as well.

In the following example codes:

  1. The variables counter, run and lock are shared between the main program and the code executed in parallel.

  2. The function myfunc() is executed in parallel. It loops, sleeping and incrementing counter, until the main program sets run to False.

  3. The main program loops, printing the value of counter, until it reaches 5, at which point it resets the counter. Then, after it reaches 5 again, it sets run to False and finally waits for the thread or process to exit before exiting itself.

You might notice that counter is only changed between calls to lock.acquire() and lock.release() in the first example, or inside a with lock: block in the second example.

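Incrementing a counter comprises three steps: (1) reading the current value, (2) adding one to it, and (3) storing the result back into the counter. The problem comes when another thread changes the counter while this is happening: if, for example, the main program resets the counter to 0 between steps (1) and (3), the parallel code then stores its stale result and the reset is lost.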

We solve this by having both the main program and the parallel code acquire a lock before they change the variable, and release it when they are done. If the lock is already taken, the program or parallel code waits until it is released. This synchronizes their changes to the shared data, i.e. the counter. (As an aside, see semaphores for another kind of synchronization.)
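A small sketch of the locking idea on its own, separate from the examples that follow: four threads each increment a shared counter many times, and each read-add-store is done while holding the lock. It uses the with lock: form, which acquires the lock and releases it even if the body raises (equivalent to acquire() followed by try/finally with release()). The function name increment_many and the iteration counts are illustrative choices. Without the lock, the total is not guaranteed to come out exact.

```python
import threading

counter = 0
lock = threading.Lock()


def increment_many(n):
    global counter  # needed because this function assigns to counter
    for _ in range(n):
        # Holding the lock makes read, add one, store happen as one unit
        with lock:
            counter += 1


threads = [threading.Thread(target=increment_many, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 400000: no increments were lost
```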

With that introduction, here is the first example, which uses threads:

# Parallel code with shared variables, using threads
from threading import Lock, Thread
from time import sleep

# Variables to be shared across threads
counter = 0
run = True
lock = Lock()

# Function to be executed in parallel
def myfunc():

    # Declare shared variables (strictly, only counter needs `global`,
    # because it is the only one this function assigns to)
    global run
    global counter
    global lock

    # Processing to be done until told to exit
    while run:
        sleep( 1 )

        # Increment the counter
        lock.acquire()
        counter = counter + 1
        lock.release()

    # Set the counter to show that we exited
    lock.acquire()
    counter = -1
    lock.release()
    print( 'thread exit' )

# ----------------------------

# Launch the parallel function as a thread
thread = Thread(target=myfunc)
thread.start()

# Read and print the counter
while counter < 5:
    print( counter )
    sleep( 1 )

# Change the counter    
lock.acquire()
counter = 0
lock.release()

# Read and print the counter
while counter < 5:
    print( counter )
    sleep( 1 )
    
# Tell the thread to exit and wait for it to exit
run = False
thread.join()

# Confirm that the thread set the counter on exit
print( counter )

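And here is the second example, which uses multiprocessing. Because each process has its own memory, some extra steps are needed to share the variables: they are created as multiprocessing.Value objects, passed to the child process as arguments, and read or written through their .value attribute.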

from time import sleep
from multiprocessing import Process, Value, Lock

def myfunc(counter, lock, run):
    
    while run.value:
        sleep(1)
        with lock:
            counter.value += 1
            print( "process %d"%counter.value )

    with lock:
        counter.value = -1
        print( "process exit %d"%counter.value )

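# =======================
# The guard below is required when processes are started with the "spawn"
# method (the default on Windows and macOS): the child re-imports this module,
# and without the guard it would try to start another process while importing.
if __name__ == "__main__":
    counter = Value('i', 0)
    run = Value('b', True)
    lock = Lock()

    p = Process(target=myfunc, args=(counter, lock, run))
    p.start()

    while counter.value < 5:
        print( "main %d"%counter.value )
        sleep(1)

    with lock:
        counter.value = 0

    while counter.value < 5:
        print( "main %d"%counter.value )
        sleep(1)

    run.value = False

    p.join()

    print( "main exit %d"%counter.value)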
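A variation worth knowing, not what the example above does: multiprocessing.Value is created with its own lock by default, available through get_lock(), so a separate Lock object is not strictly needed. A minimal sketch, with hypothetical function names add_many and run_workers, where several processes increment one shared counter:

```python
from multiprocessing import Process, Value


def add_many(counter, n):
    for _ in range(n):
        # Hold the Value's built-in lock across the whole read-modify-write;
        # `counter.value += 1` alone is a separate get and set, not atomic
        with counter.get_lock():
            counter.value += 1


def run_workers(n_procs, n):
    counter = Value('i', 0)
    procs = [Process(target=add_many, args=(counter, n)) for _ in range(n_procs)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return counter.value


if __name__ == "__main__":
    print(run_workers(4, 1000))  # 4000
```

Using a separate Lock, as in the example above, is still the clearer choice when one lock has to protect several shared values at once.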

Rather than starting threads manually, it is often simpler to use multiprocessing.Pool. The work to be done in parallel needs to be in a function that you pass to map. Instead of map, you can then use pool.imap, which hands back results one at a time, in the same order as the input.

import multiprocessing
import time

def func(x):
    time.sleep(x)
    return x + 2

if __name__ == "__main__":
    start = time.time()
    # The with-block shuts the pool down once the loop has finished
    with multiprocessing.Pool() as p:
        for x in p.imap(func, [1, 5, 3]):
            print("{} (Time elapsed: {}s)".format(x, int(time.time() - start)))

Also check out: multiprocessing.Pool: What's the difference between map_async and imap?

Also worth checking out is functools.partial, which can be used to pass in extra arguments (in addition to the items of the list).
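A minimal sketch of functools.partial with Pool.map, assuming a hypothetical worker scale_and_shift that needs two fixed arguments besides each list item. partial fixes those arguments and leaves a one-argument callable that map can call with each item:

```python
import functools
import multiprocessing


def scale_and_shift(x, factor, offset):
    # Hypothetical worker: x comes from the list, factor/offset are fixed
    return x * factor + offset


if __name__ == "__main__":
    worker = functools.partial(scale_and_shift, factor=10, offset=1)
    with multiprocessing.Pool() as pool:
        results = pool.map(worker, [1, 2, 3])
    print(results)  # [11, 21, 31]
```

The worker is defined at module level so that it, and the partial wrapping it, can be pickled and sent to the child processes.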

Another trick: sometimes you don't really need multiprocessing (as in using multiple cores of your processor), but just multiple threads, for example to query a database over many connections at the same time. In that case, just do from multiprocessing.dummy import Pool. This avoids Python spawning separate processes (which would make you lose access to all the namespaces you don't pass into the function) while keeping all the benefits of a pool, just on a single CPU core. That is the key difference between multiprocessing in Python (using multiple cores) and multithreading (using just one process, with the global interpreter lock still in effect).
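A minimal sketch of the multiprocessing.dummy trick, with a hypothetical fake_query function standing in for a real database query (it just sleeps, the way an I/O-bound call mostly waits). Because the pool is backed by threads, no __main__ guard is needed, and the waits overlap:

```python
import time
from multiprocessing.dummy import Pool  # same Pool API, backed by threads


def fake_query(n):
    # Hypothetical stand-in for a database query: mostly waiting on I/O
    time.sleep(0.5)
    return n * n


start = time.time()
with Pool(4) as pool:
    results = pool.map(fake_query, range(4))
elapsed = time.time() - start

print(results)  # [0, 1, 4, 9]
print("took about %.1fs" % elapsed)  # typically close to 0.5s rather than 2s
```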

One more piece of advice: always try the built-in map first, without any pool. Then switch to pool.imap once you're sure it all works.
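The two steps might look like this, using a hypothetical square function. The serial version is the easiest one to debug; once it works, swapping in the pool should not change the results, since imap keeps the input order:

```python
import multiprocessing


def square(x):
    return x * x


if __name__ == "__main__":
    data = [1, 2, 3, 4]

    # Step 1: plain built-in map in a single process; tracebacks are simplest here
    serial = list(map(square, data))

    # Step 2: the same function through a pool; imap yields results lazily, in order
    with multiprocessing.Pool() as pool:
        parallel = list(pool.imap(square, data))

    print(serial == parallel)  # True
```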
