
Trying to understand python multithreading

Please consider this code:

import threading

def printer():
    for i in range(2):
        # Hold the lock only for the duration of one print.
        with lock:
            print(['foo', 'bar', 'baz'])

def main():
    global lock
    lock = threading.Lock()
    threads = [threading.Thread(target=printer) for x in range(2)]
    for t in threads:
        t.start()
        t.join()  # wait for this thread before starting the next one

main()

I understand this code: we create two threads and run them sequentially, starting the second thread only after the first has finished. Now consider another variant:

import threading

def printer():
    for i in range(2):
        # Hold the lock only for the duration of one print.
        with lock:
            print(['foo', 'bar', 'baz'])

def main():
    global lock
    lock = threading.Lock()
    threads = [threading.Thread(target=printer) for x in range(2)]
    for t in threads:
        t.start()  # start every thread first
    for t in threads:
        t.join()   # then wait for all of them

main()

What happens here? I understand that the threads now run in parallel, but what is the purpose of making the main thread wait for the child threads in the second variant? How can it influence the output?

In the second variant, the ordering of execution is much less defined. The lock is released each time through the loop in printer. In both variants, you have two threads and two loops within a thread. In the first variant, since only one thread runs at a time, you know the total ordering. In the second variant, each time the lock is released, the thread running may change. So you might get

  • thread 1 loop 1
  • thread 1 loop 2
  • thread 2 loop 1
  • thread 2 loop 2

or perhaps

  • thread 2 loop 1
  • thread 1 loop 1
  • thread 1 loop 2
  • thread 2 loop 2

The only constraints are that loop 1 within a given thread runs before loop 2 within that thread, and that the output of one loop iteration is never interleaved with output from the other thread, since the lock is held while printing.
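As a rough way to see this constraint, here is a minimal sketch (assuming Python 3) that records the order of iterations in a list instead of printing. The names run_variant and respects_constraint are made up for this illustration and are not part of the question's code.

```python
import threading

def run_variant(join_immediately):
    """Run two threads and return the order in which loop iterations ran."""
    lock = threading.Lock()
    order = []  # (thread_number, loop_number) tuples, in execution order

    def printer(thread_number):
        for loop_number in (1, 2):
            with lock:
                # The lock is held for one iteration only, then released.
                order.append((thread_number, loop_number))

    threads = [threading.Thread(target=printer, args=(n,)) for n in (1, 2)]
    if join_immediately:
        for t in threads:  # first variant: start and join one at a time
            t.start()
            t.join()
    else:
        for t in threads:  # second variant: start all, then join all
            t.start()
        for t in threads:
            t.join()
    return order

def respects_constraint(order):
    """True if, for each thread, loop 1 appears before loop 2."""
    return all(order.index((n, 1)) < order.index((n, 2)) for n in (1, 2))

if __name__ == "__main__":
    print(run_variant(join_immediately=True))
    print(run_variant(join_immediately=False))
```

The first call always yields the same order. The second may differ from run to run, but respects_constraint should hold for any order it produces.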

In this particular case I'm not sure the call to t.join() in the second variant has an observable effect. It guarantees that the main thread will be the last thread to end, but I'm not sure that in this code you can observe that in any way. In more complex code, joining the threads can be important so that cleanup actions are only performed after all threads terminate. This can also be very important if you have daemon threads, because the entire program will terminate when all non-daemon threads terminate.
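To illustrate the cleanup point, here is a small sketch (assuming Python 3; run_workers and its parameters are invented for this example). The workers are daemon threads, so they would not keep the program alive by themselves; joining them is what makes it safe to read the results afterwards.

```python
import threading
import time

def run_workers(count=3, delay=0.05):
    results = []
    lock = threading.Lock()

    def worker(n):
        time.sleep(delay)  # stand-in for real work
        with lock:
            results.append(n)

    # daemon=True: these threads alone would not keep the program running.
    threads = [threading.Thread(target=worker, args=(n,), daemon=True)
               for n in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # without this, the "cleanup" below could run too early

    # "Cleanup": every worker has finished, so results is complete.
    return sorted(results)

if __name__ == "__main__":
    print(run_workers())
```

If the join loop were removed, the function would most likely return before any worker had appended its value.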

To better understand multithreading in Python, you first need to understand the relationship between the main thread and the child threads.

The main thread is the entry point of the program; it is created by the system when you run your script. In your script, for example, the main function runs in the main thread.

A child thread, by contrast, is created by the main thread when you instantiate the Thread class.

The most important thing is how the main thread controls a child thread. The Thread instance is essentially everything the main thread knows about the child thread, and the only handle it has for controlling it. A child thread does not run as soon as it is created; it runs only once the main thread calls start on the Thread instance. After start has been called, you can assume that the main thread and the child thread are running in parallel.

Another important point is how the main thread knows that a child thread's task is done. Although the main thread knows nothing about how the child thread carries out its task, it is aware of the child's running status: Thread.is_alive reports whether a thread is still running. In practice, Thread.join is usually used to make the main thread wait until the child thread is done; this call blocks the main thread.
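A minimal sketch of these three calls, assuming Python 3 (demo_status is a made-up name). The child thread simply waits on an Event, so the main thread decides when the "task" ends:

```python
import threading

def demo_status():
    release = threading.Event()  # the main thread sets this to end the task
    worker = threading.Thread(target=release.wait)

    states = []
    states.append(worker.is_alive())  # created but not started yet
    worker.start()
    states.append(worker.is_alive())  # running, blocked on the event
    release.set()                     # let the child finish
    worker.join()                     # block until it has finished
    states.append(worker.is_alive())  # finished
    return states

if __name__ == "__main__":
    print(demo_status())  # [False, True, False]
```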

Okay, let's examine the two scripts that confuse you. For the first script:

for t in threads:
    t.start()
    t.join()

The child threads are started and then joined one at a time inside the loop. Note that start does not block the main thread, whereas join blocks the main thread until that child thread is done. The threads therefore run sequentially.

While for the second script:

for t in threads:
    t.start()
for t in threads:
    t.join()

All child threads are started in the first loop. Because Thread.start does not block the main thread, all child threads are running in parallel once the first loop finishes. In the second loop, the main thread waits for each child thread to finish, one by one.

You should now see the difference between the two scripts: in the first, the child threads run one after another, while in the second, they run simultaneously.
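One way to observe the difference is to log when each thread starts and finishes. The sketch below assumes Python 3; run, task and the 0.2 second sleep are illustrative choices, with the sleep standing in for real work.

```python
import threading
import time

def run(join_immediately, delay=0.2):
    """Return a log of ('start' | 'finish', thread_index) events."""
    events = []
    lock = threading.Lock()

    def task(index):
        with lock:
            events.append(("start", index))
        time.sleep(delay)  # stand-in for real work
        with lock:
            events.append(("finish", index))

    threads = [threading.Thread(target=task, args=(i,)) for i in range(2)]
    if join_immediately:
        for t in threads:  # first script
            t.start()
            t.join()
    else:
        for t in threads:  # second script
            t.start()
        for t in threads:
            t.join()
    return events

if __name__ == "__main__":
    print(run(join_immediately=True))
    print(run(join_immediately=False))
```

With the first script, thread 0 finishes before thread 1 even starts. With the second, both threads should log their start before either finishes, because the second start call happens while the first thread is still sleeping.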

There are other useful topics in Python threading:

(1) How to handle the KeyboardInterrupt exception, e.g. when you want to terminate the program with Ctrl-C. Only the main thread receives the exception, so you have to handle the termination of the child threads yourself.

(2) Multithreading vs. multiprocessing. Although we say that threads run in parallel, it is not real parallelism at the CPU level: in standard CPython, the global interpreter lock (GIL) lets only one thread execute Python bytecode at a time. So if your application is CPU-intensive, try multiprocessing; if it is I/O-intensive, multithreading may be sufficient.
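For topic (1), one common pattern is to share a threading.Event that the main thread sets when it catches KeyboardInterrupt. This is only a sketch under assumptions: it targets Python 3, and worker, run and main_work are hypothetical names chosen for the example.

```python
import threading

def worker(stop_event, ticks):
    # Keep working until the main thread asks us to stop.
    while not stop_event.is_set():
        ticks.append(1)        # stand-in for one unit of work
        stop_event.wait(0.01)  # pause, but wake early if stop is requested

def run(main_work):
    stop_event = threading.Event()
    ticks = []
    child = threading.Thread(target=worker, args=(stop_event, ticks))
    child.start()
    interrupted = False
    try:
        main_work()  # Ctrl-C raises KeyboardInterrupt here, in the main thread
    except KeyboardInterrupt:
        interrupted = True
    finally:
        stop_event.set()  # tell the child to stop
        child.join()      # and wait until it really has
    return interrupted, child.is_alive()
```

Here run returns whether an interrupt was caught and whether the child is still alive; after the join, the second value should always be False.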
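For topic (2), the following sketch (assuming Python 3) shows why threads can be enough for I/O-bound work. time.sleep stands in for a blocking I/O call, which releases the GIL while it waits; fake_io, sequential, threaded and timed are made-up helper names.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_io(n, delay=0.2):
    time.sleep(delay)  # stand-in for a blocking network or disk call
    return n * 2

def sequential(items):
    return [fake_io(n) for n in items]

def threaded(items):
    # One worker per item, so all the waits overlap.
    with ThreadPoolExecutor(max_workers=len(items)) as pool:
        return list(pool.map(fake_io, items))

def timed(fn):
    """Call fn and return (result, elapsed_seconds)."""
    begin = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - begin

if __name__ == "__main__":
    items = [0, 1, 2, 3]
    print(timed(lambda: sequential(items)))
    print(timed(lambda: threaded(items)))
```

Both versions return the same results, but the threaded one should take roughly one delay instead of four. For CPU-bound functions the threaded version would not be expected to show this gain in standard CPython; a process pool such as concurrent.futures.ProcessPoolExecutor is the usual alternative there.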

By the way, reading through the threading section of the Python documentation and trying some code may help you understand it.

Hope this helps. Thanks.
