
How does Thread().join work in the following case?

I saw the following code in a thread tutorial:

from time import sleep, perf_counter
from threading import Thread

start = perf_counter()

def foo():
    sleep(5)

threads = []
for i in range(100):
    t = Thread(target=foo,)
    t.start()
    threads.append(t)

for i in threads:
    i.join()

end = perf_counter()

print(f'Took {end - start}')

When I run it, it prints Took 5.014557975. Okay, that part is fine: it does not take 500 seconds, as the non-threaded version would.

What I don't understand is how .join works. Without calling .join I got Took 0.007060926999999995, which indicates that the main thread reached the end of the script before the child threads finished. Since .join() is supposed to block, won't the first iteration of the loop block and have to wait 5 seconds before the second iteration starts? How does it still manage to finish in about 5 seconds?

I keep reading that Python threading is not truly multithreaded and only appears to be (it runs on a single core), but if that is the case, how exactly does the time keep running in the background if it's not parallel?

So .join() is supposed to block, so when the first iteration of the loop occurs, won't it be blocked and have to wait 5 seconds until the second iteration?

Remember that all the threads are started at almost the same time, and each of them takes ~5s.

The second for loop waits for all the threads to finish. The first join() blocks for roughly 5s, until the first thread finishes. The remaining 99 threads finish at roughly the same moment, so the remaining 99 iterations of the loop return almost immediately. By the time you call join() on the second thread, it has either already finished or will finish within a couple of milliseconds.
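To make that concrete, here is a small sketch that times each join() call separately. The 0.3s sleep, the thread count of 5, and the join_waits list are my own choices for illustration, not part of the original code.

```python
from threading import Thread
from time import perf_counter, sleep

SLEEP_SECONDS = 0.3  # shortened from the 5 s in the question (assumption)

def foo():
    sleep(SLEEP_SECONDS)

# Start every thread before joining any of them, as in the question
threads = [Thread(target=foo) for _ in range(5)]
for t in threads:
    t.start()

join_waits = []
for t in threads:
    before = perf_counter()
    t.join()  # blocks only while this particular thread is still running
    join_waits.append(perf_counter() - before)

for index, waited in enumerate(join_waits):
    print(f"join #{index} blocked for {waited:.4f}s")
```

On a typical run the first join() should block for close to the full sleep and the later ones for far less, because those threads were sleeping during the same interval.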

I keep reading that Python threading is not truly multithreaded and only appears to be (it runs on a single core), but if that is the case, how exactly does the time keep running in the background if it's not parallel?

It's a topic that has been discussed a lot, so I won't add another page-long answer.

Tl;dr: Yes, multithreading in Python (CPython, because of its global interpreter lock) doesn't help with CPU-intensive tasks, but it's just fine for tasks that spend most of their time waiting for something else (network, disk I/O, user input, a time-based event).

sleep() belongs to the latter group of tasks, so multithreading will speed it up, even though it doesn't utilize multiple cores simultaneously.
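For contrast, a rough sketch of the CPU-intensive case. The count_down function and the values of N and WORKERS are made up for illustration. On a standard CPython build with the GIL I would expect the two timings to be similar, but that is an expectation and not a guarantee (free-threaded builds, for instance, may behave differently), so run it and compare for yourself.

```python
from threading import Thread
from time import perf_counter

def count_down(n):
    # Pure-Python busy loop: CPU-bound, it never waits on anything
    while n > 0:
        n -= 1

N = 2_000_000   # iterations per task (arbitrary)
WORKERS = 4     # number of tasks (arbitrary)

# Run the tasks one after another in the main thread
start = perf_counter()
for _ in range(WORKERS):
    count_down(N)
sequential = perf_counter() - start

# Run the same tasks in separate threads
start = perf_counter()
threads = [Thread(target=count_down, args=(N,)) for _ in range(WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = perf_counter() - start

print(f"sequential: {sequential:.3f}s, threaded: {threaded:.3f}s")
```

Swapping the body of count_down for a sleep() call turns this back into the waiting case, where the threaded version finishes in roughly the time of a single task.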

The OS is in control once a thread starts, and it context-switches between threads (that is the correct term).

The time functions read a clock via the OS, and that clock is always running. A thread that calls sleep() releases the interpreter lock and is blocked in the OS. It does not need to keep checking the clock itself, because the OS wakes it up once the requested time has passed.

The threads are not executing Python code in parallel. They are all waiting at the same time, and waiting costs almost no CPU, so 100 sleeps of 5 seconds overlap instead of adding up.
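One way to check that the sleeping threads are idle and not busy is to compare wall-clock time with the CPU time the process actually used. This is only a sketch: the 0.5s sleep and the thread count of 10 are arbitrary, and process_time() counts CPU time for the whole process across all of its threads.

```python
from threading import Thread
from time import perf_counter, process_time, sleep

def foo():
    sleep(0.5)

wall_start = perf_counter()
cpu_start = process_time()  # CPU time of the whole process, all threads

threads = [Thread(target=foo) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

wall_used = perf_counter() - wall_start
cpu_used = process_time() - cpu_start

print(f"wall clock: {wall_used:.3f}s, CPU time: {cpu_used:.3f}s")
```

The wall-clock figure should be about one sleep long, while the CPU figure should be only a small fraction of that, which is what you would expect if the threads spend the interval blocked.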

Here is a little finer detail for what is happening. I subclassed Thread and overrode its run and join methods to log when they are called.

Caveat: the documentation specifically states

only override __init__ and run methods

I was surprised overriding join didn't cause problems.

from time import sleep, perf_counter
from threading import Thread
import pandas as pd
 
c = {}
def foo(i):
    c[i]['foo start'] = perf_counter() - start
    sleep(5)
    # print(f'{i} - start:{start} end:{perf_counter()}')
    c[i]['foo end'] = perf_counter() - start

class Test(Thread):
    def __init__(self,*args,**kwargs):
        self.i = kwargs['args'][0]
        super().__init__(*args,**kwargs)
    def run(self):
        # print(f'{self.i} - started:{perf_counter()}')
        c[self.i]['thread start'] = perf_counter() - start
        super().run()
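    def join(self, timeout=None):
        # Accept and forward timeout so the override keeps Thread.join's signature.
        # Note: this records when join() is called, not when it returns.
        # print(f'{self.i} - joined:{perf_counter()}')
        c[self.i]['thread joined'] = perf_counter() - start
        super().join(timeout)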

threads = []
start = perf_counter()
for i in range(10):
    c[i] = {}
    t = Test(target=foo,args=(i,))
    t.start()
    threads.append(t)

for i in threads:
    i.join()

df = pd.DataFrame(c)
print(df)

                      0         1         2         3         4         5         6         7         8         9
thread start   0.000729  0.000928  0.001085  0.001245  0.001400  0.001568  0.001730  0.001885  0.002056  0.002215
foo start      0.000732  0.000931  0.001088  0.001248  0.001402  0.001570  0.001732  0.001891  0.002058  0.002217
thread joined  0.002228  5.008274  5.008300  5.008305  5.008323  5.008327  5.008330  5.008333  5.008336  5.008339
foo end        5.008124  5.007982  5.007615  5.007829  5.007672  5.007899  5.007724  5.007758  5.008051  5.007549

Hopefully you can see that all the threads are started in sequence, very close together. Once join is called on thread 0 (at about 0.002s), the main thread does nothing else until that thread stops (its foo ends at about 5.008s). Only then is join called on each of the other threads, and those calls return almost immediately.

A thread can terminate before it is even joined: for threads 1 through 9, foo ends before join is called on the thread.
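The same thing can be seen in isolation with is_alive(). In this sketch the sleep durations and the names alive_before_join and join_wait are my own, chosen so that the worker finishes well before the main thread gets around to joining it.

```python
from threading import Thread
from time import perf_counter, sleep

def foo():
    sleep(0.1)

t = Thread(target=foo)
t.start()

sleep(0.3)  # main thread is busy elsewhere; foo finishes in the meantime
alive_before_join = t.is_alive()

before = perf_counter()
t.join()  # nothing left to wait for, so this returns right away
join_wait = perf_counter() - before

print(f"alive before join: {alive_before_join}, join blocked for {join_wait:.4f}s")
```

Joining a thread that has already finished is harmless; join() simply returns without blocking.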

