
Python Semaphore does not seem to work in Google Colab

I am trying to follow this example:

limit number of threads working in parallel

to limit the number of threads I am working with.

When I try this code:

import threading
import time

maxthreads = 5
sema = threading.Semaphore(value=maxthreads)
threads = list()

def task(i):
    sema.acquire()
    print( "start %s" % (i,))
    time.sleep(2)
    sema.release()

for i in range(10):
    thread = threading.Thread(target=task,args=(str(i)))
    threads.append(thread)
    thread.start()

This is the output:

start 0start 1
start 3start 2


start 4

The second half of the output never appears. Perhaps this has something to do with Colab?

If so, is there a recommended way to limit the number of threads when multithreading in Colab?

I also tried BoundedSemaphore, with the same result:

import threading
import time

maxthreads = 5
sema = threading.BoundedSemaphore(maxthreads)
threads = list()

def task(i):
    sema.acquire()
    print( "start %s" % (i,))
    time.sleep(2)
    sema.release()

for i in range(10):
    thread = threading.Thread(target=task,args=(str(i)))
    threads.append(thread)
    thread.start()

EDIT: I've come back to this answer after some time to provide more insight, having realized that my original answer was probably wrong. I've included a description of the problem, which I think is interesting in itself, but you can skip it and go straight to a possible solution.

The problem

I originally thought that the issue was that Google Colab was prematurely stopping the process/threads when they were inactive. While that seemed reasonable at the time, I've realized that the answer is much simpler.

The issue here is that the main thread is not waiting for the created threads to end. After the main thread is done, Google Colab does not seem to wait for the other threads to end, and so the output they produce never reaches the main console. The following code runs as expected locally:

import threading
import time

maxthreads = 2
sema = threading.Semaphore(value=maxthreads)
threads = list()

def task(i):
    sema.acquire()
    print( "start %s" % (i,))
    time.sleep(2)
    sema.release()

for i in range(10):
    thread = threading.Thread(target=task,args=(i,))
    threads.append(thread)
    thread.start()

Saving it to a local file and running it yields:

start 0
start 1
start 2
start 3
start 4
start 5
start 6
start 7
start 8
start 9

However, when running it in Google Colab we get:

start 0
start 1

What's going on internally (I assume) is that the main thread finishes, and Google Colab then doesn't wait for the other threads to end. We only see the first two threads' output because those threads run fast enough to print before the main thread ends. An interesting experiment is to print something when the main thread is done:

import threading
import time

maxthreads = 2
sema = threading.Semaphore(value=maxthreads)
threads = list()

def task(i):
    sema.acquire()
    print( "start %s" % (i,))
    time.sleep(2)
    sema.release()

for i in range(10):
    thread = threading.Thread(target=task,args=(i,))
    threads.append(thread)
    thread.start()

print('Main thread done')

We get the following output (running locally on the left, running on Google Colab on the right):

Locally:                      Google colab:
---------------------------------------
start 0              |        start 0
start 1              |        start 1
Main thread done     |        Main thread done
start 2              |
start 3              |
start 4              |
start 5              |
start 6              |
start 7              |
start 8              |
start 9              |

Indeed, we see that once the main thread is done, the rest of the output is lost on Google Colab.

A solution

We can use Thread.join() to wait until a thread is done. That way, we can make the main thread wait for all the additional threads before finishing:

import threading
import time

maxthreads = 2
sema = threading.Semaphore(value=maxthreads)
threads = list()

def task(i):
    sema.acquire()
    print( "start %s" % (i,))
    time.sleep(2)
    sema.release()

for i in range(10):
    thread = threading.Thread(target=task,args=(i,))
    threads.append(thread)
    thread.start()

for t in threads:
    t.join()

And the output is the same, both locally and in Google Colab:

start 0
start 1
start 2
start 3
start 4
start 5
start 6
start 7
start 8
start 9

You can also try adding print('Main thread done') at the end, and you'll see that it is printed only after all the additional threads are done.
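If the goal is only to cap how many tasks run at once, a thread pool is one alternative to a hand-managed semaphore. The sketch below is not part of the original answer: it assumes the standard-library concurrent.futures.ThreadPoolExecutor, and the helper run_all, the active/peak counters, and the shortened sleep are illustrative additions. Leaving the with block waits for every submitted task, so no explicit join() is needed.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

maxthreads = 2
lock = threading.Lock()
active = 0   # number of tasks currently running
peak = 0     # highest value of `active` seen so far

def task(i):
    global active, peak
    with lock:
        active += 1
        peak = max(peak, active)
    print("start %s" % (i,))
    time.sleep(0.1)  # shortened stand-in for the 2-second sleep
    with lock:
        active -= 1
    return i

def run_all(n):
    # The pool never runs more than `maxthreads` tasks at once, and
    # leaving the `with` block waits for all of them to finish.
    with ThreadPoolExecutor(max_workers=maxthreads) as pool:
        return list(pool.map(task, range(n)))

results = run_all(10)
print("Main thread done, peak concurrency:", peak)
```

Because the main thread blocks until the pool has drained, this should behave like the join() version above; the peak counter is only there to check that the limit holds.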


On an unrelated note, you should probably change

thread = threading.Thread(target=task,args=(str(i)))

To

thread = threading.Thread(target=task,args=(i,))

Otherwise you might run into problems when i is a two-digit number. Note that (i,) is a tuple with i as its single element.
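To see why the trailing comma matters: Thread calls its target with the contents of args unpacked as positional arguments, and (str(i)) is just the string str(i) in parentheses, not a tuple. A minimal sketch, where the hypothetical helper count_positional stands in for the thread's target:

```python
def count_positional(*args):
    # Mirrors how Thread calls target(*args): returns how many
    # positional arguments the target would receive.
    return len(args)

one_digit = (str(7))     # just the string "7", not a tuple
two_digit = (str(10))    # just the string "10"
as_tuple = (10,)         # a real one-element tuple

print(count_positional(*one_digit))  # 1 argument: "7"
print(count_positional(*two_digit))  # 2 arguments: "1" and "0"
print(count_positional(*as_tuple))   # 1 argument: 10
```

With args=(str(10)), task would therefore be called with two arguments and raise a TypeError inside the thread.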
