Python semaphore does not seem to work in Google Colab

Question

I am trying to follow this example to limit the number of threads I am using.

When I run this code:

import threading
import time

maxthreads = 5
sema = threading.Semaphore(value=maxthreads)
threads = list()

def task(i):
    sema.acquire()
    print( "start %s" % (i,))
    time.sleep(2)
    sema.release()

for i in range(10):
    thread = threading.Thread(target=task,args=(str(i)))
    threads.append(thread)
    thread.start()

This is the output:

start 0start 1
start 3start 2


start 4

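The second half of the output never appears. Maybe this has something to do with Colab?

If so, is there a recommended way to limit the number of threads when multithreading in Colab?

I also tried a BoundedSemaphore, with the same result: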

import threading
import time

maxthreads = 5
sema = threading.BoundedSemaphore(maxthreads)
threads = list()

def task(i):
    sema.acquire()
    print( "start %s" % (i,))
    time.sleep(2)
    sema.release()

for i in range(10):
    thread = threading.Thread(target=task,args=(str(i)))
    threads.append(thread)
    thread.start()

Answer 1

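Edit: I am coming back to this answer after some time to provide more insight, because I realized that my original answer was probably wrong. I have included a description of the problem, which I find interesting in its own right, but you can skip it and go straight to the possible solution.

The problem

I originally thought the problem was that Google Colab stops processes/threads too early when they are inactive. While that seemed plausible at the time, I have since realized that the answer is much simpler.

The problem here is that the main thread does not wait for the threads it created to finish. Once the main thread is done, Google Colab does not seem to wait for the other threads to finish, so the output they produce never reaches the main console. The following code runs as expected locally: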

import threading
import time

maxthreads = 2
sema = threading.Semaphore(value=maxthreads)
threads = list()

def task(i):
    sema.acquire()
    print( "start %s" % (i,))
    time.sleep(2)
    sema.release()

for i in range(10):
    thread = threading.Thread(target=task,args=(i,))
    threads.append(thread)
    thread.start()

Saving it to a file locally and running it produces:

start 0
start 1
start 2
start 3
start 4
start 5
start 6
start 7
start 8
start 9

However, when running it in Google Colab (you can try it here), we get:

start 0
start 1

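What happens internally (I assume) is that the main thread finishes and Google Colab then does not wait for the other threads to end. We only see the output of the first threads because they run fast enough to finish printing before the main thread ends. An interesting experiment is to print something once the main thread is done: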

import threading
import time

maxthreads = 2
sema = threading.Semaphore(value=maxthreads)
threads = list()

def task(i):
    sema.acquire()
    print( "start %s" % (i,))
    time.sleep(2)
    sema.release()

for i in range(10):
    thread = threading.Thread(target=task,args=(i,))
    threads.append(thread)
    thread.start()

print('Main thread done')

We get the following output (local run on the left, Google Colab run on the right):

Locally:                      Google colab:
---------------------------------------
start 0              |        start 0
start 1              |        start 1
Main thread done     |        Main thread done
start 2              |
start 3              |
start 4              |
start 5              |
start 6              |
start 7              |
start 8              |
start 9              |

We can indeed see that once the main thread is done, the rest of the output is lost on Google Colab.
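One way to check this explanation is to ask each thread whether it is still running at the point where the script would otherwise end. The sketch below is only an illustration and not part of the original experiment: the helper name count_alive and the shortened sleep are assumptions made for the example.

```python
import threading
import time

def task():
    # Stand-in for real work; shorter than the 2 s used in the answer.
    time.sleep(0.2)

def count_alive(threads):
    # Hypothetical helper: number of threads that have not finished yet.
    return sum(1 for t in threads if t.is_alive())

threads = [threading.Thread(target=task) for _ in range(3)]
for t in threads:
    t.start()

# This is where the original script ends: the workers are typically still sleeping.
alive_at_end_of_script = count_alive(threads)
print("still running when the main thread reaches the end:", alive_at_end_of_script)

# Waiting explicitly guarantees that every worker has finished.
for t in threads:
    t.join()
alive_after_join = count_alive(threads)
print("still running after join():", alive_after_join)
```

If the first number is greater than zero, any output those threads produce later depends on the environment continuing to collect it after the main thread is done.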

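A solution

We can use Thread.join() (docs) to wait for a thread to finish. This way, the main thread waits for all the additional threads before finishing (you can try it in Google Colab here):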

import threading
import time

maxthreads = 2
sema = threading.Semaphore(value=maxthreads)
threads = list()

def task(i):
    sema.acquire()
    print( "start %s" % (i,))
    time.sleep(2)
    sema.release()

for i in range(10):
    thread = threading.Thread(target=task,args=(i,))
    threads.append(thread)
    thread.start()

for t in threads:
    t.join()

The output is the same locally and in Google Colab:

start 0
start 1
start 2
start 3
start 4
start 5
start 6
start 7
start 8
start 9

You can also try adding print('Main thread done') at the end; you will see that it is printed only after all the other threads have finished.
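As a possible variation, the semaphore can be used as a context manager, so that it is released even if the task raises an exception. The sketch below also records the peak number of tasks running at once, to confirm that the limit holds. The names MAX_THREADS, stats, stats_lock and run_all are assumptions introduced for this example, and the sleep is shortened to keep it quick.

```python
import threading
import time

MAX_THREADS = 2  # same limit as in the answer's examples
sema = threading.Semaphore(MAX_THREADS)

# Bookkeeping used only to observe concurrency (not needed in real code).
stats_lock = threading.Lock()
stats = {"active": 0, "peak": 0}

def task(i):
    # "with sema" acquires on entry and always releases on exit.
    with sema:
        with stats_lock:
            stats["active"] += 1
            stats["peak"] = max(stats["peak"], stats["active"])
        time.sleep(0.1)  # stand-in for real work
        with stats_lock:
            stats["active"] -= 1

def run_all(n):
    threads = [threading.Thread(target=task, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    # Wait for every worker so that nothing is lost when the cell ends.
    for t in threads:
        t.join()
    return stats["peak"]

peak = run_all(6)
print("peak concurrency:", peak)
```

The peak should never exceed MAX_THREADS, because at most that many threads can hold the semaphore at the same time.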

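On an unrelated note, you should probably change

thread = threading.Thread(target=task,args=(str(i)))

to

thread = threading.Thread(target=task,args=(i,))

otherwise you may run into problems when i has two digits. (str(i)) is just the string str(i), not a tuple, so Thread unpacks it character by character and passes each character as a separate argument. Note that (i,) is a tuple with i as its single element.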
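The difference can be seen with a small sketch that records exactly which arguments the target receives. The function run_with and the list received are hypothetical names used only for this illustration.

```python
import threading

received = []

def task(*args):
    # Record exactly what the thread passed in.
    received.append(args)

def run_with(args):
    # Hypothetical helper: run task once in a thread and return what it received.
    received.clear()
    t = threading.Thread(target=task, args=args)
    t.start()
    t.join()
    return received[0]

# (str(10)) is the string "10", so it is unpacked into two arguments.
print(run_with((str(10))))  # ('1', '0')

# (10,) is a one-element tuple, so the target gets a single argument.
print(run_with((10,)))      # (10,)
```

With the original task(i), which accepts exactly one argument, the first form would raise a TypeError inside the thread as soon as i reaches 10.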

Solution 1
3 · accepted · 2019-06-23 02:56:30