
Stopping processes in ThreadPool in Python

I've been trying to write an interactive wrapper (for use in IPython) for a library that controls some hardware. Some of the calls are IO-heavy, so it makes sense to carry out the tasks in parallel. Using a ThreadPool (almost) works nicely:

from multiprocessing.pool import ThreadPool

class hardware():
    def __init__(self, IPaddress):
        connect_to_hardware(IPaddress)

    def some_long_task_to_hardware(self, wtime):
        wait(wtime)
        result = 'blah'
        return result

pool = ThreadPool(processes=4)
threads = []
h = [hardware(IP1), hardware(IP2), hardware(IP3), hardware(IP4)]
for tt in range(4):
    task = pool.apply_async(h[tt].some_long_task_to_hardware, (1000,))
    threads.append(task)
alive = [True]*4
try:
    while any(alive) :
        for tt in range(4): alive[tt] = not threads[tt].ready()
        do_other_stuff_for_a_bit()
except:
    #some command I cannot find that will stop the threads...
    raise
for tt in range(4): print(threads[tt].get())

The problem comes if the user wants to stop the process, or if there is an IO error in do_other_stuff_for_a_bit(). Pressing Ctrl+C stops the main process, but the worker threads carry on running until their current task is complete.
Is there some way to stop these threads without having to rewrite the library or have the user exit Python? pool.terminate() and pool.join(), which I have seen used in other examples, do not seem to do the job.

The actual routine (rather than the simplified version above) uses logging, and although all the worker threads are shut down at some point, I can see that the processes they started carry on until complete (and, this being hardware, I can see their effect by looking across the room).

This is in Python 2.7.

UPDATE:

The solution seems to be to switch to using multiprocessing.Process instead of a thread pool. The test code I tried runs foo_pulse:

import time

class foo(object):
    def foo_pulse(self, nPulse, name): #just one method of *many*
        print('starting pulse for '+name)
        result=[]
        for ii in range(nPulse):
            print('on for '+name)
            time.sleep(2)
            print('off for '+name)
            time.sleep(2)
            result.append(ii)
        return result,name

If you try running this using ThreadPool, then Ctrl-C does not stop foo_pulse from running (even though it does kill the threads right away, the print statements keep on coming):

from multiprocessing.pool import ThreadPool
import time
def test(nPulse):
    a=foo()
    pool=ThreadPool(processes=4)
    threads=[]
    for rn in range(4) :
        r=pool.apply_async(a.foo_pulse,(nPulse,'loop '+str(rn)))
        threads.append(r)
    alive=[True]*4
    try:
        while any(alive) : #wait until all threads complete
            for rn in range(4):
                alive[rn] = not threads[rn].ready() 
                time.sleep(1)
    except : #stop threads if user presses ctrl-c
        print('trying to stop threads')
        pool.terminate()
        print('stopped threads') # this line prints but output from foo_pulse carried on.
        raise
    else : 
        for t in threads : print(t.get())

However, a version using multiprocessing.Process works as expected:

import multiprocessing as mp
import time
def test_pro(nPulse):
    pros=[]
    ans=[]
    a=foo()
    for rn in range(4) :
        q=mp.Queue()
        ans.append(q)
        r=mp.Process(target=wrapper,args=(a,"foo_pulse",q),kwargs={'args':(nPulse,'loop '+str(rn))})
        r.start()
        pros.append(r)
    try:
        for p in pros : p.join()
        print('all done')
    except : #stop threads if user stops findRes
        print('trying to stop threads')
        for p in pros : p.terminate()
        print('stopped threads')
    else : 
        print('output here')
        for q in ans :
            print(q.get())
    print('exit time')

where I have defined a wrapper for the library foo (so that it does not need to be rewritten). If the return value is not needed, then neither is this wrapper:

def wrapper(a,target,q,args=(),kwargs={}):
    '''Used when return value is wanted'''
    q.put(getattr(a,target)(*args,**kwargs))
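As a self-contained sketch of how such a wrapper can be used (Adder is a hypothetical stand-in for the hardware class; the kwargs=None default here avoids the mutable-default-argument pitfall of kwargs={}):

```python
import multiprocessing as mp

class Adder(object):
    """Hypothetical stand-in for the hardware class above."""
    def add(self, x, y):
        return x + y

def wrapper(a, target, q, args=(), kwargs=None):
    '''Call method `target` of object `a` and put its return value on queue `q`.'''
    q.put(getattr(a, target)(*args, **(kwargs or {})))

if __name__ == '__main__':
    q = mp.Queue()
    p = mp.Process(target=wrapper, args=(Adder(), 'add', q),
                   kwargs={'args': (2, 3)})
    p.start()
    p.join()
    print(q.get())  # 5
```

The queue is how the child process hands its return value back, since a Process (unlike apply_async) has no result object of its own.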

From the documentation I see no reason why a pool would not work (other than a bug).

This is a very interesting use of parallelism.

However, if you are using multiprocessing, the goal is to have many processes running in parallel, as opposed to one process running many threads.

Consider these few changes to implement it using multiprocessing:

You have these functions that will run in parallel:

import time
import multiprocessing as mp


def some_long_task_from_library(wtime):
    time.sleep(wtime)


class MyException(Exception): pass

def do_other_stuff_for_a_bit():
    time.sleep(5)
    raise MyException("Something Happened...")

Let's create and start the processes, say 4:

procs = []  # this is not a Pool, it is just a way to handle the
            # processes instead of calling them p1, p2, p3, p4...
for _ in range(4):
    p = mp.Process(target=some_long_task_from_library, args=(1000,))
    p.start()
    procs.append(p)
mp.active_children()   # reaps any already-finished children; the processes
                       # are already running after p.start()

The processes run in parallel, presumably each on a separate CPU core, but that is for the OS to decide. You can check in your system monitor.

In the meantime, you run a process that will break, and you want to stop the running processes without leaving them orphaned:

try:
    do_other_stuff_for_a_bit()
except MyException as exc:
    print(exc)
    print("Now stopping all processes...")
    for p in procs:
        p.terminate()
print("The rest of the process will continue")

If it doesn't make sense to continue with the main process when one or all of the subprocesses have terminated, you should handle the exit of the main program.

Hope it helps; you can adapt bits of this for your library.

In answer to the question of why Pool did not work: this is because (as noted in the documentation) the __main__ module needs to be importable by the child processes, and due to the nature of this project interactive Python is being used.

At the same time it was not clear why ThreadPool would work, although the clue is right there in the name. ThreadPool creates its pool of workers using multiprocessing.dummy, which as noted here is just a wrapper around the threading module, while Pool uses multiprocessing.Process. This can be seen with this test:

p=ThreadPool(processes=3)
p._pool[0]
<DummyProcess(Thread23, started daemon 12345)> #no terminate() method

p=Pool(processes=3)
p._pool[0]
<Process(PoolWorker-1, started daemon)> #has handy terminate() method if needed

As threads do not have a terminate method, the worker threads carry on running until they have completed their current task. Killing threads is messy (which is why I tried to use the multiprocessing module), but solutions are here.
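One cooperative approach from those solutions is to have the worker poll a threading.Event between chunks of work. This only helps if the long task can be broken into steps, which a library call often cannot be, so treat this as a sketch rather than a fix for the original problem:

```python
import threading
import time

stop_event = threading.Event()

def long_task(wtime, step=0.1):
    """Hypothetical cooperative version of a long task: do the work in
    small steps and check the stop flag between steps."""
    waited = 0.0
    while waited < wtime:
        if stop_event.is_set():
            return 'stopped early'
        time.sleep(step)
        waited += step
    return 'finished'

t = threading.Thread(target=long_task, args=(60,))
t.start()
time.sleep(0.3)
stop_event.set()  # ask the worker to stop at its next check
t.join()          # returns almost immediately instead of after 60 s
```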

One warning about the multiprocessing.Process solution above, which uses this wrapper:

def wrapper(a,target,q,args=(),kwargs={}):
    '''Used when return value is wanted'''
    q.put(getattr(a,target)(*args,**kwargs))

is that changes to attributes inside the instance of the object are not passed back up to the main program. As an example, the class foo above could also have methods such as: def addIP(self, newIP): self.hardwareIP = newIP. A call to r = mp.Process(target=a.addIP, args=('127.0.0.1',)) does not update a.
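A minimal demonstration of the pitfall (Gadget is a hypothetical stand-in for foo):

```python
import multiprocessing as mp

class Gadget(object):
    """Hypothetical stand-in for foo."""
    def __init__(self):
        self.hardwareIP = 'none'
    def addIP(self, newIP):
        self.hardwareIP = newIP

if __name__ == '__main__':
    a = Gadget()
    p = mp.Process(target=a.addIP, args=('127.0.0.1',))
    p.start()
    p.join()
    print(a.hardwareIP)  # 'none': the child modified its own copy of a
```

The child process gets its own copy of the object, so the assignment happens in the child's memory and is lost when it exits.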

The only way around this for a complex object seems to be shared state via a custom manager, which can give access to both the methods and attributes of object a. For a very large, complex object based on a library, this may best be done using dir(foo) to populate the manager. If I can figure out how, I'll update this answer with an example (for my future self as much as for others).
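As a sketch of what such a manager might look like (hypothetical names; a BaseManager proxy exposes the object's public methods, so attribute access still needs a getter method):

```python
import multiprocessing as mp
from multiprocessing.managers import BaseManager

class Gadget(object):
    """Hypothetical stand-in for foo; the proxy exposes its public methods."""
    def __init__(self):
        self.hardwareIP = 'none'
    def addIP(self, newIP):
        self.hardwareIP = newIP
    def getIP(self):
        # attributes are not proxied, so provide a getter
        return self.hardwareIP

class GadgetManager(BaseManager):
    pass

GadgetManager.register('Gadget', Gadget)

if __name__ == '__main__':
    with GadgetManager() as manager:
        a = manager.Gadget()  # the real object lives in the manager process
        p = mp.Process(target=a.addIP, args=('127.0.0.1',))
        p.start()
        p.join()
        print(a.getIP())  # '127.0.0.1': the change is visible in the parent
```

Because the real object lives in the manager's server process and every call goes through the proxy, changes made by child processes are visible everywhere, at the cost of one inter-process round trip per method call.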

If for some reason using threads is preferable, we can use this.

We can send some signal to the threads we want to terminate. The simplest signal is a global variable:

import time
from multiprocessing.pool import ThreadPool

_FINISH = False

def hang():
    while True:
        if _FINISH:
            break
        print('hanging..')
        time.sleep(10)


def main():
    global _FINISH
    pool = ThreadPool(processes=1)
    pool.apply_async(hang)
    time.sleep(10)
    _FINISH = True
    pool.terminate()
    pool.join()
    print('main process exiting..')


if __name__ == '__main__':
    main()
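A global variable works, but a threading.Event is a slightly cleaner signal: the worker can wait on it with a timeout instead of sleeping blindly. A sketch of the same idea:

```python
import time
import threading
from multiprocessing.pool import ThreadPool

finish = threading.Event()

def hang():
    # wait() doubles as an interruptible sleep: it returns True as soon as
    # the event is set, or False after the timeout expires
    while not finish.wait(timeout=0.1):
        pass
    return 'worker saw the signal'

pool = ThreadPool(processes=1)
result = pool.apply_async(hang)
time.sleep(0.5)
finish.set()         # ask the worker to stop
print(result.get())  # 'worker saw the signal'
pool.terminate()
pool.join()
```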
