Using the multiprocessing module to run parallel processes where one process is fed by (depends on) another, for the Viterbi algorithm

Question

I have recently played around with Python's multiprocessing module to speed up the forward-backward algorithm for Hidden Markov Models, as forward filtering and backward filtering can run independently. Seeing the run-time halve was awe-inspiring stuff.

I am now attempting to include some multiprocessing in my iterative Viterbi algorithm. In this algorithm, the two processes I am trying to run are not independent. The val_max part can run independently, but arg_max[t] depends on val_max[t-1]. So I played with the idea of running val_max as one process and arg_max as a second process that is fed by val_max.
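For reference, the dependency described above can be written out as a recursion. This is only a restatement of the two update lines in the code below; the symbols \delta (for val_max) and \psi (for arg_max) are notation assumed here and do not appear in the code.

```latex
\begin{aligned}
\delta_0(j) &= \pi_j \, B_{j,\,x_0} \\
\delta_t(j) &= \Big( \max_i \; \delta_{t-1}(i) \, A_{ij} \Big) \, B_{j,\,x_t}, \qquad t = 1, \dots, T-1 \\
\psi_t(j)   &= \operatorname*{arg\,max}_i \; \delta_{t-1}(i) \, A_{ij}, \qquad t = 1, \dots, T-1
\end{aligned}
```

Both \delta_t and \psi_t read \delta_{t-1}, so \psi_t can be computed as soon as \delta_{t-1} is available, which is what the "fed by" idea relies on.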

I admit to being a bit out of my depth here and do not know much about multiprocessing beyond watching some basic videos and browsing blogs. I provide my attempt below, but it does not work.


import numpy as np
from time import time,sleep
import multiprocessing as mp

class Viterbi:


    def __init__(self,A,B,pi):
        self.M = A.shape[0] # number of hidden states
        self.A = A  # Transition Matrix
        self.B = B   # Observation Matrix
        self.pi = pi   # Initial distribution
        self.T = None   # time horizon
        self.val_max = None
        self.arg_max = None
        self.obs = None
        self.sleep_time = 1e-6
        self.output = mp.Queue()


    def get_path(self,x):
        # returns the most likely state sequence given observed sequence x
        # using the Viterbi algorithm
        self.T = len(x)
        self.val_max = np.zeros((self.T, self.M))
        self.arg_max = np.zeros((self.T, self.M))
        self.val_max[0] = self.pi*self.B[:,x[0]]
        for t in range(1, self.T):
            # Independent Process (uses the argument x, not a global obs)
            self.val_max[t] = np.max( self.A*np.outer(self.val_max[t-1],self.B[:,x[t]]) , axis = 0  ) 
            # Dependent Process
            self.arg_max[t] = np.argmax( self.val_max[t-1]*self.A.T, axis = 1)

        # BACKTRACK
        states = np.zeros(self.T, dtype=np.int32)
        states[self.T-1] = np.argmax(self.val_max[self.T-1])
        for t in range(self.T-2, -1, -1):
            states[t] = self.arg_max[t+1, states[t+1]]
        return states

    def get_val(self):
        '''Independent Process'''
        for t in range(1,self.T):
            self.val_max[t] = np.max( self.A*np.outer(self.val_max[t-1],self.B[:,self.obs[t]]) , axis = 0  ) 
        self.output.put(self.val_max)

    def get_arg(self):
        '''Dependent Process'''
        for t in range(1,self.T):
            while 1:
                # Process info if available
                if self.val_max[t-1].any() != 0:
                    self.arg_max[t] = np.argmax( self.val_max[t-1]*self.A.T, axis = 1)
                    break
                # Else sleep and wait for info to arrive
                sleep(self.sleep_time)
        self.output.put(self.arg_max)

    def get_path_parallel(self,x):
        self.obs = x
        self.T = len(x)
        self.val_max = np.zeros((self.T, self.M))
        self.arg_max = np.zeros((self.T, self.M))
        val_process = mp.Process(target=self.get_val)
        arg_process = mp.Process(target=self.get_arg)  
        # get first initial value for val_max which can feed arg_process
        self.val_max[0] = self.pi*self.B[:,x[0]]
        arg_process.start()
        val_process.start()
        arg_process.join()
        val_process.join()

Note: get_path_parallel does not have backtracking yet.

It would seem that val_process and arg_process never really run, and I am not sure why. You can run the code on the Wikipedia example for the Viterbi algorithm.

obs = np.array([0,1,2])  # normal then cold and finally dizzy  

pi = np.array([0.6,0.4])

A = np.array([[0.7,0.3],
             [0.4,0.6]])

B = np.array([[0.5,0.4,0.1],
             [0.1,0.3,0.6]]) 

viterbi = Viterbi(A,B,pi)
path = viterbi.get_path(obs)

I also tried using Ray. However, I had no clue what I was really doing there. Can you please recommend what I should do to get the parallel version to run? I must be doing something very wrong, but I do not know what.

Your help would be much appreciated.

Answer 1

I have managed to get my code working thanks to @SıddıkAçıl. The producer-consumer pattern is what does the trick. I also realised that the processes can run successfully, but if one does not store the final results in a "result queue" of sorts, they vanish. By this I mean that I filled in values in my numpy arrays val_max and arg_max by letting the processes start(), but when I accessed them afterwards they were still np.zeros arrays. I verified that the arrays were filled in correctly by printing them just as each process was about to terminate (at the last iteration, t = self.T-1). So instead of printing them, I put them on a multiprocessing Queue object on the final iteration to capture the entire filled-in array.
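The "vanishing" results can be reproduced in a few lines. The following is a minimal sketch, not the Viterbi code itself: fill_in_child and run_demo are hypothetical names, and a plain list stands in for the numpy arrays. The child process fills in its own copy of the list, so the parent's copy is unchanged unless the result is sent back through a Queue.

```python
import multiprocessing as mp


def fill_in_child(table, results):
    """Runs in the child: fills in the child's own copy of table."""
    for i in range(len(table)):
        table[i] = i * i
    results.put(table)  # without this line the filled-in copy is lost


def run_demo():
    table = [0, 0, 0, 0]
    results = mp.Queue()
    child = mp.Process(target=fill_in_child, args=(table, results))
    child.start()
    # Fetch before join so the queue is drained; the timeout keeps a
    # failed child from hanging the parent forever.
    returned = results.get(timeout=30)
    child.join()
    return table, returned


if __name__ == "__main__":
    parent_copy, child_copy = run_demo()
    print("parent copy:", parent_copy)  # still all zeros
    print("child copy: ", child_copy)   # the values filled in by the child
```

Calling get() before join() mirrors what get_path_parallel does below: the parent blocks on the result queue rather than on the process.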

I provide my updated working code below. NOTE: it works, but takes twice as long to complete as the serial version. My thoughts on why this might be so are as follows:

I can get it to run as two processes but don't actually know how to do it properly. Experienced programmers might know how to fix it with the chunksize parameter.
The two processes I am separating are numpy matrix operations. These operations already execute so fast that the overhead of concurrency (multiprocessing) is not worth the theoretical improvement. Had the two processes been the two original for loops (as used in Wikipedia and most implementations), multiprocessing might have given gains (perhaps I should investigate this). Furthermore, because we have a producer-consumer pattern and not two independent processes (a producer-producer pattern), the best we can expect is for the run time to match the longer of the two processes (in this case the producer takes twice as long as the consumer). We cannot expect the run time to halve as in the producer-producer scenario (which is what happened with my parallel forward-backward HMM filtering algorithm).
My computer has 4 cores, and numpy already does built-in CPU multiprocessing optimization on its operations. By attempting to use cores to make the code faster, I am depriving numpy of cores that it could use more effectively. To figure this out, I am going to time the numpy operations and see whether they are slower in my concurrent version than in my serial version.
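One way to probe the overhead explanation is to time a single update step against a single round trip through a multiprocessing Queue. The sketch below is an assumption-laden stand-in, not a benchmark of the code in this answer: viterbi_step and time_step_vs_queue are hypothetical names, plain lists replace the numpy arrays, and the queue is exercised inside one process, so it measures only pickling and pipe traffic, not process scheduling.

```python
import multiprocessing as mp
from time import perf_counter


def viterbi_step(prev, A, b_col):
    """One val_max update on plain lists (stand-in for the numpy line)."""
    n = len(prev)
    return [max(prev[i] * A[i][j] for i in range(n)) * b_col[j]
            for j in range(n)]


def time_step_vs_queue(repeats=2000):
    """Return (seconds per update step, seconds per queue round trip)."""
    A = [[0.7, 0.3], [0.4, 0.6]]
    b_col = [0.4, 0.3]
    prev = [0.3, 0.04]

    start = perf_counter()
    for _ in range(repeats):
        viterbi_step(prev, A, b_col)
    compute_seconds = perf_counter() - start

    q = mp.Queue()
    start = perf_counter()
    for _ in range(repeats):
        q.put(prev)  # pickled and written to a pipe by a feeder thread
        q.get()      # read back from the pipe and unpickled
    queue_seconds = perf_counter() - start
    q.close()
    q.join_thread()

    return compute_seconds / repeats, queue_seconds / repeats


if __name__ == "__main__":
    step, round_trip = time_step_vs_queue()
    print("per update step:       ", step)
    print("per queue round trip:  ", round_trip)
```

If the round trip turns out to cost more than the step on a given machine, sending one message per time step cannot pay off there, and batching many time steps per message would be the next thing to try.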

I will update if I learn anything new. If you know the real reason why my concurrent code is so much slower, please do let me know. Here is the code:


import numpy as np
from time import time
import multiprocessing as mp

class Viterbi:


    def __init__(self,A,B,pi):
        self.M = A.shape[0] # number of hidden states
        self.A = A  # Transition Matrix
        self.B = B   # Observation Matrix
        self.pi = pi   # Initial distribution
        self.T = None   # time horizon
        self.val_max = None
        self.arg_max = None
        self.obs = None
        self.intermediate = mp.Queue()
        self.result = mp.Queue()



    def get_path(self,x):
        '''Sequential/Serial Viterbi Algorithm with backtracking'''
        self.T = len(x)
        self.val_max = np.zeros((self.T, self.M))
        self.arg_max = np.zeros((self.T, self.M))
        self.val_max[0] = self.pi*self.B[:,x[0]]
        for t in range(1, self.T):
            # Independent Process (uses the argument x, not the global obs)
            self.val_max[t] = np.max( self.A*np.outer(self.val_max[t-1],self.B[:,x[t]]) , axis = 0  ) 
            # Dependent Process
            self.arg_max[t] = np.argmax( self.val_max[t-1]*self.A.T, axis = 1)

        # BACKTRACK
        states = np.zeros(self.T, dtype=np.int32)
        states[self.T-1] = np.argmax(self.val_max[self.T-1])
        for t in range(self.T-2, -1, -1):
            states[t] = self.arg_max[t+1, states[t+1]]
        return states

    def get_val(self,initial_val_max):
        '''Independent Producer Process'''
        val_max = initial_val_max
        for t in range(1,self.T):
            val_max = np.max( self.A*np.outer(val_max,self.B[:,self.obs[t]]) , axis = 0  )
            #print('Transfer: ',self.val_max[t])
            self.intermediate.put(val_max)
            if t == self.T-1:
                self.result.put(val_max)   # we only need the last val_max value for backtracking




    def get_arg(self):
        '''Dependent Consumer Process.'''
        t = 1
        while t < self.T:
            val_max =self.intermediate.get()
            #print('Receive: ',val_max)
            self.arg_max[t] = np.argmax( val_max*self.A.T, axis = 1)
            if t == self.T-1:
                self.result.put(self.arg_max)
            #print('Processed: ',self.arg_max[t])
            t += 1

    def get_path_parallel(self,x):
        '''Multiprocessing producer-consumer implementation of Viterbi algorithm.'''
        self.obs = x
        self.T = len(x)
        self.arg_max = np.zeros((self.T, self.M))  # we don't tabulate val_max anymore
        initial_val_max = self.pi*self.B[:,x[0]]
        producer_process = mp.Process(target=self.get_val,args=(initial_val_max,),daemon=True)
        consumer_process = mp.Process(target=self.get_arg,daemon=True) 
        self.intermediate.put(initial_val_max)  # initial production put into pipeline for consumption
        consumer_process.start()  # we can already consume initial_val_max
        producer_process.start()
        #val_process.join()
        #arg_process.join()
        #self.output.join()
        return self.backtrack(self.result.get(),self.result.get()) # backtrack takes last row of val_max and entire arg_max

    def backtrack(self,val_max_last_row,arg_max):
        '''Backtracking the Dynamic Programming solution (actually a Trellis diagram)
           produced by Multiprocessing Viterbi algorithm.'''
        states = np.zeros(self.T, dtype=np.int32)
        states[self.T-1] = np.argmax(val_max_last_row)
        for t in range(self.T-2, -1, -1):
            states[t] = arg_max[t+1, states[t+1]]
        return states



if __name__ == '__main__':

    obs = np.array([0,1,2])  # normal then cold and finally dizzy  

    T = 100000
    obs = np.random.binomial(2,0.3,T)        

    pi = np.array([0.6,0.4])

    A = np.array([[0.7,0.3],
                 [0.4,0.6]])

    B = np.array([[0.5,0.4,0.1],
                 [0.1,0.3,0.6]]) 

    t1 = time()
    viterbi = Viterbi(A,B,pi)
    path = viterbi.get_path(obs)
    t2 = time()
    print('Iterative Viterbi')
    print('Path: ',path)
    print('Run-time: ',round(t2-t1,6)) 
    t1 = time()
    viterbi = Viterbi(A,B,pi)
    path = viterbi.get_path_parallel(obs)
    t2 = time()
    print('\nParallel Viterbi')
    print('Path: ',path)
    print('Run-time: ',round(t2-t1,6))

Answer 2

Welcome to SO. Consider taking a look at the producer-consumer pattern, which is heavily used in multiprocessing.

Beware that on Windows, multiprocessing in Python reinstantiates your code for every process you create. So your Viterbi objects, and therefore their Queue fields, are not the same.
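How a child process gets its copy of your objects depends on the start method, which differs by platform and Python version (Windows uses "spawn"). A small sketch for checking what your own interpreter uses; describe_start_method is a hypothetical helper name, while get_start_method and get_all_start_methods are part of the multiprocessing module.

```python
import multiprocessing as mp
import os


def describe_start_method():
    """Report the current process id and the start methods on this platform."""
    return {
        "pid": os.getpid(),
        "default": mp.get_start_method(),
        "available": mp.get_all_start_methods(),
    }


if __name__ == "__main__":
    print(describe_start_method())
```

Whatever the start method, each child works on its own copy of the Viterbi object rather than on the parent's.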

Observe this behaviour through:

import os

def get_arg(self):
    '''Dependent Process'''
    print("Dependent ", self)
    print("Dependent ", self.output)
    print("Dependent ", os.getpid())

def get_val(self):
    '''Independent Process'''
    print("Independent ", self)
    print("Independent ", self.output)
    print("Independent ", os.getpid())

if __name__ == "__main__":
    print("Hello from main process", os.getpid())
    obs = np.array([0,1,2])  # normal then cold and finally dizzy  

    pi = np.array([0.6,0.4])

    A = np.array([[0.7,0.3],
             [0.4,0.6]])

    B = np.array([[0.5,0.4,0.1],
             [0.1,0.3,0.6]]) 

    viterbi = Viterbi(A,B,pi)
    print("Main viterbi object", viterbi)
    print("Main viterbi object queue", viterbi.output)
    path = viterbi.get_path_parallel(obs)

There are three different Viterbi objects, as there are three different processes. So, what you need in terms of parallelism is not processes. You should explore the threading library that Python offers.
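As a rough illustration of the threading suggestion, here is a minimal producer-consumer sketch using threading and queue.Queue. It is not the asker's class: viterbi_threaded is a hypothetical function, plain lists replace the numpy arrays, and a non-empty observation sequence is assumed. Because threads share memory, the arg_max table written by the consumer is visible to the main thread without a result queue. In standard CPython builds the GIL still serialises pure-Python work, so this shows the structure rather than promising a speed-up.

```python
import queue
import threading


def viterbi_threaded(obs, A, B, pi):
    """Most likely state path; val_max is produced and arg_max consumed in threads."""
    n, T = len(pi), len(obs)
    arg_max = [[0] * n for _ in range(T)]  # shared with the consumer thread
    last_val = [None]                      # set by the producer thread
    pipe = queue.Queue()

    def produce():
        val = [pi[j] * B[j][obs[0]] for j in range(n)]
        for t in range(1, T):
            pipe.put((t, val))  # arg_max[t] needs val_max[t-1]
            val = [max(val[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                   for j in range(n)]
        pipe.put(None)          # sentinel: nothing more to consume
        last_val[0] = val

    def consume():
        while True:
            item = pipe.get()
            if item is None:
                break
            t, prev = item
            for j in range(n):
                arg_max[t][j] = max(range(n), key=lambda i: prev[i] * A[i][j])

    workers = [threading.Thread(target=produce),
               threading.Thread(target=consume)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()

    # Backtrack from the most likely final state.
    states = [0] * T
    states[-1] = max(range(n), key=lambda j: last_val[0][j])
    for t in range(T - 2, -1, -1):
        states[t] = arg_max[t + 1][states[t + 1]]
    return states


if __name__ == "__main__":
    A = [[0.7, 0.3], [0.4, 0.6]]
    B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
    pi = [0.6, 0.4]
    print(viterbi_threaded([0, 1, 2], A, B, pi))
```

The None sentinel tells the consumer when to stop, which avoids the polling loop with sleep() from the question.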


Answer 1: score 1, posted 2019-06-24 14:06:30

Answer 2: score 0, accepted, posted 2019-06-23 15:57:01
