Parallelizing with Cython

Question

Is there a way the code below can be parallelized? I looked into Cython's `prange`, but couldn't figure out how it works. Does `prange` parallelize the internal loops on different cores? How can I parallelize the code below?

```cython
cimport cython
import numpy as np

@cython.boundscheck(False)
def gs_iterate_once(double[:,:] doc_topic,
                    double[:,:] topic_word,
                    double[:] topic_distribution,
                    double[:] topic_probabilities,
                    unsigned int[:,:] doc_word_topic,
                    int num_topics):
  cdef Py_ssize_t i, j  # typed loop indices so the loops compile to plain C
  cdef unsigned int doc_id
  cdef unsigned int word_id
  cdef unsigned int topic_id
  cdef unsigned int new_topic
  for i in xrange(doc_word_topic.shape[0]):
    doc_id = doc_word_topic[i, 0]
    word_id = doc_word_topic[i, 1]
    topic_id = doc_word_topic[i, 2]

    # Remove the current topic assignment from the counts
    doc_topic[doc_id, topic_id] -= 1
    topic_word[topic_id, word_id] -= 1
    topic_distribution[topic_id] -= 1

    # Unnormalized probability of each topic for this word
    for j in xrange(num_topics):
      topic_probabilities[j] = (doc_topic[doc_id, j] * topic_word[j, word_id]) / topic_distribution[j]

    # draw_topic is the asker's own sampling helper (not shown)
    new_topic = draw_topic(np.asarray(topic_probabilities))

    # Add the new topic assignment back to the counts
    doc_topic[doc_id, new_topic] += 1
    topic_word[new_topic, word_id] += 1
    topic_distribution[new_topic] += 1
    # Set the new topic
    doc_word_topic[i, 2] = new_topic
```

Answer 1

`prange` uses OpenMP, which is indeed shared-memory parallelism. So, on a single computer, it creates threads that run on the available cores, all with access to the same pool of memory.
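A minimal sketch of the shared-memory model, using Python's standard `concurrent.futures` threads as a stand-in for OpenMP threads. This is an assumption for illustration only: Python threads hold the GIL, so the sketch shows the memory model, not a speed-up. The helper names `fill_chunk` and `parallel_squares` are hypothetical. Each thread writes only its own slice of one list, and the caller sees every write because all threads share the same memory:

```python
from concurrent.futures import ThreadPoolExecutor


def fill_chunk(shared, start, stop):
    # Hypothetical worker: writes only indices [start, stop),
    # so no two threads ever touch the same element.
    for i in range(start, stop):
        shared[i] = i * i


def parallel_squares(n, num_threads=4):
    shared = [0] * n  # a single list, visible to every thread
    step = max(1, (n + num_threads - 1) // num_threads)  # chunk size per thread
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        futures = [pool.submit(fill_chunk, shared, s, min(s + step, n))
                   for s in range(0, n, step)]
        for f in futures:
            f.result()  # re-raise any exception raised in a worker
    return shared


print(parallel_squares(10))  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```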

For the routine that you show, the first step is to understand which parts can be parallelized. Typically, a loop over `i` is parallelizable when iteration `i` operates only on element `i` of the data and not, say, on `i-1` or `i+1`. That is not the case here, so you need to find a way to make the iterations more independent of each other.

Actually finding the specific parallel pattern is beyond the scope of an SO answer, but I'll mention a few tips:

- Everything inside the `prange` must be cythonized: Python calls are not possible in a thread, because the loop body runs without the GIL. (Addition suggested by @DavidW: Python calls are possible inside a `with gil:` block.)
- A typical piece of advice is to check, once your code has been made independent of the loop ordering, whether your results are the same when running the index from n-1 down to 0 instead of from 0 to n-1.
- A few commented, illustrative examples:
  - https://homes.cs.washington.edu/~jmschr/lectures/Parallel_Processing_in_Python.html
  - "Cython prange slower for 4 threads then with range"
  - http://nealhughes.net/parallelcomp2/
  - http://www.perrygeo.com/parallelizing-numpy-array-loops-with-cython-and-mpi.html
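The ordering check from the tips above can be sketched in plain Python. `run_loop`, `order_independent` and the two loop bodies are hypothetical names chosen for illustration: `elementwise` touches only element `i`, while `running_sum` reads `out[i - 1]`, which another iteration writes. Getting the same answer in both directions is a necessary condition for safe parallelization, not a proof of it:

```python
def run_loop(body, data, reverse=False):
    """Call body(i, data, out) for every index, forward or backward; return out."""
    out = [0] * len(data)
    order = range(len(data) - 1, -1, -1) if reverse else range(len(data))
    for i in order:
        body(i, data, out)
    return out


def order_independent(body, data):
    # Same result from 0..n-1 and from n-1..0?
    return run_loop(body, data) == run_loop(body, data, reverse=True)


def elementwise(i, data, out):
    # Iteration i reads and writes only element i: safe to split across threads.
    out[i] = 2 * data[i]


def running_sum(i, data, out):
    # Iteration i reads out[i - 1], written by a different iteration: not safe.
    out[i] = data[i] + (out[i - 1] if i > 0 else 0)


data = [1, 2, 3, 4]
print(order_independent(elementwise, data))  # True
print(order_independent(running_sum, data))  # False
```

The asker's outer loop behaves like `running_sum`: each iteration reads count arrays that earlier iterations have just modified.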

Answer 2

@PierredeBuyl's answer gives a good outline of what `prange` does and how to use it.

Here are a few more specific comments about your code:

- You can't parallelize the outer loop because of `doc_topic[doc_id, topic_id] -= 1` and the similar lines for the other arrays and for `+= 1`. These modify arrays that are shared between all iterations of the loop (and therefore between all threads), so they are going to cause inconsistent results.
- A similar problem exists with `topic_probabilities[j] = ...` if you're parallelizing the outer loop, since every iteration writes into the same `topic_probabilities` buffer.
- You could easily parallelize the inner loop `for j in xrange(num_topics):`. It only modifies data that depends on the index `j`, so there's no issue with threads fighting to modify the same data. (However, there's a performance cost each time you start a multithreaded region, so you usually try to parallelize the outer loop instead to avoid this; depending on the size of the arrays, you may not gain much.)
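A sketch of splitting only the inner `j` loop, in plain Python with nested lists standing in for the memoryviews and a thread pool standing in for `prange`. These are assumptions for illustration: in Cython the loop would be written as `for j in prange(num_topics, nogil=True):` over typed memoryviews, and this pure-Python version gets no speed-up because of the GIL. The function names are hypothetical. Each worker writes a disjoint range of `probs` and only reads the count tables, which is why this loop is safe while the outer one is not:

```python
from concurrent.futures import ThreadPoolExecutor


def topic_probs_chunk(doc_topic, topic_word, topic_distribution,
                      probs, doc_id, word_id, start, stop):
    # Writes only probs[start:stop]; the count tables are read, never modified.
    for j in range(start, stop):
        probs[j] = (doc_topic[doc_id][j] * topic_word[j][word_id]) / topic_distribution[j]


def topic_probs_parallel(doc_topic, topic_word, topic_distribution,
                         doc_id, word_id, num_threads=2):
    num_topics = len(topic_distribution)
    probs = [0.0] * num_topics
    step = max(1, -(-num_topics // num_threads))  # ceiling division
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        futures = [pool.submit(topic_probs_chunk, doc_topic, topic_word,
                               topic_distribution, probs, doc_id, word_id,
                               s, min(s + step, num_topics))
                   for s in range(0, num_topics, step)]
        for f in futures:
            f.result()  # re-raise any exception raised in a worker
    return probs


# One document, three topics, one word
print(topic_probs_parallel([[1.0, 2.0, 3.0]], [[4.0], [5.0], [6.0]],
                           [2.0, 4.0, 9.0], doc_id=0, word_id=0))  # [2.0, 2.5, 2.0]
```

The outer loop over tokens stays serial here; only the per-topic arithmetic is shared out.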

Answers: 2

Answer 1 (score 3, 2017-05-05 08:23:38)

Answer 2 (score 3, 2017-05-05 09:44:18)
