numpy OpenBLAS set maximum number of threads

I am using numpy, and my model involves intensive matrix-matrix multiplication. To speed it up, I use the multi-threaded OpenBLAS library to parallelize the numpy.dot function.

My setup is as follows:

  • OS: CentOS 6.2 server, 12 CPUs, 96 GB memory
  • Python version: Python 2.7.6
  • numpy: numpy 1.8.0
  • OpenBLAS + IntelMKL

$ OMP_NUM_THREADS=8 python test_mul.py

The code below is taken from https://gist.github.com/osdf/.

test_mul.py:

import numpy
import sys
import timeit

try:
    import numpy.core._dotblas
    print 'FAST BLAS'
except ImportError:
    print 'slow blas'

print "version:", numpy.__version__
print "maxint:", sys.maxint
print

x = numpy.random.random((1000,1000))

setup = "import numpy; x = numpy.random.random((1000,1000))"
count = 5

t = timeit.Timer("numpy.dot(x, x.T)", setup=setup)
print "dot:", t.timeit(count)/count, "sec"

When I run OMP_NUM_THREADS=1 python test_mul.py, the result is

dot: 0.200172233582 sec

OMP_NUM_THREADS=2

dot: 0.103047609329 sec

OMP_NUM_THREADS=4

dot: 0.0533880233765 sec

Things go well.

However, when I set OMP_NUM_THREADS=8, the code only works occasionally.

Sometimes it works; sometimes it does not even run and gives me core dumps.

When OMP_NUM_THREADS > 10, the code seems to break every time. I am wondering what is happening here. Is there something like a maximum number of threads that each process can use? Can I raise that limit, given that my machine has 12 CPUs?

Thanks

Firstly, I don't really understand what you mean by 'OpenBLAS + IntelMKL'. Both of those are BLAS libraries, and numpy should only link to one of them at runtime. You should probably check which of the two numpy is actually using. You can do this by calling:

$ ldd <path-to-site-packages>/numpy/core/_dotblas.so

Update: numpy/core/_dotblas.so was removed in numpy v1.10, but you can check the linkage of numpy/core/multiarray.so instead.

For example, I link against OpenBLAS:

...
libopenblas.so.0 => /opt/OpenBLAS/lib/libopenblas.so.0 (0x00007f788c934000)
...
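
The same check can be scripted. The following is a minimal sketch rather than part of any numpy or OpenBLAS tooling: `find_blas_libraries` is a hypothetical helper that scans `ldd`-style output for library names that usually indicate a BLAS implementation. The list of name hints is an assumption and is not exhaustive, and the `libc` line in the sample is made up for illustration. It is written for Python 3.

```python
import re

# Substrings that commonly appear in BLAS shared-library names.
# This list is an assumption for illustration and is not exhaustive.
BLAS_HINTS = ("openblas", "mkl", "atlas", "blas")


def find_blas_libraries(ldd_output):
    """Return (name, path) pairs for ldd lines that look like BLAS libraries."""
    found = []
    for line in ldd_output.splitlines():
        # ldd lines typically look like: "libfoo.so.1 => /path/libfoo.so.1 (0x...)"
        match = re.match(r"\s*(\S+)\s*=>\s*(\S+)", line)
        if not match:
            continue
        name, path = match.groups()
        if any(hint in name.lower() for hint in BLAS_HINTS):
            found.append((name, path))
    return found


# Sample text: the first line is the one shown above, the second is illustrative.
sample = """
    libopenblas.so.0 => /opt/OpenBLAS/lib/libopenblas.so.0 (0x00007f788c934000)
    libc.so.6 => /lib64/libc.so.6 (0x00007f788c000000)
"""
print(find_blas_libraries(sample))
```

To use it on a real installation, the output of `ldd` on the numpy extension module could be captured with `subprocess.check_output` and passed in as text. If more than one BLAS library shows up, that would be worth investigating first.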

If you are indeed linking against OpenBLAS, did you build it from source? If you did, you should see that Makefile.rule contains a commented-out option:

...
# You can define maximum number of threads. Basically it should be
# less than actual number of cores. If you don't specify one, it's
# automatically detected by the the script.
# NUM_THREADS = 24
...

By default OpenBLAS will try to set the maximum number of threads automatically, but you could try uncommenting and editing this line yourself if it is not detecting this correctly.
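
Independently of the build-time setting, it may help to avoid requesting more threads at runtime than the machine or the build allows. Below is a rough sketch for Python 3, assuming you know the NUM_THREADS value your OpenBLAS was built with and pass it in yourself as `build_max`; nothing here queries OpenBLAS. `clamp_threads` and `run_with_threads` are hypothetical helpers. `OPENBLAS_NUM_THREADS` is the OpenBLAS-specific environment variable, set here alongside `OMP_NUM_THREADS`.

```python
import os
import subprocess
import sys


def clamp_threads(requested, build_max=None):
    """Clamp a requested thread count to the core count and an optional build-time limit."""
    limit = os.cpu_count() or 1
    if build_max is not None:
        # build_max is supplied by the caller (e.g. the NUM_THREADS used at build time).
        limit = min(limit, build_max)
    return max(1, min(requested, limit))


def run_with_threads(command, requested, build_max=None):
    """Run a command with the thread-count environment variables set to the clamped value."""
    threads = clamp_threads(requested, build_max)
    env = dict(os.environ)
    env["OMP_NUM_THREADS"] = str(threads)
    env["OPENBLAS_NUM_THREADS"] = str(threads)
    return subprocess.run(command, env=env, capture_output=True, text=True, check=True)


# Demo: the child process only echoes the value it received.
child = [sys.executable, "-c", "import os; print(os.environ['OMP_NUM_THREADS'])"]
result = run_with_threads(child, requested=8, build_max=4)
print(result.stdout.strip())
```

In practice the command would be something like `[sys.executable, "test_mul.py"]`. The variables are set in the child's environment because they generally need to be in place before the BLAS library is loaded.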

Also, bear in mind that you will probably see diminishing returns in performance from using more threads. Unless your arrays are very large, using more than 6 threads is unlikely to give much of a boost because of the increased overhead involved in thread creation and management.
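
As a rough illustration, the timings reported in the question can be turned into speedup and parallel efficiency (speedup divided by thread count). This is only arithmetic on those three numbers, written for Python 3; `scaling_table` is a hypothetical helper name.

```python
# Timings reported in the question (seconds per numpy.dot call), keyed by thread count.
timings = {1: 0.200172233582, 2: 0.103047609329, 4: 0.0533880233765}


def scaling_table(timings):
    """Return (threads, speedup, efficiency) tuples relative to the single-thread time."""
    baseline = timings[1]
    rows = []
    for threads in sorted(timings):
        speedup = baseline / timings[threads]
        rows.append((threads, speedup, speedup / threads))
    return rows


for threads, speedup, efficiency in scaling_table(timings):
    print("threads=%d speedup=%.2f efficiency=%.2f" % (threads, speedup, efficiency))
```

Even on this small sample the efficiency drops slightly with each doubling (roughly 0.97 at 2 threads and 0.94 at 4), which is consistent with the expectation that extra threads pay off less and less for a 1000x1000 matrix.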
