Java best practices for vectorized computations
I'm researching methods for computing expensive vector operations in Java, e.g. dot products or multiplications between large matrices. There are a few good threads here on this topic, like this and this.
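For reference, the simplest pure-Java dot product is a single loop like the sketch below. The class name `PlainDot` is illustrative only. Java requires the floating-point additions in this reduction to happen in source order, so the JIT is not allowed to reassociate them, and that limits how it can vectorize the loop. Whether HotSpot's C2 compiler emits SIMD instructions for such a loop depends on the JVM version and the CPU. One way to inspect the generated machine code is to run with `-XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly`, which requires the hsdis disassembler plugin to be installed.

```java
/** Plain pure-Java dot product; the class name is illustrative only. */
final class PlainDot {

    private PlainDot() {}

    /** Returns the sum of x[i] * y[i]; both arrays must have the same length. */
    static double dot(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("length mismatch: " + x.length + " vs " + y.length);
        }
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            // Strict left-to-right accumulation, as Java floating-point semantics require.
            sum += x[i] * y[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] a = {1.0, 2.0, 3.0};
        double[] b = {4.0, 5.0, 6.0};
        System.out.println(dot(a, b)); // prints 32.0
    }
}
```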
It appears that there is no reliable way to make the JIT compiler emit CPU vector instructions (SSE2, AVX, MMX, ...). Moreover, high-performance linear algebra libraries (ND4J, jblas, ...) do in fact make JNI calls to BLAS/LAPACK libraries for their core routines. As I understand it, BLAS/LAPACK packages are the de facto standard choice for native linear algebra computations.
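To make the BLAS side concrete: the reference level-1 routine `DDOT(N, DX, INCX, DY, INCY)` takes a length and a stride ("increment") for each vector, and Java bindings to BLAS typically mirror that argument list. The sketch below only reproduces those semantics in pure Java to show what such a native call computes. It is not a wrapper around any native library. It assumes positive increments (the reference BLAS also accepts negative ones, which walk a vector backwards), and the class name `DdotSemantics` is hypothetical.

```java
/** Pure-Java illustration of BLAS level-1 DDOT argument semantics (hypothetical class). */
final class DdotSemantics {

    private DdotSemantics() {}

    /**
     * Computes the sum over i in [0, n) of x[i * incx] * y[i * incy].
     * Assumption: incx and incy are positive (reference BLAS also allows negative strides).
     */
    static double ddot(int n, double[] x, int incx, double[] y, int incy) {
        if (n <= 0) {
            return 0.0; // reference DDOT returns zero for n <= 0
        }
        if (incx <= 0 || incy <= 0) {
            throw new IllegalArgumentException("this sketch only supports positive increments");
        }
        double sum = 0.0;
        int ix = 0;
        int iy = 0;
        for (int i = 0; i < n; i++) {
            sum += x[ix] * y[iy];
            ix += incx; // step through x with stride incx
            iy += incy; // step through y with stride incy
        }
        return sum;
    }

    public static void main(String[] args) {
        // Row-major 2x3 matrix {{1, 2, 3}, {4, 5, 6}} stored in a flat array.
        double[] m = {1, 2, 3, 4, 5, 6};
        double[] ones = {1, 1};
        // Column 0 of m is m[0], m[3] (stride 3): 1*1 + 4*1 = 5
        System.out.println(ddot(2, m, 3, ones, 1)); // prints 5.0
    }
}
```

Strides let the same routine walk either a row or a column of a matrix stored in a flat array without copying it.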
On the other hand, other libraries (JAMA, ...) implement the algorithms in pure Java, without native calls.
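As an illustration of the pure-Java approach, here is a minimal dense matrix multiplication over `double[][]`. It is not JAMA's actual code, and the class name is illustrative. It uses the i-k-j loop order so the innermost loop walks a row of `B` and a row of `C` contiguously. With Java's array-of-rows layout, that order is generally more cache-friendly than the textbook i-j-k order.

```java
import java.util.Arrays;

/** Minimal pure-Java dense matrix multiplication (illustrative, not taken from any library). */
final class PureJavaMatMul {

    private PureJavaMatMul() {}

    /** Returns C = A * B for an m x k matrix A and a k x n matrix B (B assumed rectangular). */
    static double[][] multiply(double[][] a, double[][] b) {
        int m = a.length;
        int k = b.length;
        int n = (k == 0) ? 0 : b[0].length;
        double[][] c = new double[m][n];
        for (int i = 0; i < m; i++) {
            if (a[i].length != k) {
                throw new IllegalArgumentException(
                        "row " + i + " of A has length " + a[i].length + ", expected " + k);
            }
            double[] cRow = c[i];
            for (int p = 0; p < k; p++) {
                double aip = a[i][p];
                double[] bRow = b[p];
                // Innermost loop walks row p of B and row i of C contiguously.
                for (int j = 0; j < n; j++) {
                    cRow[j] += aip * bRow[j];
                }
            }
        }
        return c;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2}, {3, 4}};
        double[][] b = {{5, 6}, {7, 8}};
        System.out.println(Arrays.deepToString(multiply(a, b)));
        // prints [[19.0, 22.0], [43.0, 50.0]]
    }
}
```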
My questions are:
- Are native calls to BLAS/LAPACK actually a recommended choice?
- … native calls?
- Are there other libraries worth considering?
I hope this question can help both those who develop their own computation routines and those who just want to make an educated choice between different implementations.
Insights are appreciated!
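There is no clear best practice for every case. Whether you can or should use a pure Java solution (which does not use SIMD instructions) or native code called through JNI (which may be SIMD-optimized) depends on your particular application. In particular, it depends on the size of your arrays and on possible restrictions of the target system. Each JNI call has a fixed overhead, and the array contents may have to be copied across the boundary. The native route therefore tends to pay off only when the arrays are large enough to amortize that cost. A native library must also be available for, and allowed to load on, every platform you deploy to.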
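One way to act on this trade-off is to choose the backend at run time. Use native code only when the library could be loaded on the target system and the input is large enough to amortize the JNI overhead, and fall back to pure Java otherwise. The sketch below assumes a hypothetical native library `myblasjni` that exposes a hypothetical `nativeDot` method. The threshold is a placeholder that you would have to measure for your own workload.

```java
/** Hypothetical dispatcher between a native (JNI) kernel and a pure-Java fallback. */
final class DotDispatcher {

    /** Placeholder cut-off; measure the real break-even size on your own hardware. */
    static final int NATIVE_THRESHOLD = 100_000;

    /** True only if the hypothetical native library could be loaded on this system. */
    static final boolean NATIVE_AVAILABLE = tryLoadNative();

    private DotDispatcher() {}

    private static boolean tryLoadNative() {
        try {
            System.loadLibrary("myblasjni"); // hypothetical library name
            return true;
        } catch (UnsatisfiedLinkError | SecurityException e) {
            return false; // library missing or not allowed: stay in pure Java
        }
    }

    /** Hypothetical JNI entry point; only called when NATIVE_AVAILABLE is true. */
    private static native double nativeDot(double[] x, double[] y, int n);

    /** Pure-Java fallback kernel. */
    static double javaDot(double[] x, double[] y) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            sum += x[i] * y[i];
        }
        return sum;
    }

    static double dot(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        // Small inputs: the JNI transition cost would likely dominate, so stay in Java.
        if (NATIVE_AVAILABLE && x.length >= NATIVE_THRESHOLD) {
            return nativeDot(x, y, x.length);
        }
        return javaDot(x, y);
    }

    public static void main(String[] args) {
        double[] a = {1, 2, 3};
        double[] b = {4, 5, 6};
        System.out.println("native available: " + NATIVE_AVAILABLE);
        System.out.println(dot(a, b)); // 32.0 via the Java path (input is below the threshold)
    }
}
```

Keeping the pure-Java path as the default also means the code still runs on platforms where no native build exists.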
Relevant benchmarks have been performed (in no particular order):
These benchmarks can be as confusing as they are informative: one library may be faster for some operations and slower for others. Also keep in mind that more than one BLAS implementation may be available for your system. I currently have three installed on mine: blas, atlas and openblas. So apart from choosing a Java library that wraps a BLAS implementation, you also have to choose the underlying BLAS implementation.
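Because results vary with the operation, the array size, the JVM and the BLAS build underneath, it is worth re-measuring on your own machine with your own sizes. The following is only a rough timing sketch for a pure-Java dot product. It runs a warm-up phase so the JIT compiles the loop, then a timed phase. For numbers you intend to rely on, a harness such as JMH handles dead-code elimination, forking and warm-up far more carefully. The class name, sizes and iteration counts are arbitrary placeholders.

```java
import java.util.Random;

/** Rough timing sketch; class name, sizes and iteration counts are arbitrary placeholders. */
final class RoughDotTiming {

    private RoughDotTiming() {}

    static double dot(double[] x, double[] y) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            sum += x[i] * y[i];
        }
        return sum;
    }

    /** Deterministic pseudo-random vector with values in [0, 1). */
    static double[] randomVector(int n, long seed) {
        Random rnd = new Random(seed);
        double[] v = new double[n];
        for (int i = 0; i < n; i++) {
            v[i] = rnd.nextDouble();
        }
        return v;
    }

    /** Returns the average nanoseconds per dot() call, measured after a warm-up phase. */
    static double averageNanos(int n, int warmup, int iterations) {
        double[] x = randomVector(n, 1L);
        double[] y = randomVector(n, 2L);
        double sink = 0.0; // accumulate results so the JIT is less likely to drop the calls
        for (int i = 0; i < warmup; i++) {
            sink += dot(x, y);
        }
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += dot(x, y);
        }
        long elapsed = System.nanoTime() - start;
        if (Double.isNaN(sink)) {
            System.out.println("unexpected NaN"); // keeps 'sink' observable
        }
        return (double) elapsed / iterations;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1_000, 100_000, 1_000_000}) {
            System.out.printf("n=%d: %.0f ns per dot%n", n, averageNanos(n, 200, 200));
        }
    }
}
```

The same structure can be pointed at a BLAS-backed call instead of `dot` to compare implementations, keeping in mind that each installed BLAS may behave differently.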
This answer has a fairly up-to-date list, except that it doesn't mention nd4j, which is rather new. Keep in mind that jeigen depends on Eigen, not on BLAS.