
Java Vector inner products

I have an ArrayList of m-dimensional vectors (stored as simple arrays) v1, v2, ..., vn.

I have to repeatedly calculate inner products between pairs of vectors that I select from this list.

One way to do this is a simple for loop across all their components.

double sum=0;
for(int i=0; i<m; i++)
    sum+=v1[i]*v2[i];

This is pretty much the only linear algebra operation I intend to perform on my data.

  1. Would it be more efficient to import a linear algebra library like JAMA or la4j, store everything as matrices, and calculate the inner products? (These aren't even large matrix multiplications, just inner products between 1D vectors.)

  2. How does la4j (etc.) implement a dot product? Would it not also iterate through each index and multiply each pair of components?

la4j is open source; have a look at the code. Using the AbstractVector's inner product is less efficient than your own code, since it instantiates additional operation objects, here the OoPlaceInnerProduct.

However: in most cases I would still prefer using an existing, well-tested vector math package over implementing my own. (Knuth: "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.")

Note that multithreading doesn't help either. Even the highly efficient LAPACK package uses essentially the same loop as yours.
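For reference, the loop from the question can be wrapped in a small reusable method. This is only a minimal sketch: the class name DotProduct and the length check are my own additions for illustration and do not come from la4j or LAPACK.

```java
final class DotProduct {
    private DotProduct() {}

    /** Returns the inner product of two equal-length vectors stored as plain arrays. */
    static double dot(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException(
                "Vector lengths differ: " + a.length + " vs " + b.length);
        }
        double sum = 0.0;
        // Same single pass as the loop in the question: one multiply-add per component.
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] v1 = {1.0, 2.0, 3.0};
        double[] v2 = {4.0, 5.0, 6.0};
        System.out.println(dot(v1, v2)); // 1*4 + 2*5 + 3*6 = 32.0
    }
}
```

With plain arrays there is no per-call object allocation, which is the overhead the library version adds.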

Regarding the la4j library: the upcoming version 0.5.0 uses lightweight sparse iterators, so you should not worry about iterating through zero values in sparse vectors. Here is an API example:

Vector a = new BasicVector(...); // dense
Vector b = new CompressedVector(...); // sparse
double dot = a.innerProduct(b);

You can combine any possible pair: sparse-dense, sparse-sparse, dense-dense, dense-sparse. The la4j library will always use the most efficient algorithm for your data.
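To see why skipping zero values pays off, here is a rough sketch of a sparse-dense inner product written without any library. The representation (parallel arrays of indices and non-zero values) and the names SparseDot and sparseDenseDot are assumptions made for illustration; this is not how la4j stores or iterates its CompressedVector.

```java
final class SparseDot {
    private SparseDot() {}

    /**
     * Inner product of a sparse vector and a dense vector.
     * The sparse vector is given as parallel arrays (hypothetical layout):
     * indices[k] is the position of the k-th non-zero entry, values[k] is its value.
     */
    static double sparseDenseDot(int[] indices, double[] values, double[] dense) {
        if (indices.length != values.length) {
            throw new IllegalArgumentException("indices and values must have the same length");
        }
        double sum = 0.0;
        // Only the non-zero entries are visited, so the cost depends on their
        // count rather than on the full dimension of the vectors.
        for (int k = 0; k < indices.length; k++) {
            sum += values[k] * dense[indices[k]];
        }
        return sum;
    }

    public static void main(String[] args) {
        // Sparse vector (0, 0, 3, 0, 5) stored as its two non-zero entries.
        int[] indices = {2, 4};
        double[] values = {3.0, 5.0};
        double[] dense = {1.0, 2.0, 3.0, 4.0, 5.0};
        System.out.println(sparseDenseDot(indices, values, dense)); // 3*3 + 5*5 = 34.0
    }
}
```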

If you want to compute all inner products, I recommend this: store your vectors as the rows of a matrix A and as the columns of a matrix B. Then the entry at position (i,j) of A*B is the inner product of v_i and v_j.
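A minimal sketch of that idea with plain arrays, assuming the vectors sit in a List<double[]> as in the question and all have the same length. The class and method names are hypothetical. It uses the straightforward loops rather than a fast matrix multiplication algorithm, and it exploits the symmetry of the result so that each pair is computed only once.

```java
import java.util.Arrays;
import java.util.List;

final class GramMatrix {
    private GramMatrix() {}

    /** Returns g where g[i][j] is the inner product of vectors i and j. */
    static double[][] gram(List<double[]> vectors) {
        int n = vectors.size();
        double[][] g = new double[n][n];
        for (int i = 0; i < n; i++) {
            double[] vi = vectors.get(i);
            // The matrix is symmetric, so only j >= i needs to be computed.
            for (int j = i; j < n; j++) {
                double[] vj = vectors.get(j);
                if (vi.length != vj.length) {
                    throw new IllegalArgumentException("All vectors must have the same length");
                }
                double sum = 0.0;
                for (int k = 0; k < vi.length; k++) {
                    sum += vi[k] * vj[k];
                }
                g[i][j] = sum;
                g[j][i] = sum;
            }
        }
        return g;
    }

    public static void main(String[] args) {
        List<double[]> vectors = Arrays.asList(
            new double[]{1.0, 0.0},
            new double[]{1.0, 1.0},
            new double[]{0.0, 2.0});
        // prints [[1.0, 1.0, 0.0], [1.0, 2.0, 2.0], [0.0, 2.0, 4.0]]
        System.out.println(Arrays.deepToString(gram(vectors)));
    }
}
```

Precomputing this table only makes sense if most pairs will actually be needed; for a handful of lookups, computing each inner product on demand is cheaper.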

The trick is that naive matrix multiplication takes O(n^3) time, but some clever algorithms are asymptotically faster; they only pay off for roughly 30 or more vectors, though.
