Is a relational database well suited for vector calculations?
The basic table schema looks something like this (I'm using MySQL, BTW):
    integer unsigned  vector-id
    integer unsigned  fk-attribute-id
    float             attribute-value
    primary key (vector-id, fk-attribute-id)
Each vector is represented as multiple records in the table that share the same vector-id.
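As a concrete sketch of this layout, the MySQL DDL below uses the underscore names that the queries further down rely on (`attributes`, `vector_id`, `attribute_id`, `attribute_value`), since hyphenated names would need quoting. The sample rows are made up for illustration, the foreign key on the attribute id is left out because its target table is not shown, and omitting zero-valued components is an assumption rather than something the schema requires.

```sql
-- Hypothetical DDL matching the schema described above.
CREATE TABLE attributes (
    vector_id       INT UNSIGNED NOT NULL,
    attribute_id    INT UNSIGNED NOT NULL,
    attribute_value FLOAT        NOT NULL,
    PRIMARY KEY (vector_id, attribute_id)
);

-- Illustrative data: vector 1 = (3, 0, 4), vector 2 = (1, 2, 2).
-- Assumption: zero components are not stored, so vector 1 has no row
-- for attribute 2.
INSERT INTO attributes (vector_id, attribute_id, attribute_value) VALUES
    (1, 1, 3), (1, 3, 4),
    (2, 1, 1), (2, 2, 2), (2, 3, 2);
```

With these rows, the dot product of vectors 1 and 2 is 3·1 + 0·2 + 4·2 = 11; the missing row contributes nothing, which is why an inner join on the attribute id is enough for dot products.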
I need to build a separate table with the dot product (and also the Euclidean distance) of every pair of vectors that exist in this table. So, I need a result table that looks like this:
    integer unsigned  fk-vector-id-a
    integer unsigned  fk-vector-id-b
    float             dot-product
...and one like this:
    integer unsigned  fk-vector-id-a
    integer unsigned  fk-vector-id-b
    float             euclidean-distance
What is the best query structure to produce my result?
With very large vectors, is a relational database the best approach to this problem, or should I load the vectors into an application and do the calculation there?
    -- Self-join on attribute_id: each matching attribute contributes one product term.
    INSERT
    INTO    dot_products
    SELECT  v1.vector_id, v2.vector_id, SUM(v1.attribute_value * v2.attribute_value)
    FROM    attributes v1
    JOIN    attributes v2
    ON      v2.attribute_id = v1.attribute_id
    GROUP BY
            v1.vector_id, v2.vector_id;
In MySQL, this can be faster:
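    -- Enumerates every pair from the vector table and computes each dot product
    -- in a correlated subquery; the result is NULL when a pair shares no attribute.
    INSERT
    INTO    dot_products
    SELECT  v1.vector_id, v2.vector_id,
            (
            SELECT  SUM(va1.attribute_value * va2.attribute_value)
            FROM    attributes va1
            JOIN    attributes va2
            ON      va2.attribute_id = va1.attribute_id
            WHERE   va1.vector_id = v1.vector_id
                    AND va2.vector_id = v2.vector_id
            )
    FROM    vector v1
    CROSS JOIN
            vector v2;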
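Neither query covers the Euclidean distance table from the question. One possible approach, sketched here, derives it from the dot products through the identity |a - b|² = a·a + b·b - 2(a·b). That avoids a second pass over `attributes` and stays correct when a vector has no row for some attribute, whereas a plain inner join on `attribute_id` would silently skip those components. The table name `euclidean_distances` and the column names `vector_id_a`, `vector_id_b`, `dot_product` and `euclidean_distance` are assumptions, because the queries above do not name the columns of `dot_products`. The sketch also assumes `dot_products` contains the self-pairs (a, a), which both queries above produce for any vector that has at least one attribute row.

```sql
-- Sketch only: table and column names are assumed, not taken from the answer.
INSERT
INTO    euclidean_distances (vector_id_a, vector_id_b, euclidean_distance)
SELECT  ab.vector_id_a,
        ab.vector_id_b,
        -- GREATEST(..., 0) guards against slightly negative values caused by
        -- floating-point rounding; COALESCE treats a NULL dot product as 0.
        SQRT(GREATEST(aa.dot_product + bb.dot_product
                      - 2 * COALESCE(ab.dot_product, 0), 0))
FROM    dot_products ab
JOIN    dot_products aa                       -- a . a
ON      aa.vector_id_a = ab.vector_id_a
        AND aa.vector_id_b = ab.vector_id_a
JOIN    dot_products bb                       -- b . b
ON      bb.vector_id_a = ab.vector_id_b
        AND bb.vector_id_b = ab.vector_id_b;
```

If `dot_products` was filled by the first query, pairs with no attribute in common have no row there and therefore get no distance here; the `CROSS JOIN` variant keeps those pairs with a NULL dot product, which the `COALESCE` above handles.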