Finding the closest vectors in a big dataset (C#)

I have a big dataset of vectors (several million rows, i.e. a List<double[]>) and I need to find the 1000 vectors closest to a given vector.

The obvious solution is to calculate the distances for all of them and then sort the array, but I'm not sure that is the right approach considering the size of the resulting array.
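For reference, the sort-everything baseline could look like the sketch below. The question does not name the metric, so squared Euclidean distance is an assumption here, and `SortBaseline` and `FindClosestBySorting` are hypothetical names chosen for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class SortBaseline
{
    // Returns the indices of the k vectors closest to query, nearest first.
    // Assumption: squared Euclidean distance (it orders vectors the same way
    // as the true Euclidean distance, so the square root is skipped).
    public static int[] FindClosestBySorting(IReadOnlyList<double[]> data, double[] query, int k)
    {
        var distances = new double[data.Count];
        var indices = new int[data.Count];

        for (int i = 0; i < data.Count; i++)
        {
            double sum = 0.0;
            for (int j = 0; j < query.Length; j++)
            {
                double d = data[i][j] - query[j];
                sum += d * d;
            }
            distances[i] = sum;
            indices[i] = i;
        }

        // Sorts the distances ascending and reorders the indices alongside them.
        Array.Sort(distances, indices);

        int count = Math.Min(Math.Max(k, 0), indices.Length);
        var result = new int[count];
        Array.Copy(indices, result, count);
        return result;
    }
}
```

This keeps one double and one int per row (about 12 bytes), so memory is rarely the problem at a few million rows. The cost is the full O(n log n) sort when only the first k entries are needed.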

Maybe I should discard the farthest vectors as I calculate the distances, so that I always hold only a small set of the closest vectors instead of ending up with a huge array.

On the other hand, it looks like I can still handle arrays of this size without an out-of-memory error in a 64-bit process.

Which is the less costly way of solving this problem?

If the right way is to keep a small set while calculating, how should that be done?
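One common way to keep only a small set during the pass is a bounded max-heap of size k: the farthest candidate kept so far sits at the root and is replaced whenever a closer vector turns up. The sketch below is one possible version, not a definitive implementation. It assumes .NET 6 or later (for `PriorityQueue<TElement, TPriority>`) and squared Euclidean distance; `NearestVectors` and `FindClosest` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class NearestVectors
{
    // Assumption: squared Euclidean distance; it preserves the ordering of
    // the true Euclidean distance, so the square root is not needed.
    public static double SquaredDistance(double[] a, double[] b)
    {
        double sum = 0.0;
        for (int i = 0; i < a.Length; i++)
        {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Returns the indices of the k vectors closest to query, nearest first.
    public static List<int> FindClosest(IReadOnlyList<double[]> data, double[] query, int k)
    {
        var result = new List<int>();
        if (k <= 0) return result;

        // PriorityQueue is a min-heap; the reversed comparer makes it behave
        // as a max-heap, so the farthest kept candidate is at the root.
        var heap = new PriorityQueue<int, double>(
            Comparer<double>.Create((x, y) => y.CompareTo(x)));

        for (int i = 0; i < data.Count; i++)
        {
            double dist = SquaredDistance(data[i], query);
            if (heap.Count < k)
            {
                heap.Enqueue(i, dist);
            }
            else if (heap.TryPeek(out _, out double worst) && dist < worst)
            {
                // Adds the new candidate and removes the current farthest one.
                heap.EnqueueDequeue(i, dist);
            }
        }

        // Dequeue yields farthest first; reverse to get nearest first.
        while (heap.Count > 0) result.Add(heap.Dequeue());
        result.Reverse();
        return result;
    }
}
```

A call such as `NearestVectors.FindClosest(vectors, target, 1000)` never holds more than k entries and runs in O(n log k) time, compared with O(n log n) for sorting every distance. For k = 1000 and a few million rows, most vectors fail the `dist < worst` check and never touch the heap.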

If you store the dataset in a database, most modern DBMSs support geocoding and searching by distance.

