Efficient nearest neighbour search for sparse matrices
I have a large corpus of data (text) that I have converted to a sparse term-document matrix (I am using scipy.sparse.csr_matrix to store it). I want to find, for every document, the top n nearest-neighbour matches. I was hoping that the NearestNeighbors routine in the Python scikit-learn library (sklearn.neighbors.NearestNeighbors, to be precise) would solve my problem, but the efficient algorithms that use space-partitioning data structures such as KD trees or ball trees do not work with sparse matrices. Only the brute-force algorithm works with sparse matrices, which is infeasible in my case as I am dealing with a large corpus.
Is there any efficient implementation of nearest-neighbour search for sparse matrices (in Python or in any other language)?

Thanks.
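For reference, a minimal sketch of the brute-force setup described above (the toy documents and parameters are made up, just to show that only algorithm='brute' accepts sparse input):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors

# Toy corpus; a real corpus would have many more documents.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are pets",
    "stock markets fell sharply today",
]

# TfidfVectorizer returns a scipy.sparse CSR matrix.
X = TfidfVectorizer().fit_transform(docs)

# Only the brute-force algorithm accepts sparse input;
# cosine distance is a common choice for text.
nn = NearestNeighbors(n_neighbors=2, algorithm="brute", metric="cosine").fit(X)
distances, indices = nn.kneighbors(X)

# indices[i] lists the 2 nearest documents for document i
# (the nearest match is the document itself, at distance 0).
print(indices)
```

This runs in O(n^2) document comparisons, which is exactly what becomes infeasible on a large corpus.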
Late answer: have a look at Locality-Sensitive Hashing.

Support in scikit-learn has been proposed here and here .
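A minimal sketch of one LSH variant, random-hyperplane hashing for cosine similarity, which works directly on sparse matrices (the data, the number of hyperplanes, and the bucketing scheme are illustrative assumptions, not a library API):

```python
from collections import defaultdict

import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)

# Toy sparse data: 100 documents in a 1000-dimensional term space.
X = sp.random(100, 1000, density=0.05, format="csr", random_state=0)

# Each of 16 random hyperplanes contributes one bit of the hash:
# the sign of the document's projection onto that hyperplane.
n_planes = 16
planes = rng.standard_normal((X.shape[1], n_planes))
bits = (X @ planes) > 0                     # (100, 16) boolean array
keys = bits.dot(1 << np.arange(n_planes))   # pack the bits into integer keys

# Bucket documents by hash key: documents with similar direction tend to
# land in the same bucket, so a query only compares against its bucket
# instead of the full corpus.
buckets = defaultdict(list)
for i, k in enumerate(keys):
    buckets[int(k)].append(i)

# Candidate neighbours for document 0 are the other members of its bucket;
# rank them exactly (e.g. by cosine distance) in a second pass.
candidates = buckets[int(keys[0])]
```

In practice you would use several independent hash tables (and fewer bits per table) to trade precision against recall; a single table, as here, can miss true neighbours that fall on the other side of one hyperplane.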
You can try using TruncatedSVD to convert your high-dimensional sparse data into low-dimensional dense data, and then run a ball tree on the result.
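A sketch of that pipeline with scikit-learn (the matrix dimensions and the choice of 50 components are made-up example values):

```python
import scipy.sparse as sp
from sklearn.decomposition import TruncatedSVD
from sklearn.neighbors import NearestNeighbors

# Toy sparse term-document matrix standing in for the real corpus.
X = sp.random(200, 5000, density=0.01, format="csr", random_state=0)

# TruncatedSVD accepts sparse input and returns a dense low-dimensional
# embedding (this is latent semantic analysis when X holds tf-idf counts).
X_dense = TruncatedSVD(n_components=50, random_state=0).fit_transform(X)

# The dense embedding can now use the fast ball-tree algorithm.
nn = NearestNeighbors(n_neighbors=5, algorithm="ball_tree").fit(X_dense)
distances, indices = nn.kneighbors(X_dense)
print(indices.shape)  # (200, 5)
```

The trade-off is that neighbours are found in the reduced space, so the results are approximate with respect to distances in the original term space.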