I'm trying to understand the MinHash LSH implementation in Spark, org.apache.spark.ml.feature.MinHashLSH. These two files seem the most relevant: Min ...
I'm trying to understand the MinHash LSH implementation in Spark, org.apache.spark.ml.feature.MinHashLSH. These two files seem the most relevant: Min ...
I understand the algorithm behind creating signature matrix from shingles by applying hash functions. However I don't understand how to hash a specifi ...
For example assume that we have some vectors with differnt length and what we want to do is measuring the similarity between each two pair of these ve ...
I have a data set of size (160000,3200), in which all the elements are either zero or one. I want to find similar candidates. I have hashed it to (160 ...
I am looking for a performant algorithm to match a large number of people by location, gender and age according to this data structure: Longitude ...
I am trying to implement LSH spark to find nearest neighbours for each user on very large datasets containing 50000 rows and ~5000 features for each r ...
I have read many tutorials and tried a number of minhash LSH, but it cannot generate the similarity matrix, instead it returns just similar data which ...
Let's say I've a million documents that I preprocessed (calculated signatures for using minhash) in O(D*sqrt(D)) time where D is the number of documen ...
Let's say I have to find estimate the jaccard similarity between documents A and B, and I use k random permutations of the union of these sets/documen ...
I'm trying to use .approxSimilarityJoin of Spark MLlib LSH: MinHash for Jaccard Distance e.g. I understand that the higher the numHashTables, the m ...
I've used LSH after ALS algorithm using pyspark and all seems works fine till I accidentally saw that I had some lost rows during the exploring. All w ...
I am trying to use KNN algorithm from spark 2.2.0. I am wondering how I should set my bucket length. The record count/number of features varies, so I ...
Is there a hash function for strings, such that strings within a small edit distance (for example, misspellings) would map to the same, or very close, ...
I'm trying yo implement vlsh with the California ND Datastet, wich is composed by 701 photos. 10 subject wrote down in a txt file which photos are nea ...
There are different pictures of the same object. The pictures made from different angles, so while the object on the picture is the same, the pictures ...
Applying an LSH algorithm in Spark 1.4 (https://github.com/soundcloud/cosine-lsh-join-spark/tree/master/src/main/scala/com/soundcloud/lsh), I process ...
I read this question about finding the closest neighbor for 3-dimensions points. Octree is a solution for this case. kd-Tree is a solution for small ...
How can use fuzzy matching in pandas to detect duplicate rows (efficiently) How to find duplicates of one column vs. all the other ones without a ...
I'm reading this survey about LSH, in particular citing the last paragraph of section 2.2.1: To improve the recall, L hash tables are constructed, ...
PREMISE: I'm really new to Computer Vision/Image Processing and Machine Learning (luckily, I'm more expert on Information retrieval), so please be ki ...