Newest questions tagged [locality-sensitive-hash]

Is the number of rows always 1 in each band in the Spark implementation of MinHashLSH?

I'm trying to understand the MinHash LSH implementation in Spark, org.apache.spark.ml.feature.MinHashLSH. These two files seem the most relevant: Min ...

How to hash a signature matrix to buckets in Locality-sensitive hashing (LSH)

I understand the algorithm behind creating a signature matrix from shingles by applying hash functions. However, I don't understand how to hash a specifi ...
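For context on the bucketing step this question asks about, one common scheme is banding: split each signature into bands of a fixed number of rows and put two items in the same bucket when they agree on every row of at least one band. The sketch below is only an illustration of that scheme. The names `lsh_buckets` and `candidate_pairs` and the parameters `bands` and `rows` are hypothetical, and it uses the band index plus the band's values directly as the bucket key rather than hashing them down to a fixed number of integer buckets.

```python
from collections import defaultdict
from itertools import combinations


def lsh_buckets(signatures, bands, rows):
    """Map (band index, band values) -> list of item ids sharing that band.

    `signatures` is assumed to be a dict of item id -> list of MinHash values,
    each of length bands * rows.
    """
    buckets = defaultdict(list)
    for item_id, sig in signatures.items():
        if len(sig) != bands * rows:
            raise ValueError("signature length must equal bands * rows")
        for b in range(bands):
            # The slice of the signature that belongs to band b.
            chunk = tuple(sig[b * rows:(b + 1) * rows])
            # Including b in the key keeps different bands in separate tables.
            buckets[(b, chunk)].append(item_id)
    return buckets


def candidate_pairs(buckets):
    """Return every pair of ids that landed together in at least one bucket."""
    pairs = set()
    for ids in buckets.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs
```

With the toy signatures `{"a": [1, 2, 3, 4], "b": [1, 2, 9, 9], "c": [7, 7, 7, 7]}`, `bands=2` and `rows=2`, items `a` and `b` agree on the first band only, so they become the single candidate pair. More bands with fewer rows each admit more candidates; fewer bands with more rows each are stricter.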

Can Locality Sensitive Hashing be applied on dynamic-dimensional data points?

For example, assume that we have some vectors with different lengths and what we want to do is measure the similarity between each pair of these ve ...

Faster implementation of LSH (AND-OR)

I have a data set of size (160000,3200), in which all the elements are either zero or one. I want to find similar candidates. I have hashed it to (160 ...

Matching millions of people: k-d tree or locality-sensitive hashing?

I am looking for a performant algorithm to match a large number of people by location, gender and age according to this data structure: Longitude ...

Spark LSH gets stuck forever in the approxSimilarityJoin() function

I am trying to use Spark LSH to find nearest neighbours for each user on a very large dataset containing 50000 rows and ~5000 features for each r ...

How can I get the similarity matrix from MinHash LSH?

I have read many tutorials and tried a number of MinHash LSH implementations, but none of them can generate the similarity matrix; instead they return just the similar data which ...

How to determine the upper bound of c when estimating Jaccard similarity between documents?

Let's say I have a million documents that I preprocessed (calculated signatures for using MinHash) in O(D*sqrt(D)) time where D is the number of documen ...

Optimum number of permutations to use for estimating set similarity using MinHash

Let's say I have to estimate the Jaccard similarity between documents A and B, and I use k random permutations of the union of these sets/documen ...
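The setup described in this excerpt can be sketched as follows: shuffle the union of the two sets k times and count how often both sets have the same first element under the shuffle. The fraction of matches is an estimate of the Jaccard similarity J, and because each trial is an independent match/no-match draw, its standard error is sqrt(J(1 - J) / k). The function names `exact_jaccard` and `minhash_jaccard` and the `seed` parameter are hypothetical, and the sketch assumes both sets are non-empty with mutually comparable elements.

```python
import random


def exact_jaccard(a, b):
    """Exact Jaccard similarity |A ∩ B| / |A ∪ B| for comparison."""
    return len(a & b) / len(a | b)


def minhash_jaccard(a, b, k=200, seed=0):
    """Estimate Jaccard similarity from k random permutations of A ∪ B."""
    rng = random.Random(seed)        # fixed seed so runs are repeatable
    universe = sorted(a | b)
    matches = 0
    for _ in range(k):
        rng.shuffle(universe)        # one random permutation of the union
        rank = {x: i for i, x in enumerate(universe)}
        # The MinHash of a set is its element with the smallest rank.
        if min(rank[x] for x in a) == min(rank[x] for x in b):
            matches += 1
    return matches / k
```

For example, `minhash_jaccard({1, 2, 3, 4}, {3, 4, 5, 6}, k=2000)` should land close to the exact value of 2/6. Quadrupling k halves the standard error, which is the usual basis for choosing the number of permutations.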

What value to use for numHashTables in Spark LSH by Uber?

I'm trying to use .approxSimilarityJoin of Spark MLlib LSH: MinHash for Jaccard Distance e.g. I understand that the higher the numHashTables, the m ...

Cannot find the rows using sorting, writing after LSH

I've used LSH after the ALS algorithm using pyspark, and everything seemed to work fine until I accidentally noticed during exploration that some rows were lost. All w ...

BucketRandomProjectionLSH KNN parameters

I am trying to use the KNN algorithm from Spark 2.2.0. I am wondering how I should set my bucket length. The record count/number of features varies, so I ...

Locality-sensitive hashing of strings?

Is there a hash function for strings, such that strings within a small edit distance (for example, misspellings) would map to the same, or very close, ...

Matlab: reshape 4-d matrix to 2-d and maintain order, how to?

I'm trying to implement vlsh with the California ND dataset, which is composed of 701 photos. 10 subjects wrote down in a txt file which photos are nea ...

Deep learning model to find similar images (locality sensitive hashing)

There are different pictures of the same object. The pictures were taken from different angles, so while the object in the picture is the same, the pictures ...

Cannot get faster results via yarn when running spark in a hadoop cluster

Applying an LSH algorithm in Spark 1.4 (https://github.com/soundcloud/cosine-lsh-join-spark/tree/master/src/main/scala/com/soundcloud/lsh), I process ...

Approximate nearest neighbor (A1NN) for high-dimensional spaces

I read this question about finding the closest neighbor for 3-dimensional points. Octree is a solution for this case. kd-Tree is a solution for small ...

Pandas fuzzy detect duplicates

How can I use fuzzy matching in pandas to detect duplicate rows (efficiently)? How to find duplicates of one column vs. all the other ones without a ...

Non-empty buckets in LSH

I'm reading this survey about LSH, in particular citing the last paragraph of section 2.2.1: To improve the recall, L hash tables are constructed, ...

Bag of Features / Visual Words + Locality Sensitive Hashing

PREMISE: I'm really new to Computer Vision/Image Processing and Machine Learning (luckily, I'm more of an expert in Information Retrieval), so please be ki ...