Locality-sensitive hashing - Elasticsearch
Is there any plugin that enables LSH on Elasticsearch? If so, could you point me to it and tell me a little about how to use it? Thanks
Edit: I found out that ES has a MinHash plugin. How can I compare documents to one another with it? What would be a good setting for finding duplicates?
There is an Elasticsearch MinHash Plugin. You can use it to extract a minhash value every time you index a document, and then query documents by that minhash value later.
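To make the idea of comparing documents by minhash more concrete, here is a minimal Python sketch of the general MinHash technique. It is not the plugin's implementation: the plugin's own hashing and its encoded output format are not reproduced here, and the function names (`tokens`, `minhash_signature`, `estimated_similarity`) and the choice of 128 hash functions are assumptions made purely for illustration.

```python
import hashlib


def tokens(text):
    """Split a text into a set of lowercase word tokens (illustrative tokenizer)."""
    return set(text.lower().split())


def minhash_signature(items, num_hashes=128):
    """For each seeded hash function, keep the smallest hash value over all items."""
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(4, "big")  # a different salt gives a different hash function
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(item.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "big",
            )
            for item in items
        ))
    return signature


def estimated_similarity(sig_a, sig_b):
    """The fraction of matching positions estimates the Jaccard similarity of the sets."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


if __name__ == "__main__":
    doc_a = "the quick brown fox jumps over the lazy dog near the river bank"
    doc_b = "the quick brown fox jumps over the lazy dog near the river side"
    doc_c = "elasticsearch stores json documents in inverted indices"

    sig_a = minhash_signature(tokens(doc_a))
    sig_b = minhash_signature(tokens(doc_b))
    sig_c = minhash_signature(tokens(doc_c))

    print("near-duplicate:", estimated_similarity(sig_a, sig_b))
    print("unrelated:     ", estimated_similarity(sig_a, sig_c))
```

Near-duplicate documents share most of their tokens, so most signature positions agree, while unrelated documents agree almost nowhere. Where to put the cutoff for calling two documents duplicates depends on your data and is best tuned on a labelled sample.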
Install the MinHash plugin:
$ $ES_HOME/bin/plugin install org.codelibs/elasticsearch-minhash/2.3.1
Add a minhash analyzer when creating your index:
$ curl -XPUT 'localhost:9200/my_index' -d '{ "index":{ "analysis":{ "analyzer":{ "minhash_analyzer":{ "type":"custom", "tokenizer":"standard", "filter":["minhash"] } } } } }'
Put a minhash_value field into the index mapping:
$ curl -XPUT "localhost:9200/my_index/my_type/_mapping" -d '{ "my_type":{ "properties":{ "message":{ "type":"string", "copy_to":"minhash_value" }, "minhash_value":{ "type":"minhash", "minhash_analyzer":"minhash_analyzer" } } } }'
a. The More Like This query can be used to do a "like" search on the minhash_value field:
GET /_search { "query": { "more_like_this" : { "fields" : ["minhash_value"], "like" : "KV5rsUfZpcZdVojpG8mHLA==", "min_term_freq" : 1, "max_query_terms" : 12 } } }
b. You can also use a fuzzy query, but it only matches when the query differs from the result by an edit distance of at most 2.
GET /_search { "query": { "fuzzy" : { "minhash_value" : "KV5rsUfZpcZdVojpG8mHLA==" } } }
You can find more about the fuzzy query in the Elasticsearch documentation.
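To see what the limit of 2 edits means for option b, the following Python sketch computes a plain Levenshtein distance between a stored minhash string and a query string. It is only an illustration under stated assumptions: `levenshtein` and `fuzzy_would_match` are hypothetical helpers, not Elasticsearch APIs, and Elasticsearch's fuzzy matching can also count a swap of two adjacent characters as a single edit, which this sketch does not model.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete a character from a
                curr[j - 1] + 1,     # insert a character into a
                prev[j - 1] + cost,  # substitute, or keep if equal
            ))
        prev = curr
    return prev[-1]


def fuzzy_would_match(stored, query, max_edits=2):
    """Hypothetical helper: True if the strings are within max_edits of each other."""
    return levenshtein(stored, query) <= max_edits


if __name__ == "__main__":
    stored = "KV5rsUfZpcZdVojpG8mHLA=="
    two_changes = "XY5rsUfZpcZdVojpG8mHLA=="    # first two characters replaced
    three_changes = "XYZrsUfZpcZdVojpG8mHLA=="  # first three characters replaced

    print(fuzzy_would_match(stored, two_changes))
    print(fuzzy_would_match(stored, three_changes))
```

Because only minhash strings that are almost character-for-character identical fall within 2 edits, the fuzzy query suits documents that are very close duplicates, while the More Like This query in option a is the looser of the two.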