
Using django-haystack + Elasticsearch, how can I search subsets of a word?

If I enter the query "apple", I want results like "xyzapplexyz", "apple", and "applexyz", and NOT results like "app" or "appl". What I actually get is "applexyz", "app", etc.

I have used an EdgeNgram field and tried querying in the following ways:

1-->> SearchQuerySet().all().autocomplete(authors=query)

2-->> SearchQuerySet().all().filter(authors=query)

3-->> SearchQuerySet().all().filter(content=query)

4-->> SearchQuerySet().all().autocomplete(content=query)

But none of them gives the required results. How can I resolve this issue?

If you want results like "xyzapplexyz", you need to use an NGram analyzer instead of EdgeNGram, or you can use both, depending on your requirements. EdgeNGram generates tokens only from the beginning of a term, so it can match "applexyz" but never "xyzapplexyz".
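To make the difference concrete, here is a minimal pure-Python sketch of the two tokenization strategies. It does not call Elasticsearch or Haystack; the helper names `edge_ngrams` and `ngrams` and the values `min_gram=3`, `max_gram=15` are assumptions chosen for illustration only.

```python
def edge_ngrams(term, min_gram, max_gram):
    """Prefixes of `term` with lengths min_gram..max_gram (edge n-gram tokens)."""
    longest = min(max_gram, len(term))
    return [term[:n] for n in range(min_gram, longest + 1)]


def ngrams(term, min_gram, max_gram):
    """All substrings of `term` with lengths min_gram..max_gram (n-gram tokens)."""
    longest = min(max_gram, len(term))
    return [
        term[i:i + n]
        for n in range(min_gram, longest + 1)
        for i in range(len(term) - n + 1)
    ]


# Illustrative gram sizes; check your own analyzer settings.
edge_tokens = edge_ngrams("xyzapplexyz", 3, 15)
all_tokens = ngrams("xyzapplexyz", 3, 15)

print("apple" in edge_tokens)  # False: every edge token starts with "xyz"
print("apple" in all_tokens)   # True: "apple" is an inner substring
```

With edge n-grams, "apple" is only produced for terms that start with it, such as "applexyz", which is why the EdgeNgram field cannot return "xyzapplexyz".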

With NGram, apple will be one of the tokens generated for the term xyzapplexyz, assuming max_gram >= 5 (and min_gram <= 5), and you will get the expected results. The search_analyzer also needs to be different from the index-time analyzer; otherwise the query itself is split into n-grams and you will get weird results, such as "app" matching a search for "apple".
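The effect of the search analyzer can be simulated in a few lines of Python. This is a simplified model, not Elasticsearch itself: it assumes a document matches when it shares at least one token with the analyzed query (OR semantics), and the names `ngrams`, `DOCS`, `INDEX`, and `search` are hypothetical.

```python
def ngrams(term, min_gram=3, max_gram=15):
    """Set of all substrings of `term` with lengths min_gram..max_gram."""
    longest = min(max_gram, len(term))
    return {
        term[i:i + n]
        for n in range(min_gram, longest + 1)
        for i in range(len(term) - n + 1)
    }


DOCS = ["xyzapplexyz", "apple", "applexyz", "app", "appl"]

# Index time: every document is broken into n-gram tokens.
INDEX = {doc: ngrams(doc) for doc in DOCS}


def search(query, analyze_query):
    """Return documents sharing at least one token with the query."""
    # analyze_query=True models using the n-gram analyzer at search time too;
    # analyze_query=False models a search analyzer that keeps the query whole.
    query_tokens = ngrams(query) if analyze_query else {query}
    return [doc for doc in DOCS if INDEX[doc] & query_tokens]


same_analyzer = search("apple", analyze_query=True)
separate_analyzer = search("apple", analyze_query=False)

print(same_analyzer)      # includes "app" and "appl"
print(separate_analyzer)  # only documents that contain "apple"
```

When the query is n-grammed as well, it yields the token "app", which also exists in the index for the documents "app" and "appl", so they match. Keeping the query as a single term at search time restricts matches to documents whose n-grams include the full word "apple".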

Also, the index size might get pretty big with NGram if you have large chunks of text.
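A rough way to see the size difference is to count how many tokens each strategy emits for a single term. The sketch below is only a counting exercise with illustrative gram sizes (`min_gram=3`, `max_gram=15` are assumptions); actual index size also depends on how Elasticsearch stores and deduplicates terms.

```python
def ngram_count(length, min_gram, max_gram):
    """Number of n-gram tokens emitted for a term of the given length."""
    longest = min(max_gram, length)
    return sum(length - n + 1 for n in range(min_gram, longest + 1))


def edge_ngram_count(length, min_gram, max_gram):
    """Number of edge n-gram tokens (prefixes) for a term of the given length."""
    return max(0, min(max_gram, length) - min_gram + 1)


term = "xyzapplexyz"  # 11 characters
print(ngram_count(len(term), 3, 15))       # 45
print(edge_ngram_count(len(term), 3, 15))  # 9
```

Edge n-grams grow linearly with term length, while full n-grams grow roughly quadratically until the term is longer than max_gram, which is why long text fields inflate an NGram index much faster.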
