Partial search not working on Elasticsearch+Haystack in spite of using Ngram and Edgengram for building index
I am building the indexes like this:
class BookIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.EdgeNgramField(document=True, use_template=True)
    content_auto = indexes.EdgeNgramField(model_attr='title')
    isbn_13 = indexes.CharField(model_attr='isbn_13')
    validate = indexes.IntegerField(model_attr='validate')
    price = indexes.IntegerField(model_attr='price')
    authors = indexes.EdgeNgramField()
    reviews = indexes.CharField()
    publishers = indexes.EdgeNgramField()
    institutes = indexes.EdgeNgramField()
    sellers = indexes.CharField()
    category = indexes.CharField()
    sub_category = indexes.CharField()
I even tried using Ngram, but partial search is still not working.
I am querying it like SearchQuerySet().all().filter(content=query)
I also tried SearchQuerySet().filter(content__contains=query)
Even then it does not show results for partial matches.
Can someone please help me out?
Haystack does not work very well with Elasticsearch out of the box: you cannot set proper index analyzers on your fields, so you have to provide a custom Elasticsearch backend to enable that:
# in a search_backends.py file
from django.conf import settings
from haystack.backends.elasticsearch_backend import (
    ElasticsearchSearchBackend,
    ElasticsearchSearchEngine
)
from haystack.fields import EdgeNgramField as BaseEdgeNgramField, NgramField as BaseNgramField
from haystack.indexes import CharField

# just an example of which degree of configuration could be possible
CUSTOM_FIELD_TYPE = {
    'completion': {
        'type': 'completion',
        'payloads': True,
        'analyzer': 'suggest_analyzer',
        'preserve_separators': True,
        'preserve_position_increments': False
    },
}

# Custom Backend
class CustomElasticBackend(ElasticsearchSearchBackend):
    DEFAULT_ANALYZER = None

    def __init__(self, connection_alias, **connection_options):
        super(CustomElasticBackend, self).__init__(
            connection_alias, **connection_options)
        user_settings = getattr(settings, 'ELASTICSEARCH_INDEX_SETTINGS', None)
        self.DEFAULT_ANALYZER = getattr(settings, 'ELASTICSEARCH_DEFAULT_ANALYZER', "snowball")
        if user_settings:
            setattr(self, 'DEFAULT_SETTINGS', user_settings)

    def build_schema(self, fields):
        content_field_name, mapping = super(CustomElasticBackend,
                                            self).build_schema(fields)
        for field_name, field_class in fields.items():
            field_mapping = mapping[field_class.index_fieldname]
            index_analyzer = getattr(field_class, 'index_analyzer', None)
            search_analyzer = getattr(field_class, 'search_analyzer', None)
            field_analyzer = getattr(field_class, 'analyzer', self.DEFAULT_ANALYZER)
            if field_mapping['type'] == 'string' and field_class.indexed:
                field_mapping["term_vector"] = "with_positions_offsets"
                if not hasattr(field_class, 'facet_for') and field_class.field_type not in ('ngram', 'edge_ngram'):
                    field_mapping['analyzer'] = field_analyzer
            if field_class.field_type in CUSTOM_FIELD_TYPE:
                field_mapping = CUSTOM_FIELD_TYPE.get(field_class.field_type).copy()
            if index_analyzer and search_analyzer:
                field_mapping['index_analyzer'] = index_analyzer
                field_mapping['search_analyzer'] = search_analyzer
                if 'analyzer' in field_mapping:
                    del field_mapping['analyzer']
            mapping.update({field_class.index_fieldname: field_mapping})
        return (content_field_name, mapping)

class CustomElasticSearchEngine(ElasticsearchSearchEngine):
    backend = CustomElasticBackend

# Custom fields, just use the ones you need or create yours
class CustomFieldMixin(object):
    def __init__(self, **kwargs):
        self.analyzer = kwargs.pop('analyzer', None)
        self.index_analyzer = kwargs.pop('index_analyzer', None)
        self.search_analyzer = kwargs.pop('search_analyzer', None)
        super(CustomFieldMixin, self).__init__(**kwargs)

class CustomCharField(CustomFieldMixin, CharField):
    pass

class CustomCompletionField(CustomFieldMixin, CharField):
    field_type = 'completion'

class CustomEdgeNgramField(CustomFieldMixin, BaseEdgeNgramField):
    pass

class CustomNgramField(CustomFieldMixin, BaseNgramField):
    pass

# settings.py
ELASTICSEARCH_INDEX_SETTINGS = {
    'settings': {
        "analysis": {
            "analyzer": {
                "custom_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "asciifolding"]
                },
                "str_index_analyzer": {
                    "type": "custom",
                    "tokenizer": "haystack_ngram_tokenizer",
                    "filter": ["stopwords", "asciifolding", "lowercase", "snowball", "elision", "worddelimiter"]
                },
                "str_search_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["stopwords", "asciifolding", "lowercase", "snowball", "elision", "worddelimiter"]
                },
                "suggest_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": [
                        "stopwords",
                        "standard",
                        "lowercase",
                        "asciifolding"
                    ]
                },
            },
            "tokenizer": {
                "haystack_ngram_tokenizer": {
                    "type": "nGram",
                    "min_gram": 2,
                    "max_gram": 20,
                },
            },
            "filter": {
                "elision": {
                    "type": "elision",
                    "articles": ["l", "m", "t", "qu", "n", "s", "j", "d"]
                },
                "stopwords": {
                    "type": "stop",
                    "stopwords": ["_french_", "_english_"],
                    "ignore_case": True
                },
                "worddelimiter": {
                    "type": "word_delimiter"
                }
            }
        }
    }
}

# Haystack settings
HAYSTACK_CONNECTIONS = {
    'default': {
        ...
        'ENGINE': 'path.to.search_backends.CustomElasticSearchEngine',
        ...
    },
}
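
To see why the haystack_ngram_tokenizer in the settings above makes partial matches possible, here is a minimal pure-Python sketch of what an n-gram tokenizer with min_gram=2 and max_gram=20 produces. It is only an illustration, not Elasticsearch's implementation: the function name ngrams and the sample word are hypothetical, and the token filters (stopwords, snowball and so on) are ignored.

```python
def ngrams(text, min_gram=2, max_gram=20):
    """Return every substring of `text` whose length is in [min_gram, max_gram]."""
    grams = []
    for start in range(len(text)):
        for size in range(min_gram, max_gram + 1):
            end = start + size
            if end > len(text):
                break  # longer grams would run past the end of the text
            grams.append(text[start:end])
    return grams


# Index time: the stored word is split into n-grams (hypothetical sample word).
indexed_terms = set(ngrams("elasticsearch"))

# Search time: the query is kept as a whole word (the search analyzer uses the
# "standard" tokenizer), so a fragment matches only if it is one of the grams.
print("search" in indexed_terms)  # fragment from the middle of the word
print("elast" in indexed_terms)   # prefix of the word
print("e" in indexed_terms)       # shorter than min_gram, so it is never indexed
```

This is also why the index analyzer and the search analyzer differ in the settings: only the indexed side is broken into grams, while the query is matched as typed.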
Using elasticsearch-2.x with django-haystack versions <2.5 causes this issue. Check if your versions match these.
From elasticsearch-2.x onwards, boost is no longer a supported metadata field, but haystack still passes it to elasticsearch. (Please refer to the answer https://stackoverflow.com/a/36847352/5108155 )
This issue was fixed in version 2.5 of haystack.
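
The version rule above can be written down as a small check. This is a sketch under assumptions: the helper name is_affected is hypothetical, the versions are passed in as tuples, and how you obtain them (for example from pip freeze) is left out.

```python
def is_affected(haystack_version, elasticsearch_version):
    """True when elasticsearch is 2.x or later and django-haystack is older than 2.5."""
    es_is_2x_or_later = tuple(elasticsearch_version[:1]) >= (2,)
    haystack_before_2_5 = tuple(haystack_version[:2]) < (2, 5)
    return es_is_2x_or_later and haystack_before_2_5


# Hypothetical version combinations, written as (major, minor, patch) tuples.
print(is_affected((2, 4, 1), (2, 3, 0)))  # old haystack with elasticsearch 2.x
print(is_affected((2, 5, 0), (2, 3, 0)))  # haystack with the fix
```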
While building (or updating) your index, elasticsearch never got the ngram analyzer you intended to apply to the field. You can validate this by manually running:
curl 'http://<elasticsearch_address>/<index_name>/?pretty'
This will show only the types on the fields and no analyzer property.
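
The same check can be scripted. The sketch below assumes the response has the shape index name, then "mappings", then document type, then "properties", as Elasticsearch 2.x returns it; the helper names, the sample index name "haystack", the document type "modelresult" and the analyzer name are all illustrative assumptions, so compare them against your own curl output.

```python
import json
from urllib.request import urlopen


def fields_without_analyzer(index_info):
    """List (doc_type, field) pairs for string fields with no analyzer setting."""
    analyzer_keys = {"analyzer", "index_analyzer", "search_analyzer"}
    missing = []
    for index_body in index_info.values():
        for doc_type, type_mapping in index_body.get("mappings", {}).items():
            for field, props in type_mapping.get("properties", {}).items():
                if props.get("type") == "string" and not analyzer_keys & set(props):
                    missing.append((doc_type, field))
    return sorted(missing)


def fetch_index_info(url):
    """GET the index description, e.g. http://<elasticsearch_address>/<index_name>/"""
    with urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical response fragment; a real one comes from fetch_index_info(url).
sample = {
    "haystack": {
        "mappings": {
            "modelresult": {
                "properties": {
                    "content_auto": {"type": "string"},
                    "text": {"type": "string", "analyzer": "edgengram_analyzer"},
                    "price": {"type": "long"},
                }
            }
        }
    }
}
print(fields_without_analyzer(sample))
```

Any ngram or edge ngram field that shows up in the returned list was indexed without the analyzer you expected.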
The interesting thing is that haystack doesn't raise this exception, because of an internal silently_fail property in the ElasticsearchSearchBackend class.
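
If you want to see the error while debugging, one option is to turn silent failure off in the connection settings. The fragment below is a sketch: the URL and index name are placeholders, and it assumes your haystack version reads the SILENTLY_FAIL connection option, so verify that against the backend source you have installed.

```python
# settings.py (debugging only)
HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine',
        'URL': 'http://127.0.0.1:9200/',  # placeholder address
        'INDEX_NAME': 'haystack',         # placeholder index name
        # Assumption: with this set to False the backend re-raises
        # Elasticsearch errors instead of only logging them.
        'SILENTLY_FAIL': False,
    },
}
```

After changing this, rebuilding the index should surface the mapping error instead of hiding it.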