Django, Haystack, Solr and Boosting

TLDR;

How do the various boosting types work together in Django, django-haystack and Solr?

I am having trouble getting the most obvious search results to appear first. If I search for "caring for others" and get 10 results, the object titled "caring for others" appears second in the results, after "caring for yourself".

Document Boosting

I have document-boosted Category objects by a factor of factor = 2.0 - ((the mptt tree level)/10), so 1.9 for root nodes, 1.8 for second level, 1.7 for third level, and so on (or 190%, 180%, 170%, ...).
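As a quick check of the arithmetic, here is a minimal sketch of that formula. The helper name `document_boost` is hypothetical. One assumption worth verifying: if the tree level of root nodes is 0 (which is my understanding of django-mptt's `level` field), then roots get 2.0 and the 1.9 value belongs to their direct children.

```python
def document_boost(level, base=2.0, step=0.1):
    """Boost for a node at the given tree level: base - level * step."""
    # round() only hides floating-point noise such as 1.7000000000000002.
    return round(base - level * step, 2)


# Print the boost for the first few tree levels.
for level in range(4):
    print(level, document_boost(level))
```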

Field Boosting

title is boosted by boost=1.5 (a positive factor of 150%); content is boosted by boost=.5 (a negative factor of 50%).

Term Boosting

I am currently not boosting any search terms.

My Goal

I want to get a list of results containing Categories and Articles (I'm ignoring Articles until I get my Category results straight), with Categories weighted higher than Articles, and titles weighted higher than content. Also, I'm trying to weight root category nodes higher than child nodes.

I feel like I'm missing a key concept somewhere.

Information

I'm using haystack's built-in search form and search view.

I'm using the following package/lib versions:

Django==1.4.1
django-haystack==1.2.7
pysolr==2.1.0-beta

My Index Class

from haystack.indexes import SearchIndex, CharField, EdgeNgramField


class CategoryIndex(SearchIndex):
    """Categorization -> Category"""
    # Document field, rendered from the search template.
    text = CharField(document=True, use_template=True, boost=.5)
    title = CharField(model_attr='title', boost=1.5)
    content = CharField(model_attr='content', boost=.5)
    autocomplete = EdgeNgramField(model_attr='title')

    def prepare_title(self, obj):
        return obj.title

    def prepare(self, obj):
        data = super(CategoryIndex, self).prepare(obj)
        # Document boost: 2.0 minus one tenth of the mptt tree level.
        base_boost = 2.0
        base_boost -= float(int(obj.level)) / 10
        data['boost'] = base_boost
        return data

My search template at templates/search/categorization/category_text.txt:

{{ object.title }}
{{ object.content }}

UPDATE

I noticed that when I took {{ object.content }} out of my search template, the records started appearing in the expected order. Why is this?

The Dismax parser (and, from Solr 3.1 on, ExtendedDismax) was created for exactly these needs. You can configure all the fields you want searched (the 'qf' parameter), add a custom boost to each, and specify the fields where phrase hits are especially valuable and should add to the hit's score (the 'pf' parameter). You can also specify how many tokens in a search have to match, using a flexible rule pattern (the 'mm' parameter).

For example, the config could look like this (part of a request handler entry in solrconfig.xml; I'm not familiar with how to do this through haystack, so this is plain Solr):

<str name="defType">dismax</str>
<str name="q.alt">*:*</str>
<str name="qf">text^0.5 title^1.5 content^0.5</str>
<str name="pf">text title^2 content</str>
<str name="fl">*,score</str>
<str name="mm">100%</str>
<int name="ps">100</int>
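For illustration, the same Dismax parameters can also be sent with an individual request instead of being set as handler defaults. The sketch below only builds the query string with the Python standard library; the helper name `build_dismax_params` is hypothetical, and it does not settle whether or how haystack lets you pass these parameters through.

```python
from urllib.parse import urlencode


def build_dismax_params(user_query):
    """Return Solr request parameters mirroring the handler config above."""
    return {
        "q": user_query,
        "defType": "dismax",                     # use the Dismax parser
        "qf": "text^0.5 title^1.5 content^0.5",  # searched fields with boosts
        "pf": "text title^2 content",            # fields where phrase hits add score
        "fl": "*,score",                         # return all fields plus the score
        "mm": "100%",                            # every query token must match
        "ps": "100",                             # phrase slop for pf matching
    }


params = build_dismax_params("caring for others")
query_string = urlencode(params)
print(query_string)
```

The resulting string would be appended to the select URL of your Solr core. Returning the score via 'fl' is handy while tuning, since it shows how each boost changes the ranking.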

I don't know haystack, but it seems it may provide Dismax functionality: https://github.com/toastdriven/django-haystack/pull/314

See the Dismax documentation (it links to ExtendedDismax as well): http://wiki.apache.org/solr/DisMaxQParserPlugin http://wiki.apache.org/solr/ExtendedDisMax

It seems you are just trying to be too smart here with all those boosts.

E.g. the boosts on fields are completely needless if you are using the default search view. In fact auto_query, which is run by default, searches only one field: the one marked document=True. Haystack also names this field content internally, so I would suggest renaming your content field in the search index to avoid any possible conflicts.

If that doesn't help (it probably will not), you must create a custom search form, or use a simple workaround to achieve what you want: place the field you want to boost multiple times in the template:

{{ object.title }}
{{ object.title }}
{{ object.content }}
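To see why the repetition has an effect, here is a toy sketch that counts terms in the rendered document text with the title included once and twice. The sample title and content strings and the whitespace tokenizer are made up for illustration. Solr's real score also depends on other factors (inverse document frequency, field length normalization, the boosts), so this only shows the term-frequency part.

```python
from collections import Counter


def term_counts(text):
    """Lowercase, whitespace-tokenize, and count terms (a toy analyzer)."""
    return Counter(text.lower().split())


title = "Caring for others"
content = "Practical advice on supporting family and friends"

# Rendered template with the title once, then with the title repeated.
once = term_counts("\n".join([title, content]))
twice = term_counts("\n".join([title, title, content]))

print(once["caring"], twice["caring"])
```

Title terms now occur twice as often in the document field while content terms are unchanged, which is what shifts the ranking towards title matches.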
