
How to improve the speed of Django Haystack search

I want to create a search engine in my Django environment for a simple data structure:

| id         | company name     |
|:-----------|-----------------:|
| 12345678   | company A's name |
| 12345687   | peoples pizza a/s|
| 87654321   | sub's for pugs   |

There will be about 800,000 companies, and I only want to search by name. When a name is found, its ID is returned to my Django app.

I've tried various setups with Haystack, Whoosh and the like, but searches become really slow as I scale from my test data set of ~500 companies to the full 800,000. A search sometimes takes almost an hour.

I'm using the PaaS Heroku, so I thought I would try an integrated paid service (Searchly's Elasticsearch implementation). This helped, but at about 80,000 companies it starts getting really slow again.

Installed Apps

INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.sites',

    # Added.
    'haystack',

    # Then your usual apps...
]

More settings.py

import os
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2

es = urlparse(os.environ.get('SEARCHBOX_URL') or 'http://127.0.0.1:9200/')

port = es.port or 80

HAYSTACK_CONNECTIONS = {
   'default': {
       'ENGINE': 'haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine',
       'URL': es.scheme + '://' + es.hostname + ':' + str(port),
       'INDEX_NAME': 'documents',
   },
}

if es.username:
   HAYSTACK_CONNECTIONS['default']['KWARGS'] = {"http_auth": es.username + ':' + es.password}
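
To make it easier to see what these settings resolve to, here is a minimal sketch that wraps the same URL parsing in a function. `build_haystack_connection` is a hypothetical helper, not part of Haystack, and the fallback to port 443 for `https` URLs is my own assumption (the settings above always fall back to 80).

```python
from urllib.parse import urlparse


def build_haystack_connection(url, index_name="documents"):
    """Build a HAYSTACK_CONNECTIONS['default'] entry from a search URL.

    Hypothetical helper for illustration; not part of Haystack.
    """
    es = urlparse(url)
    # Assumption: use the scheme's standard port when the URL has none.
    port = es.port or (443 if es.scheme == "https" else 80)
    conn = {
        "ENGINE": "haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine",
        "URL": "{}://{}:{}".format(es.scheme, es.hostname, port),
        "INDEX_NAME": index_name,
    }
    # Only pass credentials when the URL actually carries them.
    if es.username:
        conn["KWARGS"] = {"http_auth": "{}:{}".format(es.username, es.password)}
    return conn


if __name__ == "__main__":
    print(build_haystack_connection("http://user:secret@example.com:9200/"))
```

Printing the result for the actual `SEARCHBOX_URL` value is a quick way to confirm that the host, port and credentials are what the backend should be talking to.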

search_indexes.py

from haystack import indexes

from hello.models import Article


class ArticleIndex(indexes.SearchIndex, indexes.Indexable):
    '''
    Defines the model for the search engine.
    '''
    text = indexes.CharField(document=True, use_template=True)
    pub_date = indexes.DateTimeField(model_attr='pub_date')
    # pub_date line was commented out previously
    content_auto = indexes.EdgeNgramField(model_attr='title')

    def get_model(self):
        return Article

    def index_queryset(self, using=None):
        """Used when the entire index for model is updated."""
        return self.get_model().objects.all()
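
For context on the `EdgeNgramField` above: an edge n-gram index stores the leading prefixes of each word, which is what lets autocomplete match partial input. The following is a rough pure-Python illustration of that idea, not Haystack's or Elasticsearch's actual tokenizer. The function name `edge_ngrams` and the bounds of 2 and 15 are assumptions chosen for the example, and real analyzers handle punctuation differently.

```python
def edge_ngrams(text, min_gram=2, max_gram=15):
    """Return the leading-edge n-grams of each whitespace-separated word.

    Illustrative only: real analyzers tokenize and normalize differently.
    """
    grams = []
    for word in text.lower().split():
        # Never ask for a prefix longer than the word itself.
        upper = min(len(word), max_gram)
        for size in range(min_gram, upper + 1):
            grams.append(word[:size])
    return grams


if __name__ == "__main__":
    print(edge_ngrams("peoples pizza"))
```

Each company name expands into several indexed tokens this way, so the index for 800,000 names is noticeably larger than the raw data, and the size of the returned result set matters for response time.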

article_text.txt

{{ object.title }}
{{ object.user.get_full_name }}
{{ object.body }}

urls.py

url(r'^search/$', views.search_titles, name='search'),

views.py

from django.shortcuts import render_to_response
from haystack.query import SearchQuerySet


def search_titles(request):
    txt = request.POST.get('search_text', '')
    if txt and len(txt) >= 4:
        articles = SearchQuerySet().autocomplete(content_auto=txt)
    else:
        # If the POST request is empty or too short, return nothing;
        # this prevents an internal server error with jQuery.
        articles = []
    return render_to_response('scripts/ajax_search.html',
                              {'articles': articles})

search.html

{% if articles.count > 0 %}
    <!-- Simply prints the links to the CVR numbers. -->
    {% for article in articles|slice:":15" %}
        <li><a href="{{ article.object.get_absolute_url }}">{{ article.object.title }}</a></li>
    {% endfor %}

{% else %}

    <li>Try again, or try CVR + &#x23ce;</li>

{% endif %}

index.html (where I call the search engine)

{% csrf_token %}
<input type="text" id="search" name="search" />

<!-- This <ul> is where all company names end up -->
<ul id="search-results"></ul>

I changed my views.py search method to:

from haystack.query import SearchQuerySet

txt = request.POST.get('search_text', '')
articles = []
suggestedSearchTerm = ""
if txt and len(txt) >= 4:
    sqs = SearchQuerySet()
    # Limit the query to the first 8 results.
    sqs.query.set_limits(low=0, high=8)
    sqs = sqs.filter(content=txt)
    articles = sqs.query.get_results()
    suggestedSearchTerm = SearchQuerySet().spelling_suggestion(txt)
    # Drop the suggestion when there is none or it equals the query.
    if not suggestedSearchTerm or suggestedSearchTerm == txt:
        suggestedSearchTerm = ''
    else:
        suggestedSearchTerm = suggestedSearchTerm.lower()
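
The key idea in the change above is to stop after the first few hits rather than pull back every match. As a backend-independent analogy (this is not how Haystack or Elasticsearch work internally), here is a small sketch using only the standard library. The names `iter_matches` and `first_matches` and the sample data are hypothetical.

```python
from itertools import islice


def iter_matches(names, prefix):
    """Lazily yield (id, name) pairs whose name starts with prefix."""
    prefix = prefix.lower()
    for company_id, name in names:
        if name.lower().startswith(prefix):
            yield company_id, name


def first_matches(names, prefix, limit=8):
    """Return at most `limit` matches without scanning past the last one."""
    return list(islice(iter_matches(names, prefix), limit))


if __name__ == "__main__":
    companies = [
        (12345678, "company A's name"),
        (12345687, "peoples pizza a/s"),
        (87654321, "sub's for pugs"),
    ]
    print(first_matches(companies, "peop"))
```

Because `islice` stops pulling from the generator once the limit is reached, the cost is bounded by where the last needed match sits, not by the total number of matches. That is the same reason a capped query tends to respond faster than an unbounded one.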
