简体   繁体   English

Django objects.filter,这有多“昂贵”?

[英]Django objects.filter, how “expensive” would this be?

I am trying to make a search view in Django. 我正在尝试在Django中进行搜索。 It is a search form with freetext input + some options to select, so that you can filter on years and so on. 这是一个搜索表单,带有自由文本输入+一些选项供您选择,以便您可以过滤年份等。 This is some of the code I have in the view so far, the part that does the filtering. 到目前为止,这是我在视图中拥有的一些代码,它是进行过滤的部分。 And I would like some input on how expensive this would be on the database server. 我想就此在数据库服务器上花费多少钱提供一些意见。

soknad_list = Soknad.objects.all()

 if var1: soknad_list = soknad_list.filter(pub_date__year=var1) if var2: soknad_list = soknad_list.filter(muncipality__name__exact=var2) if var3: soknad_list = soknad_list.filter(genre__name__exact=var3) # TEXT SEARCH stop_word_list = re.compile(STOP_WORDS, re.IGNORECASE) search_term = '%s' % request.GET['q'] cleaned_search_term = stop_word_list.sub('', search_term) cleaned_search_term = cleaned_search_term.strip() if len(cleaned_search_term) != 0: soknad_list = soknad_list.filter(Q(dream__icontains=cleaned_search_term) | Q(tags__icontains=cleaned_search_term) | Q(name__icontains=cleaned_search_term) | Q(school__name__icontains=cleaned_search_term)) 

So what I do is, first make a list of all objects, then I check which variables exists (I fetch these with GET on an earlier point) and then I filter the results if they exists. 因此,我要做的是,首先列出所有对象,然后检查哪些变量存在(在较早的位置使用GET获取这些变量),然后过滤结果是否存在。 But this doesn't seem too elegant, it probably does a lot of queries to achieve the result, so is there a better way to this? 但这似乎不太优雅,它可能需要执行很多查询才能达到结果,因此有更好的方法吗?

It does exactly what I want, but I guess there is a better/smarter way to do this. 它确实满足我的要求,但是我想有一种更好/更智能的方法可以做到这一点。 Any ideas? 有任何想法吗?

过滤器本身不会执行查询,除非您从查询中显式获取项(例如get),然后list(query)也执行,否则不会执行任何查询。

You can see the query that will be generated by using: 您可以使用以下命令查看将生成的查询:

soknad_list.query.as_sql()[0]

You can then put that into your database shell to see how long the query takes, or use EXPLAIN (if your database backend supports it) to see how expensive it is. 然后,您可以将其放入数据库外壳中以查看查询所需的时间,或者使用EXPLAIN(如果数据库后端支持该查询)来查看查询的开销。

As Aaron mentioned, you should get a hold of the query text that is going to be run against the database and use an EXPLAIN (or other some method) to view the query execution plan. 如Aaron所述,您应该保留将要针对数据库运行的查询文本,并使用EXPLAIN(或其他某种方法)查看查询执行计划。 Once you have a hold of the execution plan for the query you can see what is going on in the database itself. 掌握了查询的执行计划后,您可以查看数据库本身正在发生的情况。 There are a lot of operations that see very expensive to run through procedural code that are very trivial for any database to run, especially if you provide indexes that the database can use for speeding up your query. 有很多操作对于通过过程代码运行来说是非常昂贵的,这对于任何数据库的运行来说都是微不足道的,尤其是当您提供数据库可以用来加快查询速度的索引时。

If I read your question correctly, you're retrieving a result set of all rows in the Soknad table. 如果我正确阅读了您的问题,则说明您正在检索Soknad表中所有行的结果集。 Once you have these results back you use the filter() method to trim down your results meet your criteria. 一旦获得这些结果,就可以使用filter()方法来缩小结果,使其符合标准。 From looking at the Django documentation, it looks like this will do an in-memory filter rather than re-query the database (of course, this really depends on which data access layer you're using and not on Django itself). 通过查看Django文档,看起来这将执行内存中筛选,而不是重新查询数据库(当然,这实际上取决于您使用的是哪个数据访问层,而不取决于Django本身)。

The most optimal solution would be to use a full-text search engine (Lucene, ferret, etc) to handle this for you. 最佳的解决方案是使用全文本搜索引擎(Lucene,雪貂等)为您处理。 If that is not available or practical the next best option would be to to construct a query predicate (WHERE clause) before issuing your query to the database and let the database perform the filtering. 如果那不可行或不切实际,那么下一个最佳选择是在将查询发布到数据库之前构造一个查询谓词(WHERE子句),然后让数据库执行过滤。

However, as with all things that involve the database, the real answer is 'it depends.' 但是,与涉及数据库的所有事物一样,真正的答案是“取决于”。 The best suggestion is to try out several different approaches using data that is close to production and benchmark them over at least 3 iterations before settling on a final solution to the problem. 最好的建议是使用接近生产的数据尝试几种不同的方法,并在确定最终解决方案之前,至少经过3次迭代对它们进行基准测试。 It may be just as fast, or even faster, to filter in memory rather than filter in the database. 在内存中进行筛选而不是在数据库中进行筛选可能是一样快,甚至更快。

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM