In Django, what is the most efficient way to check for an empty query set?

I've heard suggestions to use the following:

if qs.exists():
    ...

if qs.count():
    ...

try:
    qs[0]
except IndexError:
    ...
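To make the three options concrete, here is a rough sketch of the SQL each one tends to boil down to, run against an in-memory SQLite table through Python's standard sqlite3 module instead of Django. The session table, its columns and the exact SQL strings are assumptions for illustration; the statements Django actually generates vary by version and database backend.

```python
import sqlite3

# In-memory table standing in for a model's table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE session (id INTEGER PRIMARY KEY, ip TEXT)")
conn.executemany(
    "INSERT INTO session (ip) VALUES (?)",
    [("10.0.0.%d" % (i % 50),) for i in range(200)],
)


def via_exists(ip):
    # Like qs.exists(): constant select, no ordering, stop after one row.
    row = conn.execute(
        "SELECT 1 FROM session WHERE ip = ? LIMIT 1", (ip,)
    ).fetchone()
    return row is not None


def via_count(ip):
    # Like qs.count(): the database has to count every matching row.
    (n,) = conn.execute(
        "SELECT COUNT(*) FROM session WHERE ip = ?", (ip,)
    ).fetchone()
    return n > 0


def via_first_row(ip):
    # Like qs[0]: all columns of the first row (ordering by id is assumed).
    row = conn.execute(
        "SELECT id, ip FROM session WHERE ip = ? ORDER BY id LIMIT 1", (ip,)
    ).fetchone()
    # qs[0] would raise IndexError here instead of giving back None.
    return row is not None
```

All three answer the same yes/no question; they differ in how much work the database does and how much data travels back.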

Copied from comment below: "I'm looking for a statement like 'In MySQL and PostgreSQL count() is faster for short queries, exists() is faster for long queries, and use QuerySet[0] when it's likely that you're going to need the first element and you want to check that it exists. However, when count() is faster it's only marginally faster, so it's advisable to always use exists() when choosing between the two.'"

query.exists() is the most efficient way.

On PostgreSQL in particular, count() can be very expensive, sometimes more expensive than a normal select query.

exists() runs a query with no select_related, field selections or sorting, and only fetches a single record. This is much faster than counting the entire query with table joins and sorting.

qs[0] would still include select_related, field selections and sorting, so it would be more expensive.

The Django source code is here (django/db/models/sql/query.py, Query.has_results):

https://github.com/django/django/blob/60e52a047e55bc4cd5a93a8bd4d07baed27e9a22/django/db/models/sql/query.py#L499

def has_results(self, using):
    q = self.clone()
    if not q.distinct:
        q.clear_select_clause()
    q.clear_ordering(True)
    q.set_limits(high=1)
    compiler = q.get_compiler(using=using)
    return compiler.has_results()
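The steps in has_results() can be modelled on a toy query object. ToyQuery and has_results_sql are hypothetical names invented for this sketch, not Django APIs. It only illustrates why the resulting SQL is cheap: the column list collapses to a constant (unless DISTINCT makes the columns matter), ordering is dropped and LIMIT 1 is added, all on a copy so the original query is untouched.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class ToyQuery:
    # Hypothetical, heavily simplified stand-in for a SQL query object.
    table: str
    select: Tuple[str, ...] = ("*",)
    where: Optional[str] = None
    order_by: Tuple[str, ...] = ()
    distinct: bool = False
    limit: Optional[int] = None

    def to_sql(self) -> str:
        prefix = "DISTINCT " if self.distinct else ""
        sql = "SELECT %s%s FROM %s" % (prefix, ", ".join(self.select), self.table)
        if self.where:
            sql += " WHERE " + self.where
        if self.order_by:
            sql += " ORDER BY " + ", ".join(self.order_by)
        if self.limit is not None:
            sql += " LIMIT %d" % self.limit
        return sql


def has_results_sql(q: ToyQuery) -> str:
    # Each replace() returns a modified copy (like q = self.clone()),
    # so the caller's query object is never changed.
    if not q.distinct:
        q = replace(q, select=("1",))  # roughly clear_select_clause()
    q = replace(q, order_by=())        # roughly clear_ordering(True)
    q = replace(q, limit=1)            # roughly set_limits(high=1)
    return q.to_sql()
```

For example, a query selecting id and ip with ORDER BY id DESC becomes SELECT 1 FROM session WHERE ... LIMIT 1, while the original object still renders its full SELECT.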

Another gotcha that got me the other day is using a QuerySet directly in an if statement. That executes the query and fetches the whole result set!

If the variable query_set may be None (an unset argument to your function), then use:

if query_set is None:
    ...  # handle the missing argument here

not:

if query_set:
    ...  # you just hit the database
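The difference can be seen without a database using a small hypothetical class that, loosely like a Django QuerySet, only loads its rows when evaluated and then caches them. LazyRows and describe are illustration-only names; the point is that `is None` never touches the rows, while truth-testing loads all of them.

```python
class LazyRows:
    """Hypothetical stand-in for a lazy QuerySet that counts simulated DB hits."""

    def __init__(self, rows):
        self._rows = rows
        self._cache = None
        self.db_hits = 0

    def _fetch_all(self):
        # The first evaluation "queries the database" and caches every row.
        if self._cache is None:
            self.db_hits += 1
            self._cache = list(self._rows)
        return self._cache

    def __bool__(self):
        # Truth-testing forces a full evaluation.
        return bool(self._fetch_all())

    def __iter__(self):
        return iter(self._fetch_all())


def describe(query_set=None):
    if query_set is None:  # identity check only; nothing is evaluated
        return "no queryset passed"
    return "queryset passed"
```

Calling describe(LazyRows(range(1000))) leaves db_hits at 0, whereas a single `if rows:` on the same object loads all 1000 rows.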

exists() is generally faster than count(), though not always (see the test below). count() can be used to check for both existence and length.

Only use qs[0] if you actually need the object. It's significantly slower if you're just testing for existence.

On Amazon SimpleDB, 400,000 rows:

  • bare qs : 325.00 usec/pass
  • qs.exists() : 144.46 usec/pass
  • qs.count() : 144.33 usec/pass
  • qs[0] : 324.98 usec/pass

On MySQL, 57 rows:

  • bare qs : 1.07 usec/pass
  • qs.exists() : 1.21 usec/pass
  • qs.count() : 1.16 usec/pass
  • qs[0] : 1.27 usec/pass

I used a random query for each pass to reduce the risk of DB-level caching. Test code:

import timeit

base = """
import random
from plum.bacon.models import Session
ip_addr = '.'.join(str(random.randint(0, 255)) for _ in range(4))
try:
    session = Session.objects.filter(ip=ip_addr)%s
    if session:
        pass
except IndexError:  # only the [0] variant raises, when nothing matches
    pass
"""

query_variations = [
    base % "",
    base % ".exists()",
    base % ".count()",
    base % "[0]",
]

passes = 100
for s in query_variations:
    t = timeit.Timer(stmt=s)
    # microseconds per pass = 1e6 * total seconds / passes
    # (the figures listed above were printed with / 100000 instead of / passes,
    # i.e. 1000x smaller; the ratios between variants are unaffected)
    print("%.2f usec/pass" % (1000000 * t.timeit(number=passes) / passes))

It depends on the context of use.

According to the documentation:

Use QuerySet.count()

...if you only want the count, rather than doing len(queryset).

Use QuerySet.exists()

...if you only want to find out if at least one result exists, rather than if queryset.

But:

Don't overuse count() and exists()

If you are going to need other data from the QuerySet, just evaluate it.

So I think QuerySet.exists() is the recommended way if you just want to check for an empty QuerySet. On the other hand, if you want to use the results later, it's better to evaluate it.
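A rough illustration of that advice in plain SQL, using Python's standard sqlite3 module and its set_trace_callback hook to record every statement: checking first and then loading the rows costs two queries, while loading once and testing the loaded rows costs one. The item table is hypothetical. In Django the equivalent of the second pattern is evaluating the QuerySet itself (for example `if qs:` followed by `for obj in qs:`), which reuses the QuerySet's result cache.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO item (name) VALUES (?)", [("a",), ("b",), ("c",)])

executed = []  # every SQL statement sqlite3 runs from here on is appended
conn.set_trace_callback(executed.append)


def names_check_then_fetch():
    # exists()-style check, followed by a second query for the data.
    if conn.execute("SELECT 1 FROM item LIMIT 1").fetchone() is None:
        return []
    return [name for (name,) in conn.execute("SELECT name FROM item ORDER BY id")]


def names_fetch_once():
    # Evaluate once, then test emptiness on rows already in memory.
    rows = conn.execute("SELECT name FROM item ORDER BY id").fetchall()
    if not rows:
        return []
    return [name for (name,) in rows]
```

Both functions return the same names; only the number of round trips differs.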

I also think that your third option is the most expensive, because it has to fetch a full row and build a model instance just to check whether any record exists.

@Sam Odio's solution was a decent starting point, but there are a few flaws in the methodology, namely:

  1. The random IP address could end up matching 0 or very few results
  2. An exception would skew the results, so we should aim to avoid handling exceptions

So instead of filtering for something that might match, I decided to exclude something that definitely won't match, hopefully still avoiding the DB cache but also ensuring the same number of rows.

I only tested against a local MySQL database, with the dataset:

>>> Session.objects.all().count()
40219

Timing code:

import timeit

base = """
import random
import string
from django.contrib.sessions.models import Session
never_match = ''.join(random.choice(string.ascii_uppercase) for _ in range(10))
sessions = Session.objects.exclude(session_key=never_match){}
if sessions:
    pass
"""

query_variations = [
    "",
    ".exists()",
    ".count()",
    "[0]",
]

passes = 100
for variation in query_variations:
    t = timeit.Timer(stmt=base.format(variation))
    # microseconds per pass = 1e6 * total seconds / passes
    # (the output below was printed with / 100000 instead of / passes, i.e. 1000x smaller)
    print("{} => {:02f} usec/pass".format(variation.ljust(10), 1000000 * t.timeit(number=passes) / passes))

Output:

           => 1390.177710 usec/pass
.exists()  => 2.479579 usec/pass
.count()   => 22.426991 usec/pass
[0]        => 2.437079 usec/pass

So you can see that count() is roughly 9 times slower than exists() for this dataset.

[0] is also fast, but it needs exception handling.
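The exception handling can be wrapped in a small helper. first_or_none is a hypothetical name, and the sketch is shown on plain Python sequences, which raise IndexError on [0] when empty, just as an empty QuerySet does. Django also ships QuerySet.first(), which returns None for an empty result; note that on a QuerySet with no ordering it orders by the primary key.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def first_or_none(seq: Sequence[T]) -> Optional[T]:
    """Return seq[0], or None when seq is empty (hypothetical helper)."""
    try:
        return seq[0]
    except IndexError:
        # An empty QuerySet (or list) raises IndexError on [0].
        return None
```

One caveat of this design: if the first element can itself be None, the return value no longer distinguishes "empty" from "first item is None".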

I would imagine that the first method is the most efficient (you could easily implement it in terms of the second method, so perhaps they are almost identical). The last one requires actually fetching a whole object from the database, so it is almost certainly the most expensive.

But, like all of these questions, the only way to know for your particular database, schema and dataset is to test it yourself.
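A self-contained way to run this kind of comparison, sketched with Python's standard sqlite3 and timeit modules rather than Django, so the SQL strings only approximate what each QuerySet call emits. The table, row count and keys are invented. It borrows the exclude-a-never-matching-key idea from the MySQL measurement above (so every variant works over the same rows) and divides the total time by the number of passes when reporting microseconds per pass.

```python
import sqlite3
import timeit

# Hypothetical table loosely modelled on a session table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE session (session_key TEXT PRIMARY KEY, data TEXT)")
conn.executemany(
    "INSERT INTO session VALUES (?, ?)",
    [("key%06d" % i, "x" * 100) for i in range(10000)],
)

# A key that never matches, so every variant works over all rows.
NEVER = "no-such-key"

VARIANTS = {
    "bare": "SELECT * FROM session WHERE session_key != ?",
    "exists": "SELECT 1 FROM session WHERE session_key != ? LIMIT 1",
    "count": "SELECT COUNT(*) FROM session WHERE session_key != ?",
    "first": "SELECT * FROM session WHERE session_key != ? LIMIT 1",
}


def usec_per_pass(sql, passes=50):
    """Average wall time of one execute + fetchall, in microseconds."""
    total = timeit.timeit(
        lambda: conn.execute(sql, (NEVER,)).fetchall(), number=passes
    )
    return total / passes * 1e6  # seconds per pass -> microseconds


if __name__ == "__main__":
    for name, sql in VARIANTS.items():
        print("%-6s => %.2f usec/pass" % (name, usec_per_pass(sql)))
```

The numbers will differ from any figures quoted here, since an in-memory SQLite database has no network round trip; point the same loop at your own database to get meaningful results.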

I ran into this too. Yes, exists() is faster in most cases, but it depends a lot on the type of queryset you are building. For a simple query like my_objects = MyObject.objects.all() you would use my_objects.exists(). But for a query like MyObject.objects.filter(some_attr='anything').exclude(something='what').distinct('key').values() you probably need to test which option fits better (exists(), count(), len(my_objects)). Remember that the DB engine is what actually performs the query, and performance depends a lot on the data structure and on how the query is formed. One thing you can do is audit the queries, run them yourself against the DB engine and compare the results; you will be surprised by how naive Django sometimes is. Try QueryCountMiddleware to see all the queries executed, and you will see what I am talking about.
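One way to do that auditing by hand, sketched with SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module: write out candidate SQL for the check and ask the engine how it will execute each one. The table and columns loosely mirror the example query and are hypothetical, the SQL is a simplification of what Django would emit, and the plan text format differs between SQLite versions. In Django itself, QuerySet.explain() (Django 2.1+) returns the backend's plan for a QuerySet, and connection.queries (with DEBUG=True) lists the SQL that was run.

```python
import sqlite3

# Hypothetical table loosely mirroring the MyObject example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_object (id INTEGER PRIMARY KEY, some_attr TEXT, "
    "something TEXT, key_field TEXT)"
)
conn.execute("CREATE INDEX my_object_some_attr ON my_object (some_attr)")

CANDIDATES = {
    "exists": "SELECT 1 FROM my_object "
              "WHERE some_attr = ? AND something != ? LIMIT 1",
    "count": "SELECT COUNT(*) FROM my_object "
             "WHERE some_attr = ? AND something != ?",
    "count_distinct": "SELECT COUNT(DISTINCT key_field) FROM my_object "
                      "WHERE some_attr = ? AND something != ?",
}


def query_plan(sql, params):
    """Return the detail text of each row SQLite's EXPLAIN QUERY PLAN reports."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]


if __name__ == "__main__":
    for name, sql in CANDIDATES.items():
        print(name, query_plan(sql, ("anything", "what")))
```

Comparing the plans (index search versus full scan, extra temporary B-trees for DISTINCT) is often more telling than a single timing run.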

Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.