How to annotate Count with a condition in a Django queryset

Using the Django ORM, can one do something like queryset.objects.annotate(Count('queryset_objects', gte=VALUE))? Catch my drift?


Here's a quick example to illustrate a possible answer:

In a Django website, content creators submit articles, and regular users view (i.e. read) those articles. Articles can either be published (i.e. available for all to read) or in draft mode. The models depicting these requirements are:

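from django.contrib.auth.models import User
from django.db import models

class Article(models.Model):
    author = models.ForeignKey(User, on_delete=models.CASCADE)
    published = models.BooleanField(default=False)

class Readership(models.Model):
    reader = models.ForeignKey(User, on_delete=models.CASCADE)
    which_article = models.ForeignKey(Article, on_delete=models.CASCADE)
    what_time = models.DateTimeField(auto_now_add=True)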

My question is: how can I get all published articles, sorted by unique readership from the last 30 minutes? I.e. I want to count how many distinct (unique) views each published article got in the last half hour, and then produce a list of articles sorted by these distinct views.


I tried:

date = datetime.now()-timedelta(minutes=30)
articles = Article.objects.filter(published=True).extra(select = {
  "views" : """
  SELECT COUNT(*)
  FROM myapp_readership
    JOIN myapp_article on myapp_readership.which_article_id = myapp_article.id
  WHERE myapp_readership.reader_id = myapp_user.id
  AND myapp_readership.what_time > %s """ % date,
}).order_by("-views")

This raised the error: syntax error at or near "01" (where "01" was part of the datetime value interpolated into extra). It's not much to go on.

For Django >= 1.8

Use conditional aggregation:

from datetime import timedelta
from django.db.models import Count, Case, When, IntegerField
from django.utils import timezone

threshold = timezone.now() - timedelta(minutes=30)  # start of the 30-minute window

Article.objects.annotate(
    numviews=Count(Case(
        # 1 for readerships inside the window, NULL otherwise
        When(readership__what_time__gt=threshold, then=1),
        output_field=IntegerField(),
    ))
)

Explanation: the normal query over your articles is annotated with a numviews field. That field is built as a CASE/WHEN expression wrapped in Count: it returns 1 for each readership matching the criteria and NULL for each one that does not. Count ignores NULLs and counts only the remaining values.

You will get zeros for articles that haven't been viewed recently, and you can use the numviews field for sorting and filtering.

The query behind this for PostgreSQL will be:

SELECT
    "app_article"."id",
    "app_article"."author_id",
    "app_article"."published",
    COUNT(
        CASE WHEN "app_readership"."what_time" > '2015-11-18 11:04:00.000000+01:00' THEN 1
        ELSE NULL END
    ) AS "numviews"
FROM "app_article" LEFT OUTER JOIN "app_readership"
    ON ("app_article"."id" = "app_readership"."which_article_id")
GROUP BY "app_article"."id", "app_article"."author_id", "app_article"."published"
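
To see why Count over a CASE expression behaves this way, here is a minimal sketch that runs a query of the same shape against an in-memory SQLite database with Python's sqlite3 module, rather than through Django and PostgreSQL. The table names, the integer what_time column (minutes on an arbitrary clock) and the sample rows are assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE article (id INTEGER PRIMARY KEY);
    CREATE TABLE readership (
        id INTEGER PRIMARY KEY,
        which_article_id INTEGER REFERENCES article(id),
        reader_id INTEGER,
        what_time INTEGER  -- minutes on an arbitrary clock, for simplicity
    );
    INSERT INTO article (id) VALUES (1), (2), (3);
    INSERT INTO readership (which_article_id, reader_id, what_time) VALUES
        (1, 10, 100),  -- recent view of article 1
        (1, 11, 95),   -- recent view of article 1
        (1, 12, 10),   -- old view of article 1
        (2, 10, 5);    -- old view of article 2; article 3 has no views
""")

threshold = 70  # stands in for "30 minutes ago" when "now" is minute 100

rows = conn.execute("""
    SELECT article.id,
           COUNT(CASE WHEN readership.what_time > ? THEN 1 ELSE NULL END) AS numviews
    FROM article
    LEFT OUTER JOIN readership ON article.id = readership.which_article_id
    GROUP BY article.id
    ORDER BY article.id
""", (threshold,)).fetchall()

print(rows)  # [(1, 2), (2, 0), (3, 0)]
```

Article 2 (only an old view) and article 3 (no views at all) both end up with 0, because every CASE result for them is NULL and COUNT skips NULLs.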

If we want to count only unique readers, we can pass distinct=True to Count and make the When clause return the value we want to be distinct on.

from datetime import timedelta
from django.db.models import Count, Case, When, IntegerField, F
from django.utils import timezone

threshold = timezone.now() - timedelta(minutes=30)  # start of the 30-minute window

Article.objects.annotate(
    numviews=Count(Case(
        # `readership__reader_id` works here too
        When(readership__what_time__gt=threshold, then=F('readership__reader')),
        output_field=IntegerField(),
    ), distinct=True)
)

That will produce:

SELECT
    "app_article"."id",
    "app_article"."author_id",
    "app_article"."published",
    COUNT(
        DISTINCT CASE WHEN "app_readership"."what_time" > '2015-11-18 11:04:00.000000+01:00' THEN "app_readership"."reader_id"
        ELSE NULL END
    ) AS "numviews"
FROM "app_article" LEFT OUTER JOIN "app_readership"
    ON ("app_article"."id" = "app_readership"."which_article_id")
GROUP BY "app_article"."id", "app_article"."author_id", "app_article"."published"
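
The difference between the plain and the distinct count can be checked with a small sketch, again using Python's sqlite3 module as a stand-in for PostgreSQL. The simplified readership table, the integer timestamps and the sample rows are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE readership (
        which_article_id INTEGER,
        reader_id INTEGER,
        what_time INTEGER  -- minutes on an arbitrary clock
    );
    INSERT INTO readership VALUES
        (1, 10, 100),  -- reader 10, recent
        (1, 10, 98),   -- reader 10 again, recent
        (1, 11, 95),   -- reader 11, recent
        (1, 12, 10);   -- reader 12, too old
""")

threshold = 70  # stands in for "30 minutes ago"

# Same CASE expression twice: once counted as-is, once with DISTINCT
plain, unique = conn.execute("""
    SELECT COUNT(CASE WHEN what_time > ? THEN reader_id END),
           COUNT(DISTINCT CASE WHEN what_time > ? THEN reader_id END)
    FROM readership
    WHERE which_article_id = 1
""", (threshold, threshold)).fetchone()

print(plain, unique)  # 3 2
```

Reader 10 viewed the article twice inside the window, so the plain count is 3 while the distinct count is 2.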

For Django < 1.8 and PostgreSQL

You can use raw to execute the SQL statement generated by newer versions of Django. Apparently there is no simple, optimized way to query this data without raw (even with extra there are problems injecting the required JOIN clause).

from datetime import timedelta
from django.utils import timezone

threshold = timezone.now() - timedelta(minutes=30)

# The threshold is passed as a query parameter rather than pasted into the SQL
Article.objects.raw('''
    SELECT
        "app_article"."id",
        "app_article"."author_id",
        "app_article"."published",
        COUNT(
            DISTINCT CASE WHEN "app_readership"."what_time" > %s THEN "app_readership"."reader_id"
            ELSE NULL END
        ) AS "numviews"
    FROM "app_article" LEFT OUTER JOIN "app_readership"
        ON ("app_article"."id" = "app_readership"."which_article_id")
    GROUP BY "app_article"."id", "app_article"."author_id", "app_article"."published"
''', [threshold])
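
The syntax error reported in the question is what typically happens when a datetime is formatted straight into the SQL text: the value arrives unquoted and the parser trips over it. Django's raw() accepts a params argument and extra() accepts select_params for this purpose. The sketch below shows the same effect with Python's sqlite3 module, which uses ? placeholders where Django uses %s; the table and sample row are made up for illustration.

```python
import sqlite3
from datetime import datetime

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readership (reader_id INTEGER, what_time TEXT)")
conn.execute("INSERT INTO readership VALUES (1, '2015-11-18 11:30:00')")

threshold = datetime(2015, 11, 18, 11, 4, 0)

# String formatting pastes the datetime into the SQL without quotes:
#   ... WHERE what_time > 2015-11-18 11:04:00
broken_sql = "SELECT COUNT(*) FROM readership WHERE what_time > %s" % threshold
try:
    conn.execute(broken_sql)
    interpolation_failed = False
except sqlite3.OperationalError:
    interpolation_failed = True  # the parser rejects the unquoted value

# Binding the value as a parameter lets the driver handle quoting
(count,) = conn.execute(
    "SELECT COUNT(*) FROM readership WHERE what_time > ?",
    (threshold.isoformat(sep=" "),),
).fetchone()

print(interpolation_failed, count)  # True 1
```

Besides avoiding the syntax error, parameter binding also protects the query against SQL injection.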

For Django >= 2.0 you can use conditional aggregation with the filter argument of the aggregate functions:

from datetime import timedelta
from django.utils import timezone
from django.db.models import Count, Q # need import

Article.objects.annotate(
    numviews=Count(
        'readership__reader__id', 
        filter=Q(readership__what_time__gt=timezone.now() - timedelta(minutes=30)), 
        distinct=True
    )
)
