How to annotate Count with a condition in a Django queryset
Using the Django ORM, can one do something like queryset.objects.annotate(Count('queryset_objects', gte=VALUE))? Catch my drift?
Here's a quick example to illustrate a possible answer:
In a Django website, content creators submit articles, and regular users view (i.e. read) those articles. Articles can either be published (i.e. available for all to read) or in draft mode. The models depicting these requirements are:
class Article(models.Model):
    author = models.ForeignKey(User)
    published = models.BooleanField(default=False)

class Readership(models.Model):
    reader = models.ForeignKey(User)
    which_article = models.ForeignKey(Article)
    what_time = models.DateTimeField(auto_now_add=True)
My question is: how can I get all published articles, sorted by unique readership from the last 30 minutes? I.e. I want to count how many distinct (unique) views each published article got in the last half hour, and then produce a list of articles sorted by these distinct views.
I tried:
date = datetime.now()-timedelta(minutes=30)
articles = Article.objects.filter(published=True).extra(select = {
    "views" : """
    SELECT COUNT(*)
    FROM myapp_readership
    JOIN myapp_article on myapp_readership.which_article_id = myapp_article.id
    WHERE myapp_readership.reader_id = myapp_user.id
    AND myapp_readership.what_time > %s """ % date,
}).order_by("-views")
This raised the error: syntax error at or near "01" (where "01" was part of the datetime value inside extra). It's not much to go on.
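The error most likely comes from the `%` string formatting: the datetime is pasted into the SQL text without quotes, so the database parser sees bare tokens instead of a timestamp literal. A minimal sketch, using a fixed, made-up "now" so the result is reproducible, shows the fragment that would reach the database:

```python
from datetime import datetime, timedelta

# Fixed, hypothetical "now" so the result is reproducible.
now = datetime(2015, 11, 18, 1, 34, 0)
date = now - timedelta(minutes=30)

# What the failing code effectively does: plain string interpolation.
fragment = "AND myapp_readership.what_time > %s" % date

# The timestamp appears unquoted, which is not valid SQL.
print(fragment)
```

Passing the value as a query parameter (for example through the `select_params` argument of `extra`) instead of formatting it into the string lets the database driver quote it.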
Use Conditional Aggregation:
from datetime import timedelta

from django.db.models import Count, Case, When, IntegerField
from django.utils import timezone

threshold = timezone.now() - timedelta(minutes=30)  # only newer views are counted

Article.objects.annotate(
    numviews=Count(Case(
        When(readership__what_time__gt=threshold, then=1),
        output_field=IntegerField(),
    ))
)
Explanation: a normal query over your articles is annotated with a numviews field. That field is built as a CASE/WHEN expression wrapped in Count: it returns 1 for each readership row that matches the criterion and NULL for each row that does not. Count ignores NULLs and counts only the remaining values.
Articles that haven't been viewed recently get a count of zero, and you can use the numviews field for sorting and filtering.
The query behind this for PostgreSQL will be:
SELECT
    "app_article"."id",
    "app_article"."author_id",
    "app_article"."published",
    COUNT(
        CASE WHEN "app_readership"."what_time" > '2015-11-18 11:04:00.000000+01:00' THEN 1
        ELSE NULL END
    ) AS "numviews"
FROM "app_article" LEFT OUTER JOIN "app_readership"
    ON ("app_article"."id" = "app_readership"."which_article_id")
GROUP BY "app_article"."id", "app_article"."author_id", "app_article"."published"
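To see the NULL-skipping behaviour in isolation, here is a small sketch that uses Python's built-in sqlite3 module rather than Django or PostgreSQL. The table and column names mirror the query above, but the rows and the cutoff are made up, and the comparison direction (views newer than the cutoff) is an assumption chosen to match "views in the last 30 minutes".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE app_article (id INTEGER PRIMARY KEY, published INTEGER);
    CREATE TABLE app_readership (
        id INTEGER PRIMARY KEY,
        reader_id INTEGER,
        which_article_id INTEGER,
        what_time TEXT
    );
    INSERT INTO app_article (id, published) VALUES (1, 1), (2, 1), (3, 1);
    -- Article 1: two recent views; article 2: one old view; article 3: none.
    INSERT INTO app_readership (reader_id, which_article_id, what_time) VALUES
        (10, 1, '2015-11-18 10:50:00'),
        (11, 1, '2015-11-18 10:55:00'),
        (10, 2, '2015-11-18 09:00:00');
""")

cutoff = "2015-11-18 10:34:00"  # hypothetical "30 minutes ago"

# Rows that fail the WHEN condition become NULL and are skipped by COUNT.
rows = conn.execute("""
    SELECT a.id,
           COUNT(CASE WHEN r.what_time > ? THEN 1 ELSE NULL END) AS numviews
    FROM app_article a
    LEFT OUTER JOIN app_readership r ON a.id = r.which_article_id
    GROUP BY a.id
    ORDER BY a.id
""", (cutoff,)).fetchall()

print(rows)
```

Articles 2 and 3 still appear in the result with a count of zero, because the LEFT OUTER JOIN keeps them and the CASE expression yields only NULLs for them.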
If we want to count only unique views, we can pass distinct=True to Count and make the When clause return the value we want to be distinct on.
from django.db.models import Count, Case, When, CharField, F

# `threshold` is the cutoff datetime, e.g. timezone.now() - timedelta(minutes=30)
Article.objects.annotate(
    numviews=Count(Case(
        # `readership__reader_id` works here as well; it doesn't matter
        When(readership__what_time__gt=threshold, then=F('readership__reader')),
        output_field=CharField(),
    ), distinct=True)
)
That will produce:
SELECT
    "app_article"."id",
    "app_article"."author_id",
    "app_article"."published",
    COUNT(
        DISTINCT CASE WHEN "app_readership"."what_time" > '2015-11-18 11:04:00.000000+01:00' THEN "app_readership"."reader_id"
        ELSE NULL END
    ) AS "numviews"
FROM "app_article" LEFT OUTER JOIN "app_readership"
    ON ("app_article"."id" = "app_readership"."which_article_id")
GROUP BY "app_article"."id", "app_article"."author_id", "app_article"."published"
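The effect of DISTINCT on the CASE expression can be checked with another small sqlite3 sketch. The rows and the cutoff below are invented for illustration: one reader opens the same article several times, and the plain count and the distinct count are compared side by side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE app_readership (
        reader_id INTEGER,
        which_article_id INTEGER,
        what_time TEXT
    );
    -- Reader 10 opens article 1 three times (once too long ago);
    -- reader 11 opens it once.
    INSERT INTO app_readership VALUES
        (10, 1, '2015-11-18 09:00:00'),
        (10, 1, '2015-11-18 10:40:00'),
        (10, 1, '2015-11-18 10:50:00'),
        (11, 1, '2015-11-18 10:55:00');
""")

cutoff = "2015-11-18 10:34:00"  # hypothetical "30 minutes ago"

# First column counts every recent view, second counts each recent reader once.
total, unique = conn.execute("""
    SELECT COUNT(CASE WHEN what_time > ? THEN reader_id END),
           COUNT(DISTINCT CASE WHEN what_time > ? THEN reader_id END)
    FROM app_readership
    WHERE which_article_id = 1
""", (cutoff, cutoff)).fetchone()

print(total, unique)
```

Returning the reader id from the CASE expression, rather than a constant 1, is what gives DISTINCT something meaningful to deduplicate.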
If your Django version is too old to support conditional expressions (they were added in Django 1.8), you can use raw to execute the SQL statement generated by newer versions of Django. Apparently there is no simple and optimized way to query that data without raw (even with extra there are problems injecting the required JOIN clause).
# `threshold` is the cutoff datetime; it is passed as a query parameter so the driver quotes it
Article.objects.raw(
    'SELECT'
    '    "app_article"."id",'
    '    "app_article"."author_id",'
    '    "app_article"."published",'
    '    COUNT('
    '        DISTINCT CASE WHEN "app_readership"."what_time" > %s THEN "app_readership"."reader_id"'
    '        ELSE NULL END'
    '    ) AS "numviews" '
    'FROM "app_article" LEFT OUTER JOIN "app_readership"'
    '    ON ("app_article"."id" = "app_readership"."which_article_id") '
    'GROUP BY "app_article"."id", "app_article"."author_id", "app_article"."published"',
    params=[threshold],
)
For Django >= 2.0 you can use conditional aggregation with the filter argument of the aggregate functions:
from datetime import timedelta
from django.utils import timezone
from django.db.models import Count, Q  # need import

Article.objects.annotate(
    numviews=Count(
        'readership__reader__id',
        filter=Q(readership__what_time__gt=timezone.now() - timedelta(minutes=30)),
        distinct=True
    )
)
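As far as I understand, on databases that support it (PostgreSQL, for example) Django renders the filter argument as an SQL FILTER (WHERE ...) clause on the aggregate, and falls back to CASE/WHEN elsewhere. The sketch below is not Django code: it uses Python's sqlite3 module (assuming SQLite 3.30 or newer, which added FILTER for aggregates) with made-up rows. It also restricts the result to published articles and sorts by the count, roughly what adding `.filter(published=True)` and `.order_by('-numviews')` to the queryset would do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE app_article (id INTEGER PRIMARY KEY, published INTEGER);
    CREATE TABLE app_readership (
        reader_id INTEGER,
        which_article_id INTEGER,
        what_time TEXT
    );
    INSERT INTO app_article VALUES (1, 1), (2, 1), (3, 0);
    INSERT INTO app_readership VALUES
        (10, 1, '2015-11-18 10:50:00'),
        (10, 1, '2015-11-18 10:52:00'),   -- repeat view by the same reader
        (10, 2, '2015-11-18 10:51:00'),
        (11, 2, '2015-11-18 10:53:00'),
        (12, 2, '2015-11-18 08:00:00'),   -- too old to count
        (10, 3, '2015-11-18 10:54:00');   -- article 3 is a draft
""")

cutoff = "2015-11-18 10:34:00"  # hypothetical "30 minutes ago"

# Unique recent readers per published article, most-read first.
rows = conn.execute("""
    SELECT a.id,
           COUNT(DISTINCT r.reader_id) FILTER (WHERE r.what_time > ?) AS numviews
    FROM app_article a
    LEFT OUTER JOIN app_readership r ON a.id = r.which_article_id
    WHERE a.published = 1
    GROUP BY a.id
    ORDER BY numviews DESC
""", (cutoff,)).fetchall()

print(rows)
```

Article 2 comes first with two unique recent readers, article 1 follows with one, and the draft article 3 is left out.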