How can I improve this Django query?

Question

I have this Django query, which takes several minutes to run.

stat_type = 'Some String'
obj = xxx  # some brand object

query = Q(stat_type__icontains=stat_type) & (
    Q(brand=obj) | Q(organisation__in=obj.organisation_set.active()))

result = ViewStat.objects.filter(query).aggregate(
    one=Count('id', filter=Q(created__gte=timezone.now() - relativedelta(months=1))),
    three=Count('id', filter=Q(created__gte=timezone.now() - relativedelta(months=3))),
    twelve=Count('id', filter=Q(created__gte=timezone.now() - relativedelta(months=12))),
    all=Count('id', filter=Q(created__gte=timezone.now() - relativedelta(months=999))))

Models

class Brand(models.Model):
    ...

class Organisation(models.Model):
    brand = models.ForeignKey(Brand, on_delete=models.CASCADE)
    ...

class ViewStat(models.Model):
    stat_type = models.CharField(max_length=21)
    brand = models.ForeignKey(Brand, on_delete=models.SET_NULL, blank=True, null=True)
    organisation = models.ForeignKey(Organisation, on_delete=models.SET_NULL, blank=True, null=True)

I have roughly 80,000 organisations, 700 brands and 10 million ViewStats.

How can I improve the query performance?

Answer 1

Without measurements to compare against, it is hard to make improvements. But here are some ideas.

First, I reordered the code a bit to understand it better.

from dateutil.relativedelta import relativedelta
from django.db.models import Count, Q
from django.utils import timezone as tz
from .models import Brand, ViewStat

stat_type = 'Some String'
some_brand = Brand.objects.first()
# Collect only the primary keys of the active organisations, not full objects.
active_org_id_set = set(
    some_brand.organisation_set.active().values_list('id', flat=True))

# Take the timestamp once so every window shares the same reference point.
time_now = tz.now()
one_month_ago = time_now - relativedelta(months=1)
three_months_ago = time_now - relativedelta(months=3)
twelve_months_ago = time_now - relativedelta(months=12)

result = ViewStat.objects.select_related(None)\
    .filter(stat_type__icontains=stat_type)\
    .filter(
        Q(brand_id=some_brand.pk)
        | Q(organisation_id__in=active_org_id_set))\
    .aggregate(
        one=Count('id', filter=Q(created__gte=one_month_ago)),
        three=Count('id', filter=Q(created__gte=three_months_ago)),
        twelve=Count('id', filter=Q(created__gte=twelve_months_ago)),
        all=Count('id'))

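It is always better to call tz.now() once and reuse the value, so that all the time windows are computed from the same instant.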

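It seems the last filter in the aggregate (the one using relativedelta(months=999)) can be omitted, since a cutoff that far back covers practically every row.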
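
To illustrate these two points outside Django, here is a minimal sketch using the standard-library sqlite3 module as a stand-in database. It counts rows per time window in a single pass, with every cutoff derived from one timestamp and no filter on the "all" count. The table name view_stat, the helper names months_ago and count_windows, and the sample rows are assumptions made for the example. The SQL is only roughly the kind of conditional aggregate the ORM call expresses, not the exact statement Django generates.

```python
import calendar
import sqlite3
from datetime import datetime


def months_ago(moment, months):
    """Subtract whole calendar months, clamping the day to the month's length."""
    index = moment.year * 12 + (moment.month - 1) - months
    year, month_index = divmod(index, 12)
    last_day = calendar.monthrange(year, month_index + 1)[1]
    return moment.replace(year=year, month=month_index + 1,
                          day=min(moment.day, last_day))


def count_windows(connection, now):
    """Count rows per time window in one pass over the table."""
    # All cutoffs derive from the same `now`, so the windows are consistent.
    cutoffs = [months_ago(now, n).isoformat() for n in (1, 3, 12)]
    row = connection.execute(
        """
        SELECT
            COALESCE(SUM(CASE WHEN created >= ? THEN 1 ELSE 0 END), 0),
            COALESCE(SUM(CASE WHEN created >= ? THEN 1 ELSE 0 END), 0),
            COALESCE(SUM(CASE WHEN created >= ? THEN 1 ELSE 0 END), 0),
            COUNT(id)  -- no cutoff needed for the "all" bucket
        FROM view_stat
        """,
        cutoffs,
    ).fetchone()
    return dict(zip(("one", "three", "twelve", "all"), row))


# Hypothetical demo data: four rows at increasing distance from a fixed "now".
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE view_stat (id INTEGER PRIMARY KEY, created TEXT)")
now = datetime(2020, 6, 16, 15, 0, 0)
samples = [datetime(2020, 6, 1), datetime(2020, 4, 1),
           datetime(2019, 9, 1), datetime(2018, 1, 1)]
connection.executemany(
    "INSERT INTO view_stat (created) VALUES (?)",
    [(moment.isoformat(),) for moment in samples])

counts = count_windows(connection, now)
print(counts)
```

Timestamps are stored as ISO 8601 strings of identical shape, so plain string comparison orders them correctly in this sketch.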

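I prefer to hold the filter data in a separate variable, so I created active_org_id_set. Note that I collect only the PKs (using .values_list()) rather than whole Organisation objects, which consumes far less memory.

I then use this active_org_id_set with organisation_id__in instead of organisation__in, so there is no need to join the Organisation table with the ViewStat table.

I also filter on brand_id instead of brand, to avoid joining the Brand table with the ViewStat table.

I explicitly use .select_related(None) to keep the joined tables to a minimum. This call may not be necessary, but I don't have access to your database's execution plan.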
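
Since improvements are hard to judge without measurements, and the execution plan was not available here, the following is a minimal sketch of how both could be collected. The helper names measure and describe_plan are hypothetical. QuerySet.explain() is part of Django from version 2.1; which options it accepts depends on the database backend, so any option passed through is an assumption to check against your database.

```python
import time


def measure(run_query):
    """Call run_query() and return (result, elapsed_seconds)."""
    started = time.perf_counter()
    result = run_query()
    return result, time.perf_counter() - started


def describe_plan(queryset, **options):
    """Return the database's execution plan for a queryset as text."""
    # explain() works on a QuerySet, so call it on the filtered queryset,
    # not on the dict returned by .aggregate().
    return queryset.explain(**options)


# Possible usage inside the project (names taken from the code above):
#
#   filtered = ViewStat.objects.filter(stat_type__icontains=stat_type)
#   print(describe_plan(filtered))
#   result, seconds = measure(lambda: filtered.aggregate(all=Count('id')))
#   print(result, seconds)
```

Running the same measurement before and after each change gives a baseline to compare against.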

Answer 1 accepted 2020-06-16 15:19:41