简体   繁体   中英

Django Prefetch Related with Filter on Max Value

We have a pair of models that look (roughly) like this:

class Machine(models.Model):
    machine_id = models.CharField(max_length=10)
    # Other irrelevant fields

    @property
    def latest_update(self):
        if self.machineupdate_set.count() == 0:
            return None
        return self.machineupdate_set.order_by('-update_time')[:1].get()

class MachineUpdate(models.Model):
    machine = models.ForeignKey(Machine)
    update_time = models.DateTimeField(auto_now_add=True)
    # Other irrelevant fields

Whenever we load Machine s from the database, we always end up using the latest_update for that machine. When we first implemented this, we had a lot of machines and a fairly small number of updates per machine, so to improve performance (by reducing query count) we added a simple default prefetch to the model manager for Machine :

class MachineManager(models.Manager):

    def get_queryset(self):
        return super(MachineManager, self).get_queryset().prefetch_related('machineupdate_set')

However, things have changed and now we have huge numbers of updates related to each machine, and the prefetch query is starting to become a problem (both in terms of long query execution time and memory consumption).

We are looking for a smarter way of pre-fetching the required data, as all we really need to pre-fetch is the latest update per machine, not all of them. Having looked at the Django prefetch_related docs it looked like we could change get_queryset in our MachineManager to something like this:

def get_queryset(self):
    latest_update_query = MachineUpdate.objects.order_by('-update_time')[:1]
    latest_update_prefetch = models.Prefetch('machineupdate_set', queryset=latest_update_query, to_attr='_latest_update')
    return super(MachineManager, self).get_queryset().prefetch_related(latest_update_prefetch)

and then modify latest_update to use the new attribute populated by the pre-fetch. However, this does not work because whenever we filter a Machine query using this we get an error: AssertionError: Cannot filter a query once a slice has been taken.

Can anyone suggest a solution to this problem, so that we can efficiently load the latest_update for each machine? We are unsure how to fix the issue we have with the above attempt at pre-fetching the latest updates.

(FYI - we have considered adding a is_latest_update boolean field to MachineUpdate which we can filter on, or alternatively a latest_update foreign key reference on Machine , however we want to avoid having to maintain this redundant information).

I see that MachineUpdate.update_time has auto_now_add=True . So we can use Max(MachineUpdate.id) per Machine group to get the last MachineUpdate . Right? If that's True checkout the following code:

class MachineManager(models.Manager):
    pass

class MachineQueryset(models.QuerySet):
    def with_last_machineupdate(self):
        return self.prefetch_related(models.Prefetch('machineupdate_set',
            queryset=MachineUpdate.objects.filter(
                id__in=Machine.objects \
                    .annotate(last_machineupdate_id=models.Max('machineupdate__id')) \
                    .values_list('last_machineupdate_id', flat=True) \
            ),
            #notice the list word
            to_attr='last_machineupdate_list'
        ))


class Machine(models.Model):
    machine_id = models.CharField(max_length=10)
    objects = MachineManager.from_queryset(MachineQueryset)()

    @property
    def latest_update(self):
        if hasattr(self, 'last_machineupdate_list') and len(self.last_machineupdate_list) > 0:
            return self.last_machineupdate_list[0]
        return None

class MachineUpdate(models.Model):
    machine = models.ForeignKey(Machine)
    update_time = models.DateTimeField(auto_now_add=True)

    def __unicode__(self):
        return str(self.update_time)

Usage:

machines = Machine.objects.filter(...).with_last_machineupdate()

If that's not the case, eg we can't use Max('machineupdate__id') and we need to stick with update_time field. Then a slightly more optimized solution (but still getting all MachineUpdates per Machine ) looks like this:

class MachineManager(models.Manager):
    def get_queryset(self):
        return super(MachineManager, self).get_queryset() \
            .prefetch_related(models.Prefetch('machineupdate_set',
                queryset=MachineUpdate.objects.order_by('-update_time')
            ))

class Machine(models.Model):
    machine_id = models.CharField(max_length=10)
    objects = MachineManager()

    @property
    def latest_update(self):
        #this will not make queries
        machine_updates = self.machineupdate_set.all()
        if len(machine_updates) > 0:
            return machine_updates[0]
        return None

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM