Need a workaround to filter on related model and aggregated fields in Django

Question

I opened a ticket for this problem.

In a nutshell here is my model:

from django.db import models

class Plan(models.Model):
    cap = models.IntegerField()

class Phone(models.Model):
    plan = models.ForeignKey(Plan, related_name='phones')

class Call(models.Model):
    phone = models.ForeignKey(Phone, related_name='calls')
    cost = models.IntegerField()

I want to run a query like this one:

Phone.objects.annotate(total_cost=Sum('calls__cost')).filter(total_cost__gte=0.5*F('plan__cap'))

Unfortunately, Django generates invalid SQL:

SELECT "app_phone"."id", "app_phone"."plan_id",
SUM("app_call"."cost") AS "total_cost"
FROM "app_phone"
INNER JOIN "app_plan" ON ("app_phone"."plan_id" = "app_plan"."id")
LEFT OUTER JOIN "app_call" ON ("app_phone"."id" = "app_call"."phone_id")
GROUP BY "app_phone"."id", "app_phone"."plan_id"
HAVING SUM("app_call"."cost") >=  0.5 * "app_plan"."cap"

and errors with:

ProgrammingError: column "app_plan.cap" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: ...."plan_id" HAVING SUM("app_call"."cost") >=  0.5 * "app_plan"....

Is there any workaround apart from running raw SQL?

Answer 1

When aggregating, SQL requires that every column you reference either be part of the GROUP BY clause (so it has a single value within each group) or be wrapped in an aggregate function, which guarantees that exactly one value comes out for each group. The problem here is that, as far as the database can tell, "app_plan.cap" could have many different values for each combination of "app_phone.id" and "app_phone.plan_id", so you need to tell the DB how to treat them.
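
To make the rule concrete, here is a small sketch in plain Python that groups some made-up joined rows the way the GROUP BY clause would and collects the cap values seen in each group. The rows and the function name caps_per_group are assumptions for illustration only. With this schema each group ends up with exactly one distinct cap, because a phone belongs to a single plan, but the database does not infer that and still insists that the column be grouped or aggregated.

```python
from collections import defaultdict

# Hypothetical rows as they would come out of the two joins:
# (phone_id, plan_id, cap, cost)
JOINED_ROWS = [
    (1, 1, 100, 30),
    (1, 1, 100, 25),
    (2, 1, 100, 10),
    (3, 2, 40, 20),
]


def caps_per_group(rows):
    """Collect the distinct cap values in each (phone_id, plan_id) group."""
    groups = defaultdict(set)
    for phone_id, plan_id, cap, _cost in rows:
        groups[(phone_id, plan_id)].add(cap)
    return dict(groups)


if __name__ == "__main__":
    for key, caps in sorted(caps_per_group(JOINED_ROWS).items()):
        print(key, sorted(caps))
```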

So, valid SQL for your result takes one of two forms, depending on the result you want. First, you could include app_plan.cap in the GROUP BY clause, so that each distinct combination of (app_phone.id, app_phone.plan_id, app_plan.cap) forms its own group:

SELECT "app_phone"."id", "app_phone"."plan_id", "app_plan"."cap",
SUM("app_call"."cost") AS "total_cost"
FROM "app_phone"
INNER JOIN "app_plan" ON ("app_phone"."plan_id" = "app_plan"."id")
LEFT OUTER JOIN "app_call" ON ("app_phone"."id" = "app_call"."phone_id")
GROUP BY "app_phone"."id", "app_phone"."plan_id", "app_plan"."cap"
HAVING SUM("app_call"."cost") >=  0.5 * "app_plan"."cap"
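
If you want to try this variant without a Django project, here is a minimal sketch that runs the same statement against an in-memory SQLite database using Python's sqlite3 module. The table layout mirrors the models in the question; the sample rows and the function name phones_over_half_cap are made up for illustration. Keep in mind that SQLite is more lenient than PostgreSQL about ungrouped columns, so this only shows that the corrected query returns the expected rows, not the original error.

```python
import sqlite3

SCHEMA = """
CREATE TABLE app_plan (id INTEGER PRIMARY KEY, cap INTEGER NOT NULL);
CREATE TABLE app_phone (id INTEGER PRIMARY KEY,
                        plan_id INTEGER NOT NULL REFERENCES app_plan (id));
CREATE TABLE app_call (id INTEGER PRIMARY KEY,
                       phone_id INTEGER NOT NULL REFERENCES app_phone (id),
                       cost INTEGER NOT NULL);
"""

GROUP_BY_CAP_SQL = """
SELECT "app_phone"."id", "app_phone"."plan_id", "app_plan"."cap",
       SUM("app_call"."cost") AS "total_cost"
FROM "app_phone"
INNER JOIN "app_plan" ON ("app_phone"."plan_id" = "app_plan"."id")
LEFT OUTER JOIN "app_call" ON ("app_phone"."id" = "app_call"."phone_id")
GROUP BY "app_phone"."id", "app_phone"."plan_id", "app_plan"."cap"
HAVING SUM("app_call"."cost") >= 0.5 * "app_plan"."cap"
ORDER BY "app_phone"."id"
"""


def phones_over_half_cap():
    """Load hypothetical sample data and run the GROUP BY variant."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Made-up data: two plans, four phones, phone 4 has no calls at all.
    conn.executemany("INSERT INTO app_plan (id, cap) VALUES (?, ?)",
                     [(1, 100), (2, 40)])
    conn.executemany("INSERT INTO app_phone (id, plan_id) VALUES (?, ?)",
                     [(1, 1), (2, 1), (3, 2), (4, 2)])
    conn.executemany("INSERT INTO app_call (phone_id, cost) VALUES (?, ?)",
                     [(1, 30), (1, 25), (2, 10), (3, 20)])
    rows = conn.execute(GROUP_BY_CAP_SQL).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in phones_over_half_cap():
        print(row)
```

With these rows, phones 1 and 3 come back. Phone 2 is below half its cap, and phone 4 has no calls, so its SUM is NULL and the HAVING comparison filters it out.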

The trick is to get the extra column into the GROUP BY clause. We can weasel our way into this by abusing extra(), though this hard-codes the table name "app_plan", which is not ideal; you could look it up programmatically from the Plan class instead (for example, via Plan._meta.db_table) if you wanted:

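from django.db.models import F, Sum

# The extra select column ends up in the GROUP BY clause.
Phone.objects.extra(select={
    "plan_cap": "app_plan.cap"
}).annotate(
    total_cost=Sum('calls__cost')
).filter(total_cost__gte=0.5*F('plan__cap'))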

Alternatively, you could wrap app_plan.cap in an aggregate function, which collapses it to a single value per group. The available aggregate functions vary by database, but typically include AVG, MAX, and MIN.

SELECT "app_phone"."id", "app_phone"."plan_id",
SUM("app_call"."cost") AS "total_cost",
AVG("app_plan"."cap") AS "avg_cap"
FROM "app_phone"
INNER JOIN "app_plan" ON ("app_phone"."plan_id" = "app_plan"."id")
LEFT OUTER JOIN "app_call" ON ("app_phone"."id" = "app_call"."phone_id")
GROUP BY "app_phone"."id", "app_phone"."plan_id"
HAVING SUM("app_call"."cost") >=  0.5 * AVG("app_plan"."cap")

You could get this result in Django using the following:

from django.db.models import Avg, F, Sum

Phone.objects.annotate(
    total_cost=Sum('calls__cost'),
    avg_cap=Avg('plan__cap')
).filter(total_cost__gte=0.5 * F("avg_cap"))
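
Since every row in a phone's group carries the same cap, the choice between AVG, MAX, and MIN should not change which phones are returned. The sketch below checks that against an in-memory SQLite database using Python's sqlite3 module. The sample rows and the function name ids_with_aggregate are assumptions for illustration, and the query selects only the phone id to keep the comparison simple.

```python
import sqlite3

SCHEMA_AND_DATA = """
CREATE TABLE app_plan (id INTEGER PRIMARY KEY, cap INTEGER NOT NULL);
CREATE TABLE app_phone (id INTEGER PRIMARY KEY,
                        plan_id INTEGER NOT NULL REFERENCES app_plan (id));
CREATE TABLE app_call (id INTEGER PRIMARY KEY,
                       phone_id INTEGER NOT NULL REFERENCES app_phone (id),
                       cost INTEGER NOT NULL);
INSERT INTO app_plan (id, cap) VALUES (1, 100), (2, 40);
INSERT INTO app_phone (id, plan_id) VALUES (1, 1), (2, 1), (3, 2), (4, 2);
INSERT INTO app_call (phone_id, cost) VALUES (1, 30), (1, 25), (2, 10), (3, 20);
"""

QUERY_TEMPLATE = """
SELECT "app_phone"."id"
FROM "app_phone"
INNER JOIN "app_plan" ON ("app_phone"."plan_id" = "app_plan"."id")
LEFT OUTER JOIN "app_call" ON ("app_phone"."id" = "app_call"."phone_id")
GROUP BY "app_phone"."id", "app_phone"."plan_id"
HAVING SUM("app_call"."cost") >= 0.5 * {agg}("app_plan"."cap")
ORDER BY "app_phone"."id"
"""


def ids_with_aggregate(agg):
    """Return ids of phones whose total call cost is at least half the cap.

    `agg` is the SQL aggregate wrapped around app_plan.cap.
    """
    # Only allow known aggregate names, since the name is spliced into the SQL.
    if agg not in {"AVG", "MAX", "MIN"}:
        raise ValueError("unsupported aggregate: %r" % (agg,))
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA_AND_DATA)
    rows = conn.execute(QUERY_TEMPLATE.format(agg=agg)).fetchall()
    conn.close()
    return [phone_id for (phone_id,) in rows]


if __name__ == "__main__":
    for name in ("AVG", "MAX", "MIN"):
        print(name, ids_with_aggregate(name))
```

All three aggregates select the same phones for this data, so in Django you could equally use Max('plan__cap') or Min('plan__cap') in place of Avg('plan__cap').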

You may want to consider updating the bug report you left with a clearer specification of the result you expect -- for example, the valid SQL you're after.
