Django ORM not generating correct SQL for many to many not in

I'm having a problem with Django's generated SQL from the ORM.

Cartons have a many-to-many relationship with Shipments through cartons_shipments.

I want to exclude Shipments that have at least one INBOUND Carton with a status in ['TRANSIT', 'DELIVERED', 'FAILURE'].

But I was not getting the results I expected so I turned on the SQL logging.

return Shipment.objects.filter(
    ... # other filtering
    # does not have any inbound cartons in_transit/delivered/failed
    ~Q(
        Q(cartons__type='INBOUND') &
        Q(cartons__status__in=['TRANSIT', 'DELIVERED', 'FAILURE'])
    ),
).distinct()

I have also tried this as my filter but got the same SQL output.

~Q(
    cartons__type='INBOUND', 
    cartons__status__in=['TRANSIT', 'DELIVERED', 'FAILURE']
)

This generates this SQL:

AND NOT (
    "shipments"."id" IN (
        SELECT U1."shipment_id" 
        FROM "cartons_shipments" U1 
        INNER JOIN "cartons" U2 ON (U1."carton_id" = U2."id") 
        WHERE U2."type" = 'INBOUND'
    ) 
    AND "shipments"."id" IN (
        SELECT U1."shipment_id" FROM "cartons_shipments" U1 
        INNER JOIN "cartons" U2 ON (U1."carton_id" = U2."id") 
        WHERE U2."status" IN ('TRANSIT', 'DELIVERED', 'FAILURE')
    )
) 

But this isn't correct: it excludes shipments that have any INBOUND carton and also have any carton (not necessarily an INBOUND one) with a status in ['TRANSIT', 'DELIVERED', 'FAILURE']. I need both conditions applied to the same carton.

Also now I'm running 2 sub selects and taking a significant performance hit because we have a ton of cartons in those statuses.

The correct SQL would be something like:

AND NOT ("shipments"."id" IN (
    SELECT U1."shipment_id" 
    FROM "cartons_shipments" U1 
    INNER JOIN "cartons" U2 ON (U1."carton_id" = U2."id") 
    WHERE U2."type" = 'INBOUND'
    and U2."status" IN ('TRANSIT', 'DELIVERED', 'FAILURE')
))

This way I would only be excluding shipments with INBOUND Cartons in those statuses.
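To make the difference concrete, here is a minimal sketch that runs both shapes of the query with Python's built-in sqlite3 module. The table and column names follow the SQL above; the sample rows, the 'PENDING' and 'OUTBOUND' values, and the helper name kept_ids are assumptions made up for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shipments (id INTEGER PRIMARY KEY);
    CREATE TABLE cartons (id INTEGER PRIMARY KEY, type TEXT, status TEXT);
    CREATE TABLE cartons_shipments (shipment_id INTEGER, carton_id INTEGER);

    INSERT INTO shipments (id) VALUES (1), (2), (3);
    INSERT INTO cartons (id, type, status) VALUES
        (10, 'INBOUND',  'TRANSIT'),   -- shipment 1: inbound carton in transit
        (20, 'INBOUND',  'PENDING'),   -- shipment 2: inbound carton, harmless status
        (21, 'OUTBOUND', 'DELIVERED'), -- shipment 2: outbound carton, delivered
        (30, 'OUTBOUND', 'DELIVERED'); -- shipment 3: outbound carton only
    INSERT INTO cartons_shipments (shipment_id, carton_id) VALUES
        (1, 10), (2, 20), (2, 21), (3, 30);
""")

STATUSES = "('TRANSIT', 'DELIVERED', 'FAILURE')"
JOINED = "FROM cartons_shipments U1 INNER JOIN cartons U2 ON (U1.carton_id = U2.id)"

# Shape of the SQL the ORM generated: two independent subqueries.
two_subqueries = f"""
    SELECT id FROM shipments WHERE NOT (
        id IN (SELECT U1.shipment_id {JOINED} WHERE U2.type = 'INBOUND')
        AND id IN (SELECT U1.shipment_id {JOINED} WHERE U2.status IN {STATUSES})
    ) ORDER BY id
"""

# Shape of the SQL I want: both conditions on the same carton row.
one_subquery = f"""
    SELECT id FROM shipments WHERE NOT (
        id IN (SELECT U1.shipment_id {JOINED}
               WHERE U2.type = 'INBOUND' AND U2.status IN {STATUSES})
    ) ORDER BY id
"""

def kept_ids(sql):
    """Return the shipment ids that survive the exclusion."""
    return [row[0] for row in conn.execute(sql)]

generated_result = kept_ids(two_subqueries)
wanted_result = kept_ids(one_subquery)
print(generated_result, wanted_result)
```

Shipment 2 is the interesting row: it has an INBOUND carton and, separately, a DELIVERED carton, so the two-subquery form drops it even though no single carton matches both conditions.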

The difference in query time between these two is significant, and of course I get the correct results with the 2nd SQL example. I thought I could express that logic by combining the Q() objects, but I can't figure out how.

I have also thought that maybe I could just write the raw SQL from the 2nd example, but I'm having a hard time figuring out how to combine raw SQL with other ORM filters.

Any help would be greatly appreciated.


Edit:

I'm able to get the correct result by doing the filtering in code and removing the filter from the query:

returned_cartons = Carton.objects.prefetch_related('shipments').filter(
    type='INBOUND',
    status__in=['TRANSIT', 'DELIVERED', 'FAILURE']
)

returned_shipment_ids = list(map(
    lambda carton: carton.shipments.first().id,
    returned_cartons
))

return list(filter(
    lambda shipment: shipment.id not in returned_shipment_ids,
    shipments
))

This unfortunately is too slow to be useful.


Final solution based on Endre Both's idea 🙌

return Shipment.objects.filter(
    ...,  # other filtering
    # has at least 1 inbound carton
    Q(cartons__type='INBOUND')
).exclude(
    # we want to exclude shipments that have at least 1 inbound carton
    # with a status of transit/delivered/failure
    id__in=Shipment.objects.filter(
        ...,  # filters to limit the number of records returned
        cartons__type='INBOUND',
        cartons__status__in=['TRANSIT', 'DELIVERED', 'FAILURE'],
    ).distinct()
).distinct()

The line Q(cartons__type='INBOUND') is required because the exclude() only removes Shipments that have an INBOUND Carton in one of the ['TRANSIT', 'DELIVERED', 'FAILURE'] statuses. Without it, the query would also keep Shipments that don't have any Cartons at all.
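The effect of that extra filter can be checked outside Django. The following is a small sketch using Python's sqlite3 module; the table layout mirrors the SQL in the question, while the sample rows, the 'PENDING' status, and the names excluded, has_inbound, and ids are assumptions made up for this illustration. It uses plain IN / NOT IN subqueries, which is a simplification and not necessarily the exact SQL Django emits for the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shipments (id INTEGER PRIMARY KEY);
    CREATE TABLE cartons (id INTEGER PRIMARY KEY, type TEXT, status TEXT);
    CREATE TABLE cartons_shipments (shipment_id INTEGER, carton_id INTEGER);

    -- Shipment 1: inbound carton in transit.
    -- Shipment 2: inbound carton still pending.
    -- Shipment 3: no cartons at all.
    INSERT INTO shipments (id) VALUES (1), (2), (3);
    INSERT INTO cartons (id, type, status) VALUES
        (10, 'INBOUND', 'TRANSIT'),
        (20, 'INBOUND', 'PENDING');
    INSERT INTO cartons_shipments (shipment_id, carton_id) VALUES (1, 10), (2, 20);
""")

# Shipments with an inbound carton in one of the unwanted statuses.
excluded = """
    SELECT U1.shipment_id FROM cartons_shipments U1
    INNER JOIN cartons U2 ON (U1.carton_id = U2.id)
    WHERE U2.type = 'INBOUND' AND U2.status IN ('TRANSIT', 'DELIVERED', 'FAILURE')
"""

# Shipments with at least one inbound carton of any status.
has_inbound = """
    SELECT U1.shipment_id FROM cartons_shipments U1
    INNER JOIN cartons U2 ON (U1.carton_id = U2.id)
    WHERE U2.type = 'INBOUND'
"""

def ids(sql):
    """Return the shipment ids selected by the given query."""
    return [row[0] for row in conn.execute(sql)]

without_inbound_filter = ids(
    f"SELECT id FROM shipments WHERE id NOT IN ({excluded}) ORDER BY id"
)
with_inbound_filter = ids(
    f"SELECT id FROM shipments WHERE id IN ({has_inbound}) "
    f"AND id NOT IN ({excluded}) ORDER BY id"
)
print(without_inbound_filter, with_inbound_filter)
```

Shipment 3 has no cartons, so the exclusion alone keeps it; only the additional "has an inbound carton" condition filters it out.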

Hope this helps more people out there.

To us mere mortals, the "M" in ORM can be a bit inscrutable at times. But you could try a different, simpler tack. It still uses a subquery and not a join, but this is not necessarily a performance drag.

Shipment.objects.exclude(
    id__in=Carton.objects
        .filter(type='INBOUND',
                status__in=['TRANSIT', 'DELIVERED', 'FAILURE'])
        .values('shipments__id')
        .distinct()
)

The exact name of the reference back to the Shipment primary key from the Carton model depends on the exact definition of the models. I've used shipments__id , but it could be shipment_set__id or something else.


New idea: You need to base the subselect on the intermediate model rather than Carton. If you have an explicit intermediate model, use it directly. If you don't, Django creates one automatically, and you can reach it through the through attribute of the many-to-many field, either on the model class or on an instance.

# The lookup names carton and shipment are assumed from the
# carton_id / shipment_id columns; adjust them to your models.
IModel = Shipment.cartons.through
Shipment.objects.exclude(
    id__in=IModel.objects
        .filter(carton__type='INBOUND',
                carton__status__in=['TRANSIT', 'DELIVERED', 'FAILURE'])
        .values('shipment__id')
        .distinct()
)

