简体   繁体   中英

Django annotation on (model → FK → model) relation

Galaxies across the universe host millions/billions of stars, each belonging to a specific type, depending on its physical properties (Red stars, Blue Supergiant, White Dwarf, etc). For each Star in my database, I'm trying to find the number of distinct galaxies that are also home for some star of that same type.

class Galaxy(Model):
    ...

class Star(Model):
     galaxy = ForeignKey(Galaxy, related_name='stars')
     type = CharField(...)

Performing this query individually for each Star might be comfortably done by:

star = <some_Star>

desired_galaxies = Galaxy.objects.filter(stars__type=star.type).distinct()
desired_count = desired_galaxies.count()

Or even, albeit more redundant:

desired_count = Star.objects.filter(galaxy__stars__type=star.type).values('galaxy').distinct()

This get a little fuzzier when I try to get the count result for all the stars in a "single" query:

all_stars = Star.objects.annotate(desired_count=...)

The main reason I want to do that is to be capable of sorting Star.objects.order_by('desired_count') in a clean way.

What I've tried so far:

Star.annotate(desired_count=Count('galaxy', filter=Q(galaxy__stars__type=F('type')), distinct=True))

But this annotates 1 for every star. I guess I'll have to go for OuterRef, Subquery here, but not sure on how.

您可以使用GROUP BY来获取计数:

Star.objects.values('type').annotate(desired_count=Count('galaxy')).values('type', 'desired_count')

Django doesn't provide a way to define multi-valued relationships between models that don't involve foreign keys yet. If it did you could do something like

class Galaxy(Model):
    ...

class Star(Model):
    galaxy = ForeignKey(Galaxy, related_name='stars')
    type = CharField(...)
    same_type_stars = Relation(
        'self', from_fields=('type',), to_fields=('type',)
    )

Star.objects.annotate(
    galaxies_count=Count('same_type_stars__galaxy', distinct=True)
)

Which would result in something along

SELECT
  star.*,
  COUNT(DISTINCT same_star_type.galaxy_id) galaxies_count
FROM star
LEFT JOIN star same_star_type ON (same_star_type.type = star.type)
GROUP BY star.id

If you want to achieve something similar you'll need to use subquery for now

Star.objects.annotate(
    galaxies_count=Subquery(Star.objects.filter(
        type=OuterRef('type'),
    ).values('type').values(
        inner_count=Count('galaxy', distinct=True),
    ))
)

Which would result in something along

SELECT
   star.*,
   (
       SELECT COUNT(DISTINCT inner_star.galaxy_id)
       FROM star inner_star
       WHERE inner_star.type = star.type
       GROUP BY inner_star.type
   ) galaxies_count
FROM star

Which likely perform poorly on some databases that don't materialize correlated subqueries (eg MySQL). In all cases make sure you index Star.type otherwise you'll get bad performance no matter what. A composite index on ('type', 'galaxy') might be even better as it might allow you to perform index only scan (eg on PostgreSQL).

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM