Django ORM annotation to find how many objects have any related object

I am currently trying to find out what percentage of a set of objects have a related object with certain values. Specifically, I have a table of objects with a one-to-many relationship to a table of comments, and I am trying to figure out what percentage of those objects have comments of a specific length. Both of these tables are ETL output from a separate dataset, which makes the metric calculations easier.

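# Models
from django.db import models


class Data(models.Model):
    id = models.AutoField(primary_key=True)
    creator_id = models.IntegerField()  # Not a real foreign key


class DataCommenter(models.Model):
    # on_delete is required since Django 2.0 (CASCADE is an assumption here);
    # related_name matches the "commenter" lookups used in the filters.
    do_id = models.ForeignKey(Data, on_delete=models.CASCADE, related_name="commenter")
    creator_id = models.IntegerField()  # Not a real foreign key
    short_comments = models.IntegerField()
    medium_comments = models.IntegerField()
    long_comments = models.IntegerField()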

On top of these models, I have some queryset annotations that try to compute the average, as shown below:

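# QuerySet
from django.db import models
from django.db.models import BooleanField, Case, Exists, FloatField, OuterRef, Q, Value, When


class DataQuerySet(models.QuerySet):
    def extensive_comments(self):
        """Get extensive comment raw selection."""
        # Comments on this Data object, written by someone other than its
        # creator, that include at least one medium or long comment.
        inner_query = DataCommenter.objects.exclude(
            creator_id=OuterRef("creator_id")
        ).filter(
            Q(medium_comments__gte=1) | Q(long_comments__gte=1),
            do_id=OuterRef("id"),
        )
        # 1.0 when such a comment exists, otherwise 0.0.
        return self.annotate(
            raw_extensive_comments=Case(
                When(Exists(inner_query), then=1),
                default=0,
                output_field=FloatField(),
            )
        )

    def annotate_period(self):
        """Annotation to allow average without aggregation."""
        # A constant column, so values("period") puts every row in one group.
        return self.annotate(period=Value(True, output_field=BooleanField()))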

The QuerySet is attached to the Data model and is used as follows:

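from django.db.models import Avg, ExpressionWrapper, FloatField

Data.objects.all().annotate_period().extensive_comments().values("period").annotate(
    # Avg takes the annotation name as a string.
    extensive_comments=ExpressionWrapper(Avg("raw_extensive_comments") * 100, output_field=FloatField())
)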

The data includes multiple DataCommenter objects for some Data objects, and for some reason the average is computed over the number of DataCommenter objects instead of the number of Data objects. Where 3 of 5 Data objects should yield 60, we get something like 10/12 and a result of 83.33333.

We are calculating a large number of other metrics from other fields not shown here, so we can't use aggregate. The values("period") call should make all objects be treated as a single group for the later annotations that include the Avg, and it does for every metric except this one calculation.

We have tried putting the inner_query directly inside the annotation, adding .values("do_id").distinct() to the end of the inner query, and removing the subquery entirely in favor of filtering on commenter__medium_comments directly from the Data model. I have no idea why it is returning results like this. Any help would be greatly appreciated.
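To make the symptom concrete, here is a small sketch of the arithmetic. The per-object row counts are hypothetical (chosen only so the totals come out to 3/5 and 10/12, not taken from the real data); it shows how averaging once per DataCommenter row instead of once per Data object shifts the result.

```python
# Hypothetical data: (has_extensive_comments, number_of_DataCommenter_rows)
data_objects = [(1, 4), (1, 3), (1, 3), (0, 1), (0, 1)]

# Average over Data objects: each object counts once.
per_object = 100 * sum(flag for flag, _ in data_objects) / len(data_objects)

# Average over DataCommenter rows: each object counts once per related row.
weighted_rows = [flag for flag, n in data_objects for _ in range(n)]
per_row = 100 * sum(weighted_rows) / len(weighted_rows)

print(per_object)         # 60.0
print(round(per_row, 5))  # 83.33333
```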

The annotations listed in the question are completely correct; the actual problem was with the FilterSet in the view itself. Specifically, we had a set of filters on creator_id like the following (the related name from the DataCommenter model to Data is commenter):

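from django.db.models import Q
from django_filters import rest_framework as filters  # assumed import path for `filters`


class BaseAggregateFilterSet(filters.FilterSet):
    creator_id = filters.NumberFilter(method="filter_by_creator_id")

    def filter_by_creator_id(self, qs, _, value):
        return qs.filter(Q(creator_id=value) | Q(commenter__creator_id=value)).distinct()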

I thought that the distinct() at the end of that query would ensure that if a Data object had a DataCommenter object with the same creator_id, or one with a different creator_id that matched the selection, you would get back only a single row per Data object. What actually happens is that there is an implicit JOIN that multiplies each Data object by the number of its DataCommenter objects. The correct way to ensure that you keep a QuerySet of only Data objects is to structure the filter like the following:

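# Replacement for BaseAggregateFilterSet.filter_by_creator_id
def filter_by_creator_id(self, qs, _, value):
    # Resolve the matching Data ids in a subquery so the join stays out of the outer queryset.
    data_ids = qs.filter(Q(creator_id=value) | Q(commenter__creator_id=value)).values("id").distinct()
    return qs.filter(id__in=data_ids)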

Once we made that change, all of the annotations that had been returning strange averages were fixed, because they now operate on the expected number of objects instead of counting some objects multiple times.
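The effect can be reproduced outside Django with a minimal sqlite3 sketch. Everything here is an assumption for illustration: the table and column names, the sample rows, and the LEFT OUTER JOIN (the SQL Django actually generated was not shown in the post). It suggests why DISTINCT does not help once an aggregate runs over the joined rows, and why restricting by id with a subquery does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE data (id INTEGER PRIMARY KEY, creator_id INTEGER, flag REAL);
    CREATE TABLE commenter (id INTEGER PRIMARY KEY, do_id INTEGER, creator_id INTEGER);
    -- Two data rows: one flagged (with three comment rows), one not (one comment row).
    INSERT INTO data (id, creator_id, flag) VALUES (1, 7, 1.0), (2, 7, 0.0);
    INSERT INTO commenter (do_id, creator_id) VALUES (1, 7), (1, 7), (1, 7), (2, 7);
    """
)

# Shared FROM/WHERE clause, mimicking a filter that spans the relation.
MATCH = """
    FROM data d
    LEFT OUTER JOIN commenter c ON c.do_id = d.id
    WHERE d.creator_id = 7 OR c.creator_id = 7
"""

# DISTINCT applies to the single output row, after AVG has already
# been computed over the four joined rows.
joined_avg = conn.execute("SELECT DISTINCT AVG(d.flag) * 100" + MATCH).fetchone()[0]

# Restricting by id keeps one row per data object in the outer query.
fixed_avg = conn.execute(
    "SELECT AVG(flag) * 100 FROM data WHERE id IN (SELECT DISTINCT d.id" + MATCH + ")"
).fetchone()[0]

print(joined_avg)  # 75.0
print(fixed_avg)   # 50.0
```

With the join in the outer query, the flagged object is counted three times (3 of 4 rows), while the id-based version counts each object once (1 of 2).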
