
Check if object_id occurs more than once in queryset.annotate Case When parameter

The documentation on field lookups doesn't really help in my case.

What my query looks like now:

date_delta = 2

queryset = TrendData.objects.filter(owner__trend_type__mnemonic='posts', 
 date_trend__date__range=[date_from, date_to]).values('owner_id', 'owner__name')

queryset.annotate(owner_name=F('owner_id__name')).values('owner_name', 'owner_id').annotate(
    views = Sum(Case(When(owner_id__gt=1, then=F('views') / date_delta)), default=('views')...,
                output_field=IntegerField() )
)

The queryset output looks like this:

{'owner_id': 1306, 'owner__name': 'Some name123'}, 
{'owner_id': 1307, 'owner__name': 'Somename as well'}, 
{'owner_id': 1308, 'owner__name': 'aand another name'}, 
{'owner_id': 1306, 'owner__name': 'Some name123'}

As you can see, there are matching owner_ids, and the queryset len() is 100k per day, so if the date range is 5 days, the queryset len() == 500k.

My models.py looks like this:

class Owner(models.Model):
    class Meta:
        verbose_name_plural = 'Objects'

    TREND_OWNERS = Choices('group', 'user')

    link = models.CharField(max_length=255)
    name = models.CharField(max_length=255)
    owner_type = models.CharField(choices=TREND_OWNERS, max_length=50)
    trend_type = models.ForeignKey(TrendType, on_delete=models.CASCADE)

    def __str__(self):
        return f'{self.link}[{self.trend_type}]'


class TrendData(models.Model):
    class Meta:
        verbose_name_plural = 'Trends'

    owner = models.ForeignKey(Owner, on_delete=models.CASCADE)
    views = models.IntegerField()
    views_u = models.IntegerField()
    likes = models.IntegerField()
    shares = models.IntegerField()
    interaction_rate = models.DecimalField(max_digits=20, decimal_places=10)
    mean_age = models.IntegerField()
    date_trend = models.DateTimeField()

I realised that it will run fine but give the wrong result: it divides by date_delta whenever owner_id is greater than 1, whereas I want to divide only when an owner_id occurs in the queryset more than once. I have tried owner_id__count__gt, but that doesn't exist :(

I would love to know if there is a way to count owner_id occurrences in my annotate Case(When()) queryset. That would solve my problem: if the count is greater than 1, we divide by date_delta; otherwise we leave the value as it is.

Update:

Just to be clear, this annotation does an excellent job, however it also divides some rows that I don't want to be divided (in my case, rows with a NON-duplicate owner_id still have their views, shares, etc. divided by 2), which is why I use the Case(When()) mentioned above:

queryset.values('owner__name', 'owner_id').annotate(
    views=Sum('views') / 2, 
    views_u=Sum('views_u') / 2, 
    likes=Sum('likes') / 2,
    shares=Sum('shares') / 2, 
    interaction_rate=Sum('interaction_rate') / 2,
    mean_age=Sum('mean_age') / 2)

UPDATE #2: This is my logic, but in Python:

json_output = []
for item in (queryset
                .values('owner__name', 'owner_id')
                .annotate(owner_count=Count('owner_id'))
                .annotate(views=Sum('views'), views_u=Sum('views_u'),
                            likes=Sum('likes'),
                            shares=Sum('shares'),
                            interaction_rate=Sum('interaction_rate'),
                            mean_age=Sum('mean_age'))):
    if item['owner_count'] > 1:
        item['views'] = item['views'] / date_delta
        item['views_u'] = item['views_u'] / date_delta
        item['likes'] = item['likes'] / date_delta
        item['shares'] = item['shares'] / date_delta
        item['interaction_rate'] = '{:.10f}'.format(
            Decimal(item['interaction_rate']) / date_delta)
        item['mean_age'] = item['mean_age'] / date_delta
        json_output.append(item)
    else:
        json_output.append(item)
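
The rule in the loop above can be checked without a database. The following is a minimal sketch, assuming rows shaped like the .values() output; the helper name aggregate_rows, the two-field selection, and the sample rows are hypothetical and only serve as an illustration.

```python
from collections import defaultdict


def aggregate_rows(rows, date_delta, fields=("views", "likes")):
    """Sum each field per owner_id; divide only for owners seen more than once."""
    totals = defaultdict(lambda: {"owner_count": 0, **{f: 0 for f in fields}})
    for row in rows:
        bucket = totals[row["owner_id"]]
        bucket["owner_count"] += 1
        for field in fields:
            bucket[field] += row[field]

    result = {}
    for owner_id, bucket in totals.items():
        # Owners that occur once keep their totals unchanged (divisor 1).
        divisor = date_delta if bucket["owner_count"] > 1 else 1
        result[owner_id] = {field: bucket[field] / divisor for field in fields}
    return result


# Hypothetical sample: owner 1306 appears twice, owner 1307 once.
rows = [
    {"owner_id": 1306, "views": 10, "likes": 4},
    {"owner_id": 1307, "views": 7, "likes": 1},
    {"owner_id": 1306, "views": 30, "likes": 6},
]
print(aggregate_rows(rows, date_delta=2))
```

Only owner 1306 is divided by date_delta here, which is the behaviour the ORM query is supposed to reproduce.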

First, I think this is wrong: owner_name=F('owner_id__name'). It should be owner_name=F('owner__name').

If I understood correctly, you want to annotate the TrendData queryset with the number of TrendData instances that share the same owner.

You can use a Subquery to achieve that:

owner_td_count = Owner.objects.annotate(
    td_count=Count('trenddata')  # reverse lookups use the lowercased model name, not 'trenddata_set'
).filter(
    id=OuterRef('owner_id')
).values('td_count')[:1]

Then annotate first by counting occurrences of owner_id:

queryset.annotate(
    owner_name=F('owner__name'),
    owner_id_count=Subquery(owner_td_count)   # How many TrendData rows have the owner with id=owner_id
).values('owner_name', 'owner_id').annotate(
    # ...
)

Then you could use it in your Case/When construction:

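Case(
    # An owner that occurs exactly once keeps its views unchanged
    When(owner_id_count=1, then=F('views')),
    # Every other owner has its views divided by date_delta
    default=F('views') / date_delta,
    output_field=IntegerField()
)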
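
To make the idea concrete, here is a rough sketch of the per-row count approach in plain SQL, run through Python's sqlite3 module. It is not the SQL Django generates; the table name trenddata, its two columns, and the sample rows are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, stripped-down stand-in for the TrendData table.
conn.execute("CREATE TABLE trenddata (owner_id INTEGER, views INTEGER)")
conn.executemany(
    "INSERT INTO trenddata VALUES (?, ?)",
    [(1306, 10), (1307, 7), (1306, 30)],
)

date_delta = 2
per_row = conn.execute(
    """
    SELECT owner_id, views, owner_id_count,
           CASE WHEN owner_id_count = 1 THEN views ELSE views / ? END AS adjusted_views
    FROM (
        SELECT t.owner_id, t.views,
               -- correlated subquery: evaluated once per outer row
               (SELECT COUNT(*) FROM trenddata AS inner_t
                 WHERE inner_t.owner_id = t.owner_id) AS owner_id_count
        FROM trenddata AS t
    )
    ORDER BY owner_id, views
    """,
    (date_delta,),
).fetchall()
print(per_row)
```

The count is recomputed for each outer row, so this form can get expensive on large tables.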

Update: It turns out that I hadn't fully tested this after all (I thought I had, apologies). You need to have the Case wrapped around Sum; the other way around (Sum around Case) won't work, no matter the Django version:

(queryset
    .values('owner', owner_name=F('owner__name'))
    .annotate(owner_count=Count('owner'))
    .annotate(views = Case(
        When(owner_count__gt=1,
             then=Sum(F('views') / date_delta)),
        default=Sum('views'),
        output_field=IntegerField()
    ))
)
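
For readers who want to see the shape of this query outside the ORM, the following sketch expresses the same grouping as hand-written SQL via sqlite3. The SQL Django actually emits may differ; the trenddata table, its columns, and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, stripped-down stand-in for the TrendData table.
conn.execute("CREATE TABLE trenddata (owner_id INTEGER, views INTEGER)")
conn.executemany(
    "INSERT INTO trenddata VALUES (?, ?)",
    [(1306, 10), (1307, 7), (1306, 30)],
)

date_delta = 2
grouped = conn.execute(
    """
    SELECT owner_id,
           COUNT(owner_id) AS owner_count,
           -- CASE wraps the aggregates, mirroring Case(When(..., then=Sum(...)))
           CASE WHEN COUNT(owner_id) > 1
                THEN SUM(views / ?)
                ELSE SUM(views)
           END AS views
    FROM trenddata
    GROUP BY owner_id
    ORDER BY owner_id
    """,
    (date_delta,),
).fetchall()
print(grouped)
```

Because the condition depends on COUNT, it can only be evaluated per group, which is why the CASE sits outside the SUM.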

A slight variation would be to use a subquery. Raydel's subquery, which calculates the TrendData count for every Owner, works in principle but will be prohibitively slow, as it runs an aggregation for every single row in TrendData (not just for each unique Owner).

A different subquery provides a faster way of getting the same result. It does the heavy lifting of counting Owners in TrendData only once and then checks, for every TrendData object, whether its owner is in that list. I would think this should still be slower than my first query, but strangely enough, it came out on par in my short tests (with around 3m rows).

(queryset
    .values('owner', owner_name=F('owner__name'))
    .annotate(multi=Case(
        When(owner__in=Subquery(TrendData.objects
                                    .values('owner')
                                    .annotate(cnt=Count('owner'))
                                    .filter(cnt__gt=1)  # keep only owners that occur more than once
                                    .values('owner')), 
             then=1),
        default=0,
        output_field=IntegerField())
    ) 
    .annotate(views = Case(
        When(multi=1,
             then=Sum(F('views') / date_delta)),
        default=Sum('views'),
        output_field=IntegerField())
    )
)
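
The membership-test idea can be sketched in plain SQL as well. This is an illustration using sqlite3, not the query Django produces; it assumes that "multi" should mark owners occurring more than once (HAVING COUNT > 1), and the trenddata table and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, stripped-down stand-in for the TrendData table.
conn.execute("CREATE TABLE trenddata (owner_id INTEGER, views INTEGER)")
conn.executemany(
    "INSERT INTO trenddata VALUES (?, ?)",
    [(1306, 10), (1307, 7), (1306, 30)],
)

date_delta = 2
flagged = conn.execute(
    """
    SELECT owner_id, multi,
           CASE WHEN multi = 1 THEN SUM(views / ?) ELSE SUM(views) END AS views
    FROM (
        SELECT owner_id, views,
               -- uncorrelated subquery: the list of repeated owners is built once
               owner_id IN (SELECT owner_id FROM trenddata
                            GROUP BY owner_id
                            HAVING COUNT(owner_id) > 1) AS multi
        FROM trenddata
    )
    GROUP BY owner_id, multi
    ORDER BY owner_id
    """,
    (date_delta,),
).fetchall()
print(flagged)
```

The inner SELECT does not reference the outer row, so the database is free to compute the list of repeated owners a single time and reuse it.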

You can wrap the two annotations in one, but if you're reusing multi for several more annotations rather than just one as in my example, separating the two saves you from repeating the subquery for every annotation.
