Check if object_id occurs more than once in queryset.annotate Case When parameter

Question

The documentation on field lookups doesn't really help in my case.

What my query looks like now

from django.db.models import Case, F, IntegerField, Sum, When

date_delta = 2

queryset = TrendData.objects.filter(
    owner__trend_type__mnemonic='posts',
    date_trend__date__range=[date_from, date_to],
).values('owner_id', 'owner__name')

queryset.annotate(owner_name=F('owner_id__name')).values('owner_name', 'owner_id').annotate(
    views=Sum(Case(
        When(owner_id__gt=1, then=F('views') / date_delta),
        default=F('views'),
        output_field=IntegerField(),
    )),
    # ...
)

The queryset output looks like this:

{'owner_id': 1306, 'owner__name': 'Some name123'}, 
{'owner_id': 1307, 'owner__name': 'Somename as well'}, 
{'owner_id': 1308, 'owner__name': 'aand another name'}, 
{'owner_id': 1306, 'owner__name': 'Some name123'}

As you can see, some owner_id values repeat, and the queryset has about 100k rows per day, so for a 5-day date range len(queryset) == 500k. My models.py looks like this:

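from django.db import models
from model_utils import Choices  # assumed: Choices from django-model-utils

# TrendType is defined elsewhere in models.py (not shown)


class Owner(models.Model):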
    class Meta:
        verbose_name_plural = 'Objects'

    TREND_OWNERS = Choices('group', 'user')

    link = models.CharField(max_length=255)
    name = models.CharField(max_length=255)
    owner_type = models.CharField(choices=TREND_OWNERS, max_length=50)
    trend_type = models.ForeignKey(TrendType, on_delete=models.CASCADE)

    def __str__(self):
        return f'{self.link}[{self.trend_type}]'


class TrendData(models.Model):
    class Meta:
        verbose_name_plural = 'Trends'

    owner = models.ForeignKey(Owner, on_delete=models.CASCADE)
    views = models.IntegerField()
    views_u = models.IntegerField()
    likes = models.IntegerField()
    shares = models.IntegerField()
    interaction_rate = models.DecimalField(max_digits=20, decimal_places=10)
    mean_age = models.IntegerField()
    date_trend = models.DateTimeField()

I realised that this runs fine but gives the wrong result: it divides by date_delta whenever owner_id is greater than 1, whereas I want to divide only when the owner_id occurs more than once in the queryset. I have tried owner_id__count__gt, but that lookup doesn't exist :(

I would love to know if there is a way to count owner_id occurrences inside my annotate Case(When()). That would solve my problem: if the count is greater than 1, we divide by date_delta; otherwise we leave the value as it is.
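The rule described above (sum each owner's metrics, then divide by date_delta only when that owner occurs in more than one row) can be sketched in plain Python on a list of dicts, independent of Django. The sample rows, the metric names and the helper name `average_repeated_owners` are hypothetical illustrations, not part of the original code.

```python
from collections import defaultdict


def average_repeated_owners(rows, metrics, date_delta):
    """Hypothetical helper: group rows by owner_id, sum the given metrics,
    and divide the sums by date_delta only for owners seen more than once."""
    totals = defaultdict(lambda: {m: 0 for m in metrics})
    counts = defaultdict(int)
    for row in rows:
        owner = row['owner_id']
        counts[owner] += 1
        for m in metrics:
            totals[owner][m] += row[m]

    result = {}
    for owner, sums in totals.items():
        if counts[owner] > 1:
            # Repeated owner: average the summed values over the date range
            result[owner] = {m: v / date_delta for m, v in sums.items()}
        else:
            # Single occurrence: keep the summed values unchanged
            result[owner] = dict(sums)
    return result


rows = [
    {'owner_id': 1306, 'views': 100, 'likes': 10},
    {'owner_id': 1307, 'views': 50, 'likes': 4},
    {'owner_id': 1306, 'views': 300, 'likes': 30},
]
print(average_repeated_owners(rows, ['views', 'likes'], date_delta=2))
# {1306: {'views': 200.0, 'likes': 20.0}, 1307: {'views': 50, 'likes': 4}}
```

Owner 1307 occurs once, so its values pass through untouched; that is exactly the case an unconditional `/ 2` gets wrong.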

Update:

Just to be clear: the annotation below does an excellent job, but it also divides values that I don't want divided (in my case, owners that occur only once still get their views, shares, etc. divided by 2). That is why I use the Case(When()) mentioned above:

queryset.values('owner__name', 'owner_id').annotate(
    views=Sum('views') / 2, 
    views_u=Sum('views_u') / 2, 
    likes=Sum('likes') / 2,
    shares=Sum('shares') / 2, 
    interaction_rate=Sum('interaction_rate') / 2,
    mean_age=Sum('mean_age') / 2)

UPDATE #2: this is my logic, but in Python:

from decimal import Decimal

from django.db.models import Count, Sum

json_output = []
for item in (queryset
             .values('owner__name', 'owner_id')
             .annotate(owner_count=Count('owner_id'))
             .annotate(views=Sum('views'),
                       views_u=Sum('views_u'),
                       likes=Sum('likes'),
                       shares=Sum('shares'),
                       interaction_rate=Sum('interaction_rate'),
                       mean_age=Sum('mean_age'))):
    # Divide only for owners that occur in more than one row
    if item['owner_count'] > 1:
        item['views'] = item['views'] / date_delta
        item['views_u'] = item['views_u'] / date_delta
        item['likes'] = item['likes'] / date_delta
        item['shares'] = item['shares'] / date_delta
        item['interaction_rate'] = '{:.10f}'.format(
            Decimal(item['interaction_rate']) / date_delta)
        item['mean_age'] = item['mean_age'] / date_delta
    json_output.append(item)

Answer 1

First, I think this is wrong: owner_name=F('owner_id__name') should be owner_name=F('owner__name').

If I understood correctly, you want to annotate the TrendData queryset with the number of TrendData instances that share the same owner.

You can use a Subquery to achieve that:

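from django.db.models import Count, F, OuterRef, Subquery

owner_td_count = Owner.objects.annotate(
    td_count=Count('trenddata')  # reverse lookup name is 'trenddata', not 'trenddata_set'
).filter(
    id=OuterRef('owner_id')
).values('td_count')[:1]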

Then, first annotate the count of owner_id occurrences:

queryset.annotate(
    owner_name=F('owner__name'),
    owner_id_count=Subquery(owner_td_count),  # how many TrendData rows have the owner with id=owner_id
).values('owner_name', 'owner_id').annotate(
    # ...
)

Then you could use it in your Case/When construction:

Case(
    When(owner_id_count=1, then=F('views')),
    default=F('views') / date_delta,
    output_field=IntegerField(),
)
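In SQL terms, the Subquery plus per-row Case described above is a correlated subquery: for every TrendData row it counts the rows that share that row's owner, and the Case then picks the full or the divided value per row before summing. The following is a hand-written sketch using Python's standard-library sqlite3 module with a simplified, hypothetical `trenddata` table; it is not the exact SQL Django generates.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE trenddata (id INTEGER PRIMARY KEY, owner_id INTEGER, views INTEGER)')
conn.executemany(
    'INSERT INTO trenddata (owner_id, views) VALUES (?, ?)',
    [(1306, 100), (1307, 50), (1306, 300)],
)

date_delta = 2
rows = conn.execute(
    '''
    SELECT owner_id,
           -- per-row choice: full value for single owners, divided otherwise
           SUM(CASE WHEN owner_count = 1 THEN views
                    ELSE views / ? END) AS views
    FROM (
        SELECT t1.owner_id,
               t1.views,
               -- correlated subquery: re-evaluated for every outer row
               (SELECT COUNT(*) FROM trenddata t2
                WHERE t2.owner_id = t1.owner_id) AS owner_count
        FROM trenddata t1
    ) AS per_row
    GROUP BY owner_id
    ORDER BY owner_id
    ''',
    (date_delta,),
).fetchall()
print(rows)  # [(1306, 200), (1307, 50)]
```

Because both columns are integers, SQLite performs integer division here, so `views / 2` truncates just as an integer `output_field` would.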

Answer 2

Update: It turns out that I hadn't fully tested this after all (I thought I had; apologies). You need to have the Case wrapped around the Sum; the other way around (Sum around Case) won't work, no matter the Django version:

from django.db.models import Case, Count, F, IntegerField, Sum, When

(queryset
    .values('owner', owner_name=F('owner__name'))
    .annotate(owner_count=Count('owner'))
    .annotate(views=Case(
        When(owner_count__gt=1,
             then=Sum(F('views') / date_delta)),
        default=Sum('views'),
        output_field=IntegerField()
    ))
)
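The Case has to wrap the Sum presumably because owner_count is itself an aggregate: it only exists per GROUP BY group, so putting it inside Sum would nest one aggregate inside another, which SQL does not allow. A rough, hand-written SQL equivalent of the query above, run with Python's standard-library sqlite3 module on a simplified, hypothetical `trenddata` table (not the SQL Django actually emits), shows both forms:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE trenddata (id INTEGER PRIMARY KEY, owner_id INTEGER, views INTEGER)')
conn.executemany(
    'INSERT INTO trenddata (owner_id, views) VALUES (?, ?)',
    [(1306, 100), (1307, 50), (1306, 300)],
)
date_delta = 2

# CASE around the aggregates: condition and branches are evaluated per group
rows = conn.execute(
    '''
    SELECT owner_id,
           CASE WHEN COUNT(owner_id) > 1 THEN SUM(views / ?)
                ELSE SUM(views) END AS views
    FROM trenddata
    GROUP BY owner_id
    ORDER BY owner_id
    ''',
    (date_delta,),
).fetchall()
print(rows)  # [(1306, 200), (1307, 50)]

# SUM around CASE nests COUNT inside SUM, which the database rejects
nested_error = None
try:
    conn.execute(
        '''
        SELECT owner_id,
               SUM(CASE WHEN COUNT(owner_id) > 1 THEN views / ? ELSE views END)
        FROM trenddata
        GROUP BY owner_id
        ''',
        (date_delta,),
    ).fetchall()
except sqlite3.Error as exc:
    nested_error = exc
print(nested_error)
```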

A slight variation would be to use a subquery. Raydel's subquery, which calculates the TrendData count for every Owner, works in principle, but will be prohibitively slow because it performs an aggregation for every single row in TrendData (not just once per unique Owner).

A different subquery gets the same result faster. It does the heavy lifting of counting Owners in TrendData only once and then checks, for every TrendData object, whether its owner is in that list. I would expect this to still be slower than my first query, but strangely enough it came out on par in my short tests (with around 3M rows).

from django.db.models import Case, Count, F, IntegerField, Subquery, Sum, When

(queryset
    .values('owner', owner_name=F('owner__name'))
    .annotate(multi=Case(
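        # Owners occurring more than once (counted over all TrendData rows,
        # not just the filtered date range)
        When(owner__in=Subquery(TrendData.objects
                                    .values('owner')
                                    .annotate(cnt=Count('owner'))
                                    .filter(cnt__gt=1)
                                    .values('owner')),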
             then=1),
        default=0,
        output_field=IntegerField())
    ) 
    .annotate(views=Case(
        When(multi=1,
             then=Sum(F('views') / date_delta)),
        default=Sum('views'),
        output_field=IntegerField())
    )
)

You can wrap the two annotations in one, but if you're reusing multi for several more annotations rather than just one as in my example, separating the two saves you from repeating the subquery for every annotation.
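The subquery variant counts owners once, in a grouped subquery that does not depend on the outer row, and each row then only needs a membership test. A rough sketch of that shape with Python's standard-library sqlite3 module, using a simplified, hypothetical `trenddata` table and hand-written SQL (not Django's exact output); the subquery keeps only owners that occur more than once (`HAVING COUNT(*) > 1`), and `multi` ends up in the GROUP BY because it is a non-aggregate annotation added after `values()`:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE trenddata (id INTEGER PRIMARY KEY, owner_id INTEGER, views INTEGER)')
conn.executemany(
    'INSERT INTO trenddata (owner_id, views) VALUES (?, ?)',
    [(1306, 100), (1307, 50), (1306, 300)],
)
date_delta = 2

rows = conn.execute(
    '''
    SELECT owner_id,
           CASE WHEN multi = 1 THEN SUM(views / ?) ELSE SUM(views) END AS views
    FROM (
        SELECT owner_id,
               views,
               -- uncorrelated subquery: it does not reference the outer row,
               -- so the database can evaluate it once and reuse the result
               CASE WHEN owner_id IN (SELECT owner_id FROM trenddata
                                      GROUP BY owner_id
                                      HAVING COUNT(*) > 1)
                    THEN 1 ELSE 0 END AS multi
        FROM trenddata
    ) AS flagged
    GROUP BY owner_id, multi
    ORDER BY owner_id
    ''',
    (date_delta,),
).fetchall()
print(rows)  # [(1306, 200), (1307, 50)]
```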


2 answers

solution1
1 2019-04-17 14:07:13

solution2
1 ACCEPTED 2019-04-17 14:17:27