Group by two columns, distinct by one, and order by the count

I struggled quite a lot with the title of this question :)

I'm a beginner in Python and Django, and I have a query I'm trying to build.

My (simplified) models are: users, trips, and countries.

A user can create as many trips as they want, to whatever country they want. They can also create multiple trips to the same country.

My goal is to fetch the top 15 countries ranked by the number of different users who created trips to them, together with that count. In other words, if one user created 10 trips to the same country, those trips count as one.
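To pin down the intended semantics, here is a minimal plain-Python sketch (no Django involved). It assumes trips are given as hypothetical `(creator, country)` pairs, and the function name is made up for illustration; it only shows what "count each user once per country" means.

```python
from collections import defaultdict


def top_countries_by_distinct_users(trips, limit=15):
    """Rank countries by how many *different* users created trips to them.

    `trips` is an iterable of (creator, country) pairs (hypothetical format).
    Trips whose creator is None are skipped, like .exclude(creator=None).
    """
    users_per_country = defaultdict(set)
    for creator, country in trips:
        if creator is not None:
            # A set keeps each user once, so repeat trips don't raise the count.
            users_per_country[country].add(creator)
    counts = [(country, len(users)) for country, users in users_per_country.items()]
    # Highest distinct-user count first.
    counts.sort(key=lambda pair: pair[1], reverse=True)
    return counts[:limit]


trips = [
    ("alice", "Canada"), ("alice", "Canada"), ("alice", "Canada"),
    ("bob", "France"), ("carol", "France"),
]
print(top_countries_by_distinct_users(trips))  # [('France', 2), ('Canada', 1)]
```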

Here is what I have so far:

    from django.db.models import Count

    hottest_countries = models.Event.objects.values('country')\
                      .exclude(creator=None) \
                      .annotate(count=Count('country'))\
                      .distinct() \
                      .order_by('-count')[:15]

This returns each country and its count, but the count is the number of trips, not the number of different users.
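To see why, the following sketch uses Python's built-in `sqlite3` with a hypothetical `event` table (one row per trip, `creator_id` and `country_id` columns). The SQL is only roughly what the first queryset groups by, not the exact statement Django emits; it shows that a plain `COUNT` per country counts every trip row.

```python
import sqlite3

# In-memory stand-in for the Event model (hypothetical table and column names).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event (id INTEGER PRIMARY KEY, creator_id INTEGER, country_id TEXT)")
conn.executemany(
    "INSERT INTO event (creator_id, country_id) VALUES (?, ?)",
    [(1, "CA")] * 5 + [(2, "FR"), (3, "FR")],
)

# Grouping by country and counting rows: user 1's five trips give Canada 5,
# even though only one user ever went there.
rows = conn.execute(
    """
    SELECT country_id, COUNT(country_id) AS trip_count
    FROM event
    WHERE creator_id IS NOT NULL
    GROUP BY country_id
    ORDER BY trip_count DESC
    LIMIT 15
    """
).fetchall()
print(rows)  # [('CA', 5), ('FR', 2)]
```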

So I changed my code to this:

    from operator import itemgetter
    from django.db.models import Count, Q

    hottest_countries = models.Event.objects.values_list('country', flat=True) \
                      .exclude(creator=None) \
                      .annotate(count=Count('country'))\
                      .distinct() \
                      .order_by('-count')[:15]

    # Getting all the creators of each country
    creators_for_country = [
        models.Event.objects.values_list('creator', flat=True)
        .filter(Q(country=country_id))
        .distinct()
        for country_id in hottest_countries
    ]

    # Sorting again, this time by the number of distinct creators
    hots_events_sorted = [
        {"country_id": country_id, "count": len(creators_for_country[idx]), "creators": creators_for_country[idx]}
        for idx, country_id in enumerate(hottest_countries)
    ]
    hots_events_sorted.sort(key=itemgetter('count'), reverse=True)

It is working, but:

A. I think it is too complicated, and there must be an easier way.

B. The top 15 countries fetched by the first query may not actually be the right ones, because the second query can eliminate many entries once it deduplicates by creator. For example, if one user created 1000 trips to Canada, that pushes Canada to the top of the first query's results. But once we deduplicate by creator, those trips count as a single entry, which moves Canada down the list or drops it entirely.
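Concern B can indeed happen. The plain-Python simulation below uses made-up data and hypothetical helper names, with a limit of 2 instead of 15 to keep the data small. It compares "pick the top N by trip count, then re-rank by distinct users" against ranking by distinct users directly.

```python
from collections import Counter, defaultdict


def users_per_country(trips):
    """Map each country to the set of users who made trips there."""
    users = defaultdict(set)
    for creator, country in trips:
        users[country].add(creator)
    return users


def two_step(trips, limit):
    # Step 1: keep only the top countries by raw trip count (like the first query).
    by_trips = [country for country, _ in Counter(c for _, c in trips).most_common(limit)]
    # Step 2: re-rank just those countries by distinct users.
    users = users_per_country(trips)
    return sorted(((c, len(users[c])) for c in by_trips), key=lambda p: p[1], reverse=True)


def one_step(trips, limit):
    # Rank every country by distinct users, then cut to the limit.
    users = users_per_country(trips)
    ranked = sorted(((c, len(u)) for c, u in users.items()), key=lambda p: p[1], reverse=True)
    return ranked[:limit]


# One user made 1000 trips to Canada, another made 50 to Mexico,
# and three different users each went to Japan once.
trips = [(1, "Canada")] * 1000 + [(2, "Mexico")] * 50 + [(3, "Japan"), (4, "Japan"), (5, "Japan")]

print(two_step(trips, 2))  # [('Canada', 1), ('Mexico', 1)]  (Japan never considered)
print(one_step(trips, 2))  # [('Japan', 3), ('Canada', 1)]
```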

Note: When I tried to pass specific fields to `distinct()`, I got a database error saying my database doesn't support DISTINCT on specific columns. (In Django, `distinct(*fields)` is supported only on PostgreSQL.)
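Per-column DISTINCT isn't actually required for this problem. A portable SQL alternative is to apply ordinary `DISTINCT` to the (country, creator) pair in a subquery and then count rows per country. The sketch below uses Python's built-in `sqlite3` with a hypothetical `event` table; it illustrates the idea in raw SQL and is not something Django generates for the querysets above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event (id INTEGER PRIMARY KEY, creator_id INTEGER, country_id TEXT)")
conn.executemany(
    "INSERT INTO event (creator_id, country_id) VALUES (?, ?)",
    [(1, "CA")] * 1000 + [(2, "JP"), (3, "JP"), (4, "JP")],
)

# The inner plain DISTINCT collapses repeat trips to one (country, creator) row,
# so the outer COUNT(*) counts users, with no DISTINCT ON needed.
rows = conn.execute(
    """
    SELECT country_id, COUNT(*) AS user_count
    FROM (SELECT DISTINCT country_id, creator_id
          FROM event
          WHERE creator_id IS NOT NULL) AS pairs
    GROUP BY country_id
    ORDER BY user_count DESC
    LIMIT 15
    """
).fetchall()
print(rows)  # [('JP', 3), ('CA', 1)]
```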

In case anyone else has struggled with this like I did, here is my solution.

Counting `creator` with `distinct=True` inside the annotation solved my problem:

    from django.db.models import Count

    hottest_countries = models.Event.objects.values('country')\
                      .exclude(creator=None) \
                      .annotate(count=Count('creator', distinct=True))\
                      .distinct() \
                      .order_by('-count')[:15]
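For reference, `Count('creator', distinct=True)` is translated to SQL `COUNT(DISTINCT ...)`. The sketch below runs roughly the equivalent query with Python's built-in `sqlite3` on a hypothetical `event` table (Django's real table and column names will differ), so each creator is counted once per country however many trips they made. It also adds a secondary sort on the country so tied counts come back in a stable order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event (id INTEGER PRIMARY KEY, creator_id INTEGER, country_id TEXT)")
conn.executemany(
    "INSERT INTO event (creator_id, country_id) VALUES (?, ?)",
    [(1, "CA")] * 1000 + [(2, "MX")] * 50
    + [(3, "JP"), (4, "JP"), (5, "JP"), (None, "JP")],
)

# COUNT(DISTINCT creator_id) counts each creator once per country.
# The NULL-creator row is filtered by WHERE (COUNT(DISTINCT) skips NULLs anyway).
rows = conn.execute(
    """
    SELECT country_id, COUNT(DISTINCT creator_id) AS user_count
    FROM event
    WHERE creator_id IS NOT NULL
    GROUP BY country_id
    ORDER BY user_count DESC, country_id
    LIMIT 15
    """
).fetchall()
print(rows)  # [('JP', 3), ('CA', 1), ('MX', 1)]
```

In the queryset, the same tie-breaking could be expressed as `.order_by('-count', 'country')`; without it, the relative order of countries with equal counts is up to the database.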
