
Group by two columns, distinct by one, and order by the count

I struggled quite a lot with the title of this question :)

I'm a beginner in Python and Django, and I have a query I'm trying to write.

My (simplified) model is: users, trips, countries.

A user can create as many trips as he wants, to whatever country he wants. He can also create multiple trips to the same country.

My goal is to fetch the top 15 countries with the most trips created by different users, plus the count. This means that if one user created 10 trips to the same country, it counts as one.

What I've achieved so far is:

    hottest_countries = models.Event.objects.values('country')\
                      .exclude(creator=None) \
                      .annotate(count=Count('country'))\
                      .distinct() \
                      .order_by('-count')[:15]

This returns the countries and the count for each country, but the count is of trips, not of distinct users.

So I changed my code to this:

    hottest_countries = models.Event.objects.values_list('country', flat=True) \
                      .exclude(creator=None) \
                      .annotate(count=Count('country'))\
                      .distinct() \
                      .order_by('-count')[:15]

    # Getting all the creators of each country
    creators_for_country = [models.Event.objects.values_list('creator', flat=True).filter(Q(country=country_id)).distinct() for country_id in hottest_countries]

    # Sorting again to make sure
    hots_events_sorted = [{"country_id": country_id, "count": len(creators_for_country[idx]), "creators": creators_for_country[idx]} for idx, country_id in enumerate(hottest_countries)]
    hots_events_sorted.sort(key=itemgetter('count'), reverse=True)

It works, but:

A. I think it is too complicated, and there must be an easier way.

B. The top 15 countries fetched in the first query may not really be the right ones, because the second query can remove a lot of entries when it makes the creators distinct. For example, say one user created 1000 trips to Canada. This pushes Canada to the top of the list in the first query. But when we make the list distinct by creator, we get a single entry, which moves Canada down the list or drops it entirely.
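
To make concern B concrete, here is a small pure-Python sketch (no Django involved) that ranks the same trips two ways: by raw trip count and by number of distinct creators. The user names and the data are made up for illustration only.

```python
from collections import Counter, defaultdict

# Hypothetical (creator, country) pairs, one pair per trip.
trips = [("alice", "Canada")] * 1000 + [
    ("alice", "France"), ("bob", "France"), ("carol", "France"),
    ("bob", "Japan"), ("carol", "Japan"),
]

# Ranking 1: raw number of trips per country.
raw_counts = Counter(country for _, country in trips)

# Ranking 2: number of distinct creators per country.
creators_by_country = defaultdict(set)
for creator, country in trips:
    creators_by_country[country].add(creator)
distinct_counts = Counter(
    {country: len(creators) for country, creators in creators_by_country.items()}
)

# Keep only the top 2 of each ranking (standing in for the top 15).
top_raw = [country for country, _ in raw_counts.most_common(2)]
top_distinct = [country for country, _ in distinct_counts.most_common(2)]
```

With this data the two top-2 lists differ: Canada leads the raw ranking but falls out of the distinct-creator ranking, so picking the top N by raw count first and deduplicating afterwards can select the wrong countries.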

Note: When I tried to use distinct with specific columns, I got a DB error saying that my database does not support distinct on columns.

In case anyone has struggled like me, here is my solution.

Adding distinct=True to the Count inside annotate solved my problem:

    hottest_countries = models.Event.objects.values('country')\
                      .exclude(creator=None) \
                      .annotate(count=Count('creator', distinct=True))\
                      .distinct() \
                      .order_by('-count')[:15]
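
Roughly speaking, this asks the database to group events by country and count the distinct creators in each group. The sketch below shows that idea with the standard-library sqlite3 module. The table name `event`, its columns, the alias `n_creators`, and the sample rows are all assumptions made for illustration, and the SQL that Django actually generates will differ in its details.

```python
import sqlite3

# In-memory database with a hypothetical, simplified event table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event (creator TEXT, country TEXT)")

rows = [("alice", "Canada")] * 1000 + [
    ("alice", "France"), ("bob", "France"), ("carol", "France"),
    ("bob", "Japan"), ("carol", "Japan"),
    (None, "Japan"),  # an event without a creator, filtered out below
]
conn.executemany("INSERT INTO event VALUES (?, ?)", rows)

# Group by country, count distinct creators, keep the 15 largest groups.
hottest = conn.execute(
    """
    SELECT country, COUNT(DISTINCT creator) AS n_creators
    FROM event
    WHERE creator IS NOT NULL
    GROUP BY country
    ORDER BY n_creators DESC
    LIMIT 15
    """
).fetchall()
```

Here the 1000 trips by a single user count once for Canada, so it ranks below France and Japan. Because the grouping, the distinct count, and the limit all happen in one query, the top 15 is chosen on the deduplicated counts, which addresses concern B from the question.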

