简体   繁体   中英

Limit results of each group

I want to limit the records in each group, so that when I aggregate them into a JSON object in the select-statement, it only takes the N conversations with hightest count

Any ideas?

My query:

select
          dt.id as app_id,
          json_build_object(
              'rows', array_agg(
                 json_build_object(
                    'url', dt.started_at_url,
                    'count', dt.count
                 )
              )
          ) as data
      from (
          select a.id, c.started_at_url, count(c.id)
          from apps a
          left join conversations c on c.app_id = a.id
          where started_at_url is not null and c.started_at::date > (current_date - (7  || ' days')::interval)::date
          group by a.id, c.started_at_url
          order by count desc
      ) as dt
      where dt.id = 'ASnYW1-RgCl0I'
      group by dt.id

Your problem is similar to a groupwise-max problem, and there are many solutions to this.

Filtering row_number window function

A simple one is to use row_number() window function and filter out for only rows that the result is < N (using 5 as an example):

select
          dt.id as app_id,
          json_build_object(
              'rows', array_agg(
                 json_build_object(
                    'url', dt.started_at_url,
                    'count', dt.count
                 )
              )
          ) as data
      from (
          select
              a.id, c.started_at_url,
              count(c.id) as count,
              row_number() over(partition by a.id order by count(c.id) desc) as rn
          from apps a
          left join conversations c on c.app_id = a.id
          where started_at_url is not null and c.started_at > (current_date - (7  || ' days')::interval)::date
          group by a.id, c.started_at_url
          order by count desc
      ) as dt
      where
          dt.id = 'ASnYW1-RgCl0I'
          and dt.rn <= 5 /* get top 5 only */
      group by dt.id

Using LATERAL

Another option is to use LATERAL and LIMIT to bring back only the results you are interested in:

select
    a.id as app_id,
    json_build_object(
        'rows', array_agg(
           json_build_object(
              'url', dt.started_at_url,
              'count', dt.count
           )
        )
    ) as data
form
    apps a, lateral(
        select
            c.started_at_url,
            count(*) as count
        from
            conversations c
        where
            c.app_id = a.id /* here is why lateral is necessary */
            and c.started_at_url is not null
            and c.started_at > (current_date - (7  || ' days')::interval)::date
        group by
            c.started_at_url
        order by
            count(*) desc
        limit 5 /* get top 5 only */
    ) as dt
where
    a.id = 'ASnYW1-RgCl0I'
group by
    a.id

OBS: I haven't tried those, so there may be a typo. You can provide sample data sets if you wish some testing.

OBS 2: If you are really filtering by app_id on your final query, then you don't even need that GROUP BY clause.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM