
SQL - aggregating across multiple rows

my_table

I have the following table, which captures driver and rider details. For each day (datetime) there is one driver and zero or more riders. If there is more than one rider, each additional rider's name and age are captured in a new row with the same datetime and the driver columns left blank. This may not be the right way to structure the data, but it is structured this way primarily because the number of riders per driver per datetime varies.

id | datetime   | driver | age  | riders | rider_name | rider_age
---|------------|--------|------|--------|------------|----------
1  | 03/03/2009 | joe    | 24   | 0      |            | 
2  | 04/03/2009 | john   | 39   | 1      | juliet     | 30
3  | 05/03/2009 | borat  | 32   | 2      | jane       | 45
4  | 05/03/2009 |        |      |        | mike       | 18
5  | 06/03/2009 | john   | 39   | 3      | duke       | 42
6  | 06/03/2009 |        |      |        | jose       | 33
7  | 06/03/2009 |        |      |        | kyle       | 24

required output

For each datetime value, I need the driver, the driver's age, the number of riders, the name of the youngest rider, and the number of riders whose age is within +/- 10 years of the driver's age.

 datetime   | driver | age  | riders | youngest_rider | riders_within_ten_years_of_driver
------------|--------|------|--------|----------------|----------------------------------
 03/03/2009 | joe    | 24   | 0      |                | 0        # no rider
 04/03/2009 | john   | 39   | 1      | juliet         | 1        # juliet
 05/03/2009 | borat  | 32   | 2      | mike           | 0        # no rider
 06/03/2009 | john   | 39   | 3      | kyle           | 2        # duke, jose

This is a very bad data structure: the driver name is empty on the extra rider rows, so you don't have a key for aggregation. A more normalized structure would be better, but sometimes we are stuck with a particular format.

You need to get the id of the driver record that each row belongs to. For this, use a correlated subquery (the table is called riders in the queries below):

select r.*,
       (select max(r2.id)
        from riders r2
        where r2.id <= r.id and r2.driver is not null
       ) as driver_id
from riders r;
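To see what the correlated subquery returns, here is a minimal sketch that loads the sample rows from the question into an in-memory SQLite database through Python's built-in sqlite3 module. It assumes the blank cells are stored as NULL (not as empty strings) and that the table is named riders, as in the query above; the column types are my own guess.

```python
import sqlite3

# Sample rows from the question; blank cells are assumed to be NULL.
ROWS = [
    (1, "03/03/2009", "joe", 24, 0, None, None),
    (2, "04/03/2009", "john", 39, 1, "juliet", 30),
    (3, "05/03/2009", "borat", 32, 2, "jane", 45),
    (4, "05/03/2009", None, None, None, "mike", 18),
    (5, "06/03/2009", "john", 39, 3, "duke", 42),
    (6, "06/03/2009", None, None, None, "jose", 33),
    (7, "06/03/2009", None, None, None, "kyle", 24),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table riders (id integer primary key, datetime text, driver text,"
    " age integer, riders integer, rider_name text, rider_age integer)"
)
conn.executemany("insert into riders values (?, ?, ?, ?, ?, ?, ?)", ROWS)

# For every row, find the closest preceding (or same) row that has a driver.
driver_ids = conn.execute(
    """
    select r.id,
           (select max(r2.id)
            from riders r2
            where r2.id <= r.id and r2.driver is not null
           ) as driver_id
    from riders r
    order by r.id
    """
).fetchall()

for row_id, driver_id in driver_ids:
    print(row_id, driver_id)
```

Rows 4, 6 and 7 (the extra rider rows) pick up the id of the driver row above them, which gives every row a usable grouping key. Note that this relies on the ids being assigned in the same order as the rows were captured.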

Then we build on this, using a join to get the driver information and conditional aggregation. This covers everything except the rider with the minimum age:

select r.datetime,
       max(d.driver) as driver,
       max(d.age) as age,
       max(d.riders) as riders,
       -- age is null on the extra rider rows, so compare against the joined driver row
       sum(case when abs(r.rider_age - d.age) <= 10 then 1 else 0 end) as riders_within_10_years
from (select r.*,
             (select max(r2.id)
              from riders r2
              where r2.id <= r.id and r2.driver is not null
             ) as driver_id
      from riders r
     ) r join
     riders d
     on d.id = r.driver_id
group by r.datetime, r.driver_id;

Finding the rider with the minimum age is quite tricky with this data structure. One solution is to use a CTE:

with r as (
      select r.*,
             (select max(r2.id)
              from riders r2
              where r2.id <= r.id and r2.driver is not null
             ) as driver_id
      from riders r
     )
select r.datetime,
       max(d.driver) as driver,
       max(d.age) as age,
       max(d.riders) as riders,
       -- age is null on the extra rider rows, so compare against the joined driver row
       sum(case when abs(r.rider_age - d.age) <= 10 then 1 else 0 end) as riders_within_10_years,
       (select r2.rider_name
        from r r2
        where r2.driver_id = r.driver_id and r2.rider_age is not null
        order by r2.rider_age asc  -- youngest first
        limit 1
       ) as minimum_age_rider
from r join
     riders d
     on d.id = r.driver_id
group by r.datetime, r.driver_id;

This is much harder than it needs to be because (1) the data structure is not very good and (2) SQLite is not particularly powerful (in particular, versions before 3.25.0 do not support window functions).
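As a rough end-to-end check of this approach, the following sketch runs a CTE-based query against the question's sample data using Python's built-in sqlite3 module. It is only a sketch under assumptions: blank cells are stored as NULL, the table is named riders as in the queries above, and the output column names are mine. Each row is joined back to its driver row so that rider ages are compared with the driver's age, and rider ages are sorted ascending to pick the youngest.

```python
import sqlite3

# Sample rows from the question; blank cells are assumed to be NULL.
ROWS = [
    (1, "03/03/2009", "joe", 24, 0, None, None),
    (2, "04/03/2009", "john", 39, 1, "juliet", 30),
    (3, "05/03/2009", "borat", 32, 2, "jane", 45),
    (4, "05/03/2009", None, None, None, "mike", 18),
    (5, "06/03/2009", "john", 39, 3, "duke", 42),
    (6, "06/03/2009", None, None, None, "jose", 33),
    (7, "06/03/2009", None, None, None, "kyle", 24),
]

QUERY = """
with r as (
    select t.*,
           (select max(t2.id)
            from riders t2
            where t2.id <= t.id and t2.driver is not null
           ) as driver_id
    from riders t
)
select r.datetime,
       max(d.driver) as driver,
       max(d.age) as age,
       max(d.riders) as riders,
       (select r2.rider_name
        from r r2
        where r2.driver_id = r.driver_id and r2.rider_age is not null
        order by r2.rider_age asc
        limit 1
       ) as youngest_rider,
       sum(case when abs(r.rider_age - d.age) <= 10 then 1 else 0 end)
           as riders_within_10_years
from r
join riders d on d.id = r.driver_id
group by r.datetime, r.driver_id
order by r.driver_id
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table riders (id integer primary key, datetime text, driver text,"
    " age integer, riders integer, rider_name text, rider_age integer)"
)
conn.executemany("insert into riders values (?, ?, ?, ?, ?, ?, ?)", ROWS)

result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
```

On the sample data this reproduces the four rows of the required output, with NULL as the youngest rider for the day that has no riders.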

If you provide insert statements for the data, I can check whether this query works.

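select datetime, max(driver) as driver, max(age) as age, max(riders) as riders
, max(youngest_rider) as youngest_rider
-- no else branch: count() only counts non-null values
, count(case when rider_age between driver_age - 10 and driver_age + 10
        then 1
        end
) as count_riders_in_age_grp
from (
    -- window functions cannot be nested inside aggregates, so compute them first
    select t.*
    , first_value(rider_name) over (partition by datetime order by rider_age, rider_name) as youngest_rider
    , max(age) over (partition by datetime) as driver_age
    from my_table t
) x
-- driver and age are empty on the extra rider rows, so group by datetime only
group by datetime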

This is a terrible database structure, but I'm assuming it's a homework question. Regardless, this should work:

SELECT  M.[DateTime], 
        MAX(M.driver) AS [Driver], 
        MAX(M.age) AS [Age], 
        MAX(M.riders) AS [Riders],
        t.rider_name AS [Youngest Rider],
        SUM(CASE WHEN M.rider_age BETWEEN d.age - 10 AND d.age + 10 THEN 1 ELSE 0 END) AS [Riders within Ten Years of Driver]
FROM my_table M
CROSS APPLY
    (
        -- driver's age for this date (an aggregate cannot be nested inside SUM)
        SELECT MAX(age) AS age
        FROM my_table
        WHERE [DateTime] = M.[DateTime]
    ) d
OUTER APPLY
    (
        -- youngest rider; OUTER APPLY keeps dates that have no riders
        SELECT TOP 1 rider_name
        FROM my_table
        WHERE [DateTime] = M.[DateTime]
        AND rider_age IS NOT NULL
        ORDER BY rider_age
    ) t
GROUP BY M.[DateTime], t.rider_name

SELECT
    a.datetime
    ,max(a.driver) as driver
    ,max(a.age) as age
    ,max(a.riders) as riders
    ,max(y.youngest_rider) as youngest_rider
    ,count(b.id) as riders_within_ten_years_of_driver
FROM
    my_table a
LEFT JOIN
    my_table b
    ON
        a.datetime = b.datetime
        AND a.age - b.rider_age between -10 AND 10
LEFT JOIN
    -- one row per datetime holding the name of the youngest rider
    (SELECT DISTINCT
        datetime
        ,first_value(rider_name) OVER
            (PARTITION BY datetime
            ORDER BY rider_age
            rows unbounded preceding)
            as youngest_rider
    FROM my_table) y
    ON
        a.datetime = y.datetime
GROUP BY
    a.datetime

This is a mess. It would be much simpler if you had a table for drivers, riders and rides.
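To illustrate how much simpler the query gets, here is one possible normalized layout, sketched with Python's built-in sqlite3 module. The table names drivers, riders and rides come from the sentence above; every column name, and the choice to attach each rider row directly to a ride, is an assumption made for this sketch rather than something taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per driver, one per ride, one per rider on a ride.
conn.executescript(
    """
    create table drivers (id integer primary key, name text, age integer);
    create table rides (id integer primary key, ride_date text,
                        driver_id integer references drivers(id));
    create table riders (id integer primary key,
                         ride_id integer references rides(id),
                         name text, age integer);
    """
)
conn.executemany("insert into drivers values (?, ?, ?)",
                 [(1, "joe", 24), (2, "john", 39), (3, "borat", 32)])
conn.executemany("insert into rides values (?, ?, ?)",
                 [(1, "03/03/2009", 1), (2, "04/03/2009", 2),
                  (3, "05/03/2009", 3), (4, "06/03/2009", 2)])
conn.executemany("insert into riders values (?, ?, ?, ?)",
                 [(1, 2, "juliet", 30), (2, 3, "jane", 45), (3, 3, "mike", 18),
                  (4, 4, "duke", 42), (5, 4, "jose", 33), (6, 4, "kyle", 24)])

# Every ride has its own key, so a plain join and group by is enough.
normalized_result = conn.execute(
    """
    select rd.ride_date, d.name, d.age,
           count(r.id) as riders,
           (select r2.name from riders r2
            where r2.ride_id = rd.id
            order by r2.age limit 1) as youngest_rider,
           sum(case when abs(r.age - d.age) <= 10 then 1 else 0 end)
               as riders_within_ten_years_of_driver
    from rides rd
    join drivers d on d.id = rd.driver_id
    left join riders r on r.ride_id = rd.id
    group by rd.id, rd.ride_date, d.name, d.age
    order by rd.id
    """
).fetchall()

for row in normalized_result:
    print(row)
```

With this layout the number of riders is derived by counting rows instead of being stored, and no row needs to be matched back to a driver row by position.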
