
SQL - aggregating across multiple rows

my_table

I have the following table, which captures driver and rider details. For each day (datetime) there is one driver and zero or more riders. If there is more than one rider, each additional rider's data (name and age) is stored in a new row with the same datetime. This may not be the right way to structure the data, but it is this way mainly because the number of riders per driver per datetime varies.

id | datetime   | driver | age  | riders | rider_name | rider_age
---|------------|--------|------|--------|------------|----------
1  | 03/03/2009 | joe    | 24   | 0      |            | 
2  | 04/03/2009 | john   | 39   | 1      | juliet     | 30
3  | 05/03/2009 | borat  | 32   | 2      | jane       | 45
4  | 05/03/2009 |        |      |        | mike       | 18
5  | 06/03/2009 | john   | 39   | 3      | duke       | 42
6  | 06/03/2009 |        |      |        | jose       | 33
7  | 06/03/2009 |        |      |        | kyle       | 24
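To make the layout concrete, here is a minimal sketch that loads the rows above into an in-memory SQLite database through Python's sqlite3 module. The column types and the choice to store blank cells as NULL are assumptions; the question does not say how blanks are stored.

```python
import sqlite3

# Rows copied from the table above. Blank cells become None (NULL): an assumption.
ROWS = [
    (1, "03/03/2009", "joe", 24, 0, None, None),
    (2, "04/03/2009", "john", 39, 1, "juliet", 30),
    (3, "05/03/2009", "borat", 32, 2, "jane", 45),
    (4, "05/03/2009", None, None, None, "mike", 18),
    (5, "06/03/2009", "john", 39, 3, "duke", 42),
    (6, "06/03/2009", None, None, None, "jose", 33),
    (7, "06/03/2009", None, None, None, "kyle", 24),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table my_table (id integer primary key, datetime text, driver text,"
    " age integer, riders integer, rider_name text, rider_age integer)"
)
conn.executemany("insert into my_table values (?, ?, ?, ?, ?, ?, ?)", ROWS)

# count(*) counts rows; count(rider_name) skips NULLs, so it counts riders.
rows_per_day = conn.execute(
    "select datetime, count(*), count(rider_name)"
    " from my_table group by datetime order by min(id)"
).fetchall()
print(rows_per_day)
```

The driver with no riders still occupies one row, which is why the row count and the rider count differ for 03/03/2009.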

Required output

For each datetime value, I need the driver, the driver's age, the number of riders, the name of the youngest rider, and the number of riders within +/- 10 years of the driver's age.

 datetime   | driver | age  | riders | youngest_rider | riders_within_ten_years_of_driver
------------|--------|------|--------|--------------|---
 03/03/2009 | joe    | 24   | 0      |              | 0        # no rider
 04/03/2009 | john   | 39   | 1      | juliet       | 1        # juliet
 05/03/2009 | borat  | 32   | 2      | mike         | 0        # no rider
 06/03/2009 | john   | 39   | 3      | kyle         | 2        # duke, jose

This is a very bad data structure: the driver name is empty on the continuation rows, so you don't have a key for aggregation. A more normalized structure would be better, but sometimes we are stuck with a particular format.

You need to get the id of the driver record for each row. For this, use a correlated subquery (the queries in this answer call the table riders; substitute my_table if that is your table name):

select r.*,
       (select max(r2.id)
        from riders r2
        where r2.id <= r.id and r2.driver is not null
       ) as driver_id
from riders r;
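The effect of the correlated subquery can be checked on the sample data. The following is a minimal sketch using Python's sqlite3 module; it assumes the table is named my_table and that blank cells are stored as NULL, neither of which the question confirms.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table my_table (id integer primary key, datetime text, driver text,"
    " age integer, riders integer, rider_name text, rider_age integer)"
)
# Sample rows from the question; None stands for a blank cell (assumed NULL).
conn.executemany("insert into my_table values (?, ?, ?, ?, ?, ?, ?)", [
    (1, "03/03/2009", "joe", 24, 0, None, None),
    (2, "04/03/2009", "john", 39, 1, "juliet", 30),
    (3, "05/03/2009", "borat", 32, 2, "jane", 45),
    (4, "05/03/2009", None, None, None, "mike", 18),
    (5, "06/03/2009", "john", 39, 3, "duke", 42),
    (6, "06/03/2009", None, None, None, "jose", 33),
    (7, "06/03/2009", None, None, None, "kyle", 24),
])

# For every row, find the closest preceding row (by id) that names a driver.
driver_ids = conn.execute("""
    select t.id,
           (select max(t2.id)
            from my_table t2
            where t2.id <= t.id and t2.driver is not null
           ) as driver_id
    from my_table t
    order by t.id
""").fetchall()
print(driver_ids)
# [(1, 1), (2, 2), (3, 3), (4, 3), (5, 5), (6, 5), (7, 5)]
```

Rows 4, 6 and 7 have no driver of their own, so they inherit the id of the driver row above them. This relies on ids increasing in the order the rows were entered.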

Then we build on this, using a join to get the driver information and conditional aggregation. For everything but the rider with the minimum age:

select d.datetime, d.driver, d.age, d.riders,
       -- compare each rider with the age stored on the joined driver row;
       -- on continuation rows r.age is empty, so it cannot be used here
       sum(case when abs(r.rider_age - d.age) <= 10 then 1 else 0 end) as riders_within_10_years
from (select r.*,
             (select max(r2.id)
              from riders r2
              where r2.id <= r.id and r2.driver is not null
             ) as driver_id
      from riders r
     ) r join
     riders d
     on d.id = r.driver_id
group by d.id, d.datetime, d.driver, d.age, d.riders;

Finding the rider with the minimum age is quite tricky with this data structure. One solution is to use a CTE:

with r as (
      select r.*,
             (select max(r2.id)
              from riders r2
              where r2.id <= r.id and r2.driver is not null
             ) as driver_id
      from riders r
     )
select d.datetime, d.driver, d.age, d.riders,
       -- the driver's age comes from the joined driver row d
       sum(case when abs(r.rider_age - d.age) <= 10 then 1 else 0 end) as riders_within_10_years,
       (select r2.rider_name
        from r r2
        where r2.driver_id = d.id and r2.rider_age is not null
        order by r2.rider_age asc  -- ascending, so the youngest rider comes first
        limit 1
       ) as minimum_age_rider
from r join
     riders d
     on d.id = r.driver_id
group by d.id, d.datetime, d.driver, d.age, d.riders;
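To check the CTE approach against the required output, here is a minimal sketch in Python's sqlite3 module. It is a variant of the query rather than a verbatim copy: it assumes the table is named my_table and that blank cells are NULL, reads the driver's age from the driver row through a join, and sorts rider_age in ascending order so that the youngest rider is picked.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table my_table (id integer primary key, datetime text, driver text,"
    " age integer, riders integer, rider_name text, rider_age integer)"
)
# Sample rows from the question; None stands for a blank cell (assumed NULL).
conn.executemany("insert into my_table values (?, ?, ?, ?, ?, ?, ?)", [
    (1, "03/03/2009", "joe", 24, 0, None, None),
    (2, "04/03/2009", "john", 39, 1, "juliet", 30),
    (3, "05/03/2009", "borat", 32, 2, "jane", 45),
    (4, "05/03/2009", None, None, None, "mike", 18),
    (5, "06/03/2009", "john", 39, 3, "duke", 42),
    (6, "06/03/2009", None, None, None, "jose", 33),
    (7, "06/03/2009", None, None, None, "kyle", 24),
])

QUERY = """
with r as (
    select t.*,
           (select max(t2.id)
            from my_table t2
            where t2.id <= t.id and t2.driver is not null
           ) as driver_id
    from my_table t
)
select d.datetime, d.driver, d.age, d.riders,
       (select r2.rider_name
        from r r2
        where r2.driver_id = d.id and r2.rider_age is not null
        order by r2.rider_age asc
        limit 1
       ) as youngest_rider,
       sum(case when abs(r.rider_age - d.age) <= 10 then 1 else 0 end)
           as riders_within_ten_years_of_driver
from r
join my_table d on d.id = r.driver_id
group by d.id, d.datetime, d.driver, d.age, d.riders
order by d.id
"""

result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
```

On the sample data this prints one row per driver record, with None where the question's required output shows an empty youngest_rider.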

This is much harder than it needs to be because (1) the data structure is not very good and (2) SQLite is not particularly powerful (in particular, it did not support window functions before version 3.25.0).
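For readers on SQLite 3.25.0 or later, which added window functions, the same result can be sketched without the correlated subqueries. This is an illustration under assumptions, not part of the original answer: the table is named my_table, blank cells are NULL, and each datetime has exactly one driver, as the question states. The version guard skips the query on older SQLite builds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table my_table (id integer primary key, datetime text, driver text,"
    " age integer, riders integer, rider_name text, rider_age integer)"
)
# Sample rows from the question; None stands for a blank cell (assumed NULL).
conn.executemany("insert into my_table values (?, ?, ?, ?, ?, ?, ?)", [
    (1, "03/03/2009", "joe", 24, 0, None, None),
    (2, "04/03/2009", "john", 39, 1, "juliet", 30),
    (3, "05/03/2009", "borat", 32, 2, "jane", 45),
    (4, "05/03/2009", None, None, None, "mike", 18),
    (5, "06/03/2009", "john", 39, 3, "duke", 42),
    (6, "06/03/2009", None, None, None, "jose", 33),
    (7, "06/03/2009", None, None, None, "kyle", 24),
])

QUERY = """
select datetime,
       max(driver) as driver,
       max(age) as age,
       max(riders) as riders,
       max(case when rn = 1 then rider_name end) as youngest_rider,
       count(case when abs(rider_age - driver_age) <= 10 then 1 end)
           as riders_within_ten_years_of_driver
from (select t.*,
             -- copy the driver's age onto every row of the same datetime
             max(age) over (partition by datetime) as driver_age,
             -- rn = 1 marks the youngest rider of each datetime
             row_number() over (partition by datetime
                                order by rider_age, rider_name) as rn
      from my_table t
     ) x
group by datetime
order by min(id)
"""

if sqlite3.sqlite_version_info >= (3, 25, 0):
    result = conn.execute(QUERY).fetchall()
else:
    result = None  # this SQLite build predates window functions
print(result)
```

Partitioning by datetime replaces the driver_id lookup, which only works because a datetime never has more than one driver in this data.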

If you provide data inserts, I can test whether this query works.

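select datetime,
       max(driver) as driver,
       max(age) as age,
       max(riders) as riders,
       -- rn = 1 marks the youngest rider of each datetime
       max(case when rn = 1 then rider_name end) as youngest_rider,
       -- no else branch: count() skips NULLs, so only matching riders are counted
       count(case when rider_age between driver_age - 10 and driver_age + 10
             then 1
             end) as count_riders_in_age_grp
from (select t.*,
             max(age) over (partition by datetime) as driver_age,
             row_number() over (partition by datetime order by rider_age, rider_name) as rn
      from my_table t
     ) t
group by datetime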

This is a terrible database structure, but I'm assuming it's a homework question. Regardless, this should work:

SELECT  M.[DateTime],
        MAX(d.driver) AS [Driver],
        MAX(d.age) AS [Age],
        MAX(d.riders) AS [Riders],
        t.rider_name AS [Youngest Rider],
        SUM(CASE WHEN M.rider_age BETWEEN d.age - 10 AND d.age + 10 THEN 1 ELSE 0 END) AS [Riders within Ten Years of Driver]
FROM my_table M
JOIN my_table d  -- the driver row for the same date
    ON d.[DateTime] = M.[DateTime] AND d.driver IS NOT NULL
OUTER APPLY      -- OUTER keeps dates that have no riders
    (
        SELECT rider_name
        FROM my_table
        WHERE [DateTime] = M.[DateTime]
        AND rider_age = (SELECT MIN(rider_age) FROM my_table WHERE [DateTime] = M.[DateTime])
    ) t
GROUP BY M.[DateTime], t.rider_name

SELECT
    a.datetime
    ,a.driver
    ,a.age
    ,a.riders
    ,a.youngest_rider
    ,count(b.id) as riders_within_ten_years_of_driver
FROM
    -- compute the window function before grouping
    (SELECT
        t.*
        ,first_value(rider_name) OVER
            (PARTITION BY datetime
            ORDER BY rider_age
            rows unbounded preceding)
            as youngest_rider
    FROM my_table t) a
LEFT JOIN
    my_table b
    ON
        a.datetime = b.datetime
        AND a.age - b.rider_age between -10 AND 10
WHERE
    a.driver IS NOT NULL  -- keep one row per driver record
GROUP BY
    a.datetime
    ,a.driver
    ,a.age
    ,a.riders
    ,a.youngest_rider

This is a mess. It would be much simpler if you had separate tables for drivers, riders and rides.
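As an illustration of that suggestion, here is a minimal sketch of a normalized layout, again using Python's sqlite3 module. The table and column names (drivers, rides, riders, ride_id and so on) are hypothetical; nothing in the question defines them. The number of riders is derived with count() instead of being stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical normalized schema: one row per driver, per ride, and per rider.
conn.executescript("""
create table drivers (id integer primary key, name text, age integer);
create table rides   (id integer primary key, datetime text,
                      driver_id integer references drivers(id));
create table riders  (ride_id integer references rides(id), name text, age integer);
""")
conn.executemany("insert into drivers values (?, ?, ?)",
                 [(1, "joe", 24), (2, "john", 39), (3, "borat", 32)])
conn.executemany("insert into rides values (?, ?, ?)",
                 [(1, "03/03/2009", 1), (2, "04/03/2009", 2),
                  (3, "05/03/2009", 3), (4, "06/03/2009", 2)])
conn.executemany("insert into riders values (?, ?, ?)",
                 [(2, "juliet", 30), (3, "jane", 45), (3, "mike", 18),
                  (4, "duke", 42), (4, "jose", 33), (4, "kyle", 24)])

QUERY = """
select rd.datetime, d.name, d.age,
       count(r.name) as riders,
       (select r2.name from riders r2
        where r2.ride_id = rd.id
        order by r2.age limit 1) as youngest_rider,
       count(case when abs(r.age - d.age) <= 10 then 1 end)
           as riders_within_ten_years_of_driver
from rides rd
join drivers d on d.id = rd.driver_id
left join riders r on r.ride_id = rd.id   -- left join keeps rides with no riders
group by rd.id, rd.datetime, d.name, d.age
order by rd.id
"""

result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
```

With a ride id available as a grouping key, the query no longer needs to reconstruct which rows belong to which driver.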
