SQL group by a column but segmented based on another column

I have a table with a little over 100,000 rows and 3 columns:

  • Account_number
  • Report_date
  • Outstanding_amount

I need a statement that groups the outstanding amount by account but also splits the groups based on the date. Sample data for 1 account:

+----------------+-------------+--------------------+
| account_number | report_date | outstanding_amount |
+----------------+-------------+--------------------+
|              1 | 02/01/2019  |                100 |
|              1 | 03/01/2019  |                100 |
|              1 | 06/01/2019  |                200 |
|              1 | 07/01/2019  |                300 |
|              1 | 10/01/2019  |                200 |
|              1 | 11/01/2019  |                200 |
|              1 | 12/01/2019  |                100 |
+----------------+-------------+--------------------+

So if I run this statement:

select * from (select account_number, min(report_date) mindate, max(report_date) maxdate, outstanding_amount from table1 group by account_number, outstanding_amount)

The result of this statement looks like this:

+----------------+------------+------------+--------------------+
| account_number |  mindate   |  maxdate   | outstanding_amount |
+----------------+------------+------------+--------------------+
|              1 | 02/01/2019 | 12/01/2019 |                100 |
|              1 | 06/01/2019 | 11/01/2019 |                200 |
|              1 | 07/01/2019 | 07/01/2019 |                300 |
+----------------+------------+------------+--------------------+

Here I want to split the result so that the days between mindate and maxdate of one row don't overlap the days of any other row. The result I'm looking for is something like this:

+----------------+------------+------------+--------------------+
| account_number |  mindate   |  maxdate   | outstanding_amount |
+----------------+------------+------------+--------------------+
|              1 | 02/01/2019 | 03/01/2019 |                100 |
|              1 | 06/01/2019 | 06/01/2019 |                200 |
|              1 | 07/01/2019 | 07/01/2019 |                300 |
|              1 | 10/01/2019 | 11/01/2019 |                200 |
|              1 | 12/01/2019 | 12/01/2019 |                100 |
+----------------+------------+------------+--------------------+

Is it possible to construct this statement?

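To flatten the data, collapse it by a calculated rank: flag every row whose amount differs from the previous row for the same account, then take a running sum of the flags so that each run of equal amounts gets its own rank.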

select account_number
, min(report_date) as mindate
, max(report_date) as maxdate
, outstanding_amount
from
(
    select q1.*
    , sum(flag) over (partition by account_number order by report_date) as rnk
    from
    (
        select t.*
        , case when outstanding_amount = lag(outstanding_amount, 1) over (partition by account_number order by report_date) then 0 else 1 end as flag
        from table1 t
    ) q1
) q2
group by account_number, outstanding_amount, rnk
order by account_number, mindate;

A test on db<>fiddle here
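As a rough way to check the rank-based query without a database server, here is a minimal sketch that runs it against the sample rows in an in-memory SQLite database through Python's sqlite3 module. The choice of SQLite is an assumption (the question does not say which database is used), and the dates are rewritten as ISO strings on the assumption that the originals are dd/mm/yyyy.

```python
import sqlite3  # window functions require SQLite 3.25 or newer

# Sample rows from the question; dates as ISO strings (assumed dd/mm/yyyy originally).
rows = [
    (1, "2019-01-02", 100),
    (1, "2019-01-03", 100),
    (1, "2019-01-06", 200),
    (1, "2019-01-07", 300),
    (1, "2019-01-10", 200),
    (1, "2019-01-11", 200),
    (1, "2019-01-12", 100),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table table1 "
    "(account_number integer, report_date text, outstanding_amount integer)"
)
conn.executemany("insert into table1 values (?, ?, ?)", rows)

# Inner levels only: flag is 1 where a new run starts, rnk is the running sum of flags.
steps = conn.execute("""
    select report_date, outstanding_amount, flag,
           sum(flag) over (partition by account_number order by report_date) as rnk
    from (
        select t.*,
               case when outstanding_amount = lag(outstanding_amount, 1)
                         over (partition by account_number order by report_date)
                    then 0 else 1 end as flag
        from table1 t
    ) q1
    order by account_number, report_date
""").fetchall()

# Full query from the answer: one output row per run of equal amounts.
islands = conn.execute("""
    select account_number, min(report_date) as mindate,
           max(report_date) as maxdate, outstanding_amount
    from (
        select q1.*,
               sum(flag) over (partition by account_number order by report_date) as rnk
        from (
            select t.*,
                   case when outstanding_amount = lag(outstanding_amount, 1)
                             over (partition by account_number order by report_date)
                        then 0 else 1 end as flag
            from table1 t
        ) q1
    ) q2
    group by account_number, outstanding_amount, rnk
    order by account_number, mindate
""").fetchall()

for row in steps:
    print(row)
for row in islands:
    print(row)
```

For the sample account the flags come out as 1, 0, 1, 1, 1, 0, 1 and the ranks as 1, 1, 2, 3, 4, 4, 5, so the grouped result has the five rows asked for in the question.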

This is a gaps-and-islands problem. In this case, the simplest solution is probably the difference of row numbers:

select account_number, outstanding_amount,
       min(report_date) as mindate, max(report_date) as maxdate
from (select t.*,
             row_number() over (partition by account_number order by report_date) as seqnum,
             row_number() over (partition by account_number, outstanding_amount order by report_date) as seqnum_o
      from table1 t
     ) t
group by account_number, outstanding_amount, (seqnum - seqnum_o)
order by account_number, min(report_date);

Why this works is a little tricky to explain. But if you look at the results of the subquery, you will see that the difference of the two row numbers stays constant across adjacent rows with the same amount, and changes when a run of that amount is interrupted.
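To make that concrete, here is a small sketch that prints the subquery's two row numbers and their difference for the sample account. It assumes SQLite (through Python's sqlite3 module) purely for illustration, with the table named table1 as in the question and dates rewritten as ISO strings on the assumption that the originals are dd/mm/yyyy.

```python
import sqlite3  # window functions require SQLite 3.25 or newer

# Sample rows from the question; dates as ISO strings (assumed dd/mm/yyyy originally).
rows = [
    (1, "2019-01-02", 100),
    (1, "2019-01-03", 100),
    (1, "2019-01-06", 200),
    (1, "2019-01-07", 300),
    (1, "2019-01-10", 200),
    (1, "2019-01-11", 200),
    (1, "2019-01-12", 100),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table table1 "
    "(account_number integer, report_date text, outstanding_amount integer)"
)
conn.executemany("insert into table1 values (?, ?, ?)", rows)

# The subquery from the answer, plus the difference that is used as the grouping key.
numbered = conn.execute("""
    select report_date, outstanding_amount, seqnum, seqnum_o,
           seqnum - seqnum_o as diff
    from (
        select t.*,
               row_number() over (partition by account_number
                                  order by report_date) as seqnum,
               row_number() over (partition by account_number, outstanding_amount
                                  order by report_date) as seqnum_o
        from table1 t
    ) s
    order by account_number, report_date
""").fetchall()

for row in numbered:
    print(row)
```

For the sample data the differences are 0, 0, 2, 3, 3, 3, 4. The row with amount 300 and the two following rows with amount 200 share the difference 3, which is why the outer query groups by outstanding_amount as well as by the difference.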
