Oracle query: group by consecutive values and get the start date and end date

Question

I have a table like this (actually, it is the result of a very large query):

id   |  date_measured        |  out_of_range
-----+-----------------------+--------------
3147 |  09/08/2019 20.00:00  |  1
3147 |  09/08/2019 21.00:00  |  0
3147 |  09/08/2019 22.00:00  |  0
3147 |  09/08/2019 23.00:00  |  1
3147 |  10/08/2019 00.00:00  |  1
3147 |  10/08/2019 01.00:00  |  1
3147 |  10/08/2019 02.00:00  |  0
3125 |  09/08/2019 20.00:00  |  0
3125 |  09/08/2019 21.00:00  |  1
3125 |  09/08/2019 22.00:00  |  1
3125 |  09/08/2019 23.00:00  |  0
3125 |  10/08/2019 00.00:00  |  1
3125 |  10/08/2019 01.00:00  |  1
3125 |  10/08/2019 02.00:00  |  1

and I need this result:

id   |  date_measured_start  |  date_measured_end    |  consecutive_out_of_range
-----+-----------------------+-----------------------+--------------------------
3147 |  09/08/2019 20.00:00  |  09/08/2019 20.00:00  |  1
3147 |  09/08/2019 23.00:00  |  10/08/2019 01.00:00  |  3
3125 |  09/08/2019 21.00:00  |  09/08/2019 22.00:00  |  2
3125 |  10/08/2019 00.00:00  |  10/08/2019 02.00:00  |  3

That is, each run of consecutive rows with out_of_range = 1, together with its start and end dates.

I tried to use this solution, but I can't get it to return only the consecutive 1 values of out_of_range.

Answer 1

Here is a different application of the same method as in MT0's answer. The method is known as the "fixed differences" method (the "fixed differences", in both solutions, are the additional computed values by which we group the data); it is also known as the "tabibitosan" method.

In this solution I subtract a row_number() (appropriately modified: multiplied by a one-hour interval) directly from the date, but only after selecting just the rows with the flag equal to 1. This may be important if you have a very large amount of data, but only a relatively small fraction of rows have the flag equal to 1. This is because row_number() needs to order the data, and ordering is an expensive operation. To solve the problem, we don't need to order (by date) the rows where the flag is 0, only the rows where the flag is 1.
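To make the grouping key concrete, here is a minimal sketch of the same idea outside the database, in Python rather than Oracle SQL. It uses the rows for id 3147 from the question; the function name fixed_difference_groups is made up for this illustration and is not part of the query below.

```python
from datetime import datetime, timedelta
from itertools import groupby

# Sample rows for id 3147 from the question: (id, date_measured, out_of_range).
rows = [
    (3147, datetime(2019, 8, 9, 20), 1),
    (3147, datetime(2019, 8, 9, 21), 0),
    (3147, datetime(2019, 8, 9, 22), 0),
    (3147, datetime(2019, 8, 9, 23), 1),
    (3147, datetime(2019, 8, 10, 0), 1),
    (3147, datetime(2019, 8, 10, 1), 1),
    (3147, datetime(2019, 8, 10, 2), 0),
]


def fixed_difference_groups(rows):
    """Group flagged rows into runs of timestamps spaced one hour apart."""
    # Filter first, so that only the flagged rows have to be sorted.
    flagged = sorted((r[0], r[1]) for r in rows if r[2] == 1)
    runs = {}
    for row_id, group in groupby(flagged, key=lambda r: r[0]):
        # enumerate(..., start=1) plays the role of
        # row_number() over (partition by id order by date_measured).
        for rn, (_, ts) in enumerate(group, start=1):
            grp = ts - rn * timedelta(hours=1)  # the "fixed difference"
            runs.setdefault((row_id, grp), []).append(ts)
    # Emulate GROUP BY id, grp with MIN, MAX and COUNT(*).
    return sorted((k[0], min(v), max(v), len(v)) for k, v in runs.items())


result = fixed_difference_groups(rows)
for row in result:
    print(row)
```

Within a run of hourly rows, the timestamp and the row number advance together, so their difference stays fixed; a gap in the timestamps shifts the difference and starts a new group.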

EDIT (based on MT0's comment below this answer)

MT0 points out, correctly, that my solution assumes something that is true in the test data posted by the OP, but not stated explicitly. Namely, that the date-times in the date_measured column form continuous sequences, spaced at one-hour intervals.

In fact, what my solution really does is this. Suppose that from the very beginning the data consisted only of the out-of-range rows (with flag equal to 1), and that the date-times in the date_measured column were always rounded to the hour, as they are in the OP's test data. The question, then, would be to identify the sequences of rows where the times are "consecutive" (meaning one hour apart). That's what the query does.
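A small Python sketch of why the spacing matters. The half-hourly timestamps are hypothetical data invented for this illustration (they do not appear in the OP's table), and group_keys is a made-up helper name.

```python
from datetime import datetime, timedelta


def group_keys(timestamps, step=timedelta(hours=1)):
    """Return the fixed-difference key of each timestamp, in sorted order."""
    return [ts - rn * step for rn, ts in enumerate(sorted(timestamps), start=1)]


# Spaced exactly one hour apart, as in the OP's test data.
hourly = [
    datetime(2019, 8, 9, 20),
    datetime(2019, 8, 9, 21),
    datetime(2019, 8, 9, 22),
]
# Hypothetical data spaced 30 minutes apart.
half_hourly = [
    datetime(2019, 8, 9, 20),
    datetime(2019, 8, 9, 20, 30),
    datetime(2019, 8, 9, 21),
]

hourly_groups = len(set(group_keys(hourly)))            # all keys are equal
half_hourly_groups = len(set(group_keys(half_hourly)))  # every key differs
print(hourly_groups, half_hourly_groups)
```

With any spacing other than one hour, the key changes from row to row, so each row would land in its own group. That is the unstated assumption MT0 pointed out.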

END EDIT

I used MT0's table, from his db fiddle test. Thanks MT0!

with
  tabibitosan (id, date_measured, grp) as (
    select id, date_measured,
           date_measured 
           - row_number() over (partition by id order by date_measured) 
             * interval '1' hour
    from   table_name
    where  out_of_range = 1    
  )
select id, min(date_measured) as date_measured_start, 
           max(date_measured) as date_measured_end,
           count(*)           as consecutive_out_of_range
from   tabibitosan
group  by id, grp
order  by id, date_measured_start    --  or whatever
;

  ID DATE_MEASURED_START DATE_MEASURED_END CONSECUTIVE_OUT_OF_RANGE
---- ------------------- ----------------- ------------------------
3125 2019-08-09 21:00    2019-08-09 22:00                         2
3125 2019-08-10 00:00    2019-08-10 02:00                         3
3147 2019-08-09 20:00    2019-08-09 20:00                         1
3147 2019-08-09 23:00    2019-08-10 01:00                         3

Answer 2

Use the ROW_NUMBER analytic function to give each row two incrementing numeric values: one per id and the other per id / out_of_range pair. If you subtract one from the other, the resulting number is constant within a consecutive set of rows with the same id / out_of_range values, and you can use this to GROUP BY:

Query:

SELECT id,
       MIN( date_measured ) AS date_measured_start,
       MAX( date_measured ) AS date_measured_end,
       COUNT( * ) AS consecutive_out_of_range
FROM   (
  SELECT t.*,
         ROW_NUMBER() OVER ( PARTITION BY id ORDER BY date_measured )
           - ROW_NUMBER() OVER ( PARTITION BY id, out_of_range ORDER BY date_measured )
           AS rn
  FROM   table_name t
)
WHERE out_of_range = 1
GROUP BY id, rn

Output:

ID   |  DATE_MEASURED_START  |  DATE_MEASURED_END    |  CONSECUTIVE_OUT_OF_RANGE
-----+-----------------------+-----------------------+--------------------------
3147 |  2019-08-09 20:00:00  |  2019-08-09 20:00:00  |  1
3147 |  2019-08-09 23:00:00  |  2019-08-10 01:00:00  |  3
3125 |  2019-08-10 00:00:00  |  2019-08-10 02:00:00  |  3
3125 |  2019-08-09 21:00:00  |  2019-08-09 22:00:00  |  2
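The difference of the two row numbers can also be traced by hand. The following is only a sketch in Python rather than Oracle SQL, using the rows for id 3125 from the question; row_number_difference_runs is a name invented for this illustration.

```python
from collections import defaultdict
from datetime import datetime

# Sample rows for id 3125 from the question: (id, date_measured, out_of_range).
rows = [
    (3125, datetime(2019, 8, 9, 20), 0),
    (3125, datetime(2019, 8, 9, 21), 1),
    (3125, datetime(2019, 8, 9, 22), 1),
    (3125, datetime(2019, 8, 9, 23), 0),
    (3125, datetime(2019, 8, 10, 0), 1),
    (3125, datetime(2019, 8, 10, 1), 1),
    (3125, datetime(2019, 8, 10, 2), 1),
]


def row_number_difference_runs(rows):
    """Find runs of out_of_range = 1 via the difference of two row numbers."""
    per_id = defaultdict(int)       # ROW_NUMBER() ... PARTITION BY id
    per_id_flag = defaultdict(int)  # ROW_NUMBER() ... PARTITION BY id, out_of_range
    runs = defaultdict(list)
    # Both window functions order by date_measured within each id.
    for row_id, ts, flag in sorted(rows, key=lambda r: (r[0], r[1])):
        per_id[row_id] += 1
        per_id_flag[(row_id, flag)] += 1
        rn = per_id[row_id] - per_id_flag[(row_id, flag)]
        if flag == 1:  # WHERE out_of_range = 1
            runs[(row_id, rn)].append(ts)
    # Emulate GROUP BY id, rn with MIN, MAX and COUNT(*).
    return sorted((k[0], min(v), max(v), len(v)) for k, v in runs.items())


result = row_number_difference_runs(rows)
for row in result:
    print(row)
```

Unlike the date-minus-row-number variant, this key depends only on row order, so it does not rely on the timestamps being exactly one hour apart.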

db<>fiddle here

Answer 1: 1 2019-10-21 14:41:08

Answer 2: 0 2019-10-21 13:29:00
