Oracle query: group by consecutive value and get start date and end date
I have a table like this (actually, it is the result of a very large query):
id | date_measured | out_of_range
-----+-----------------------+--------------
3147 | 09/08/2019 20.00:00 | 1
3147 | 09/08/2019 21.00:00 | 0
3147 | 09/08/2019 22.00:00 | 0
3147 | 09/08/2019 23.00:00 | 1
3147 | 10/08/2019 00.00:00 | 1
3147 | 10/08/2019 01.00:00 | 1
3147 | 10/08/2019 02.00:00 | 0
3125 | 09/08/2019 20.00:00 | 0
3125 | 09/08/2019 21.00:00 | 1
3125 | 09/08/2019 22.00:00 | 1
3125 | 09/08/2019 23.00:00 | 0
3125 | 10/08/2019 00.00:00 | 1
3125 | 10/08/2019 01.00:00 | 1
3125 | 10/08/2019 02.00:00 | 1
And I need this result:
id | date_measured_start | date_measured_end | consecutive_out_of_range
-----+-----------------------+-----------------------+--------------------------
3147 | 09/08/2019 20.00:00 | 09/08/2019 20.00:00 | 1
3147 | 09/08/2019 23.00:00 | 10/08/2019 01.00:00 | 3
3125 | 09/08/2019 21.00:00 | 09/08/2019 22.00:00 | 2
3125 | 10/08/2019 00.00:00 | 10/08/2019 02.00:00 | 3
That is, each consecutive run of out_of_range = 1, with the start and end date of the run.
I tried to use this solution, but I can't get it to return only the consecutive runs of 1 in out_of_range.
Here is a different application of the same method as in MT0's answer. The method is known as the "fixed differences" method (in both solutions, the "fixed differences" are the additional computed value by which we group the data); it is also known as the "tabibitosan" method.
In this solution I subtract a row_number() (appropriately modified) directly from the date, but only after selecting the rows with the flag equal to 1. This may matter if you have a very large amount of data but only a relatively small fraction of the rows have the flag equal to 1, because row_number() needs to order the data, and ordering is an expensive operation. To solve the problem, we don't need to order (by date) the rows where the flag is 0; only the rows where the flag is 1 need to be ordered.
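As a rough illustration of the arithmetic (not part of the original answer), here is a pure-Python sketch of the same grouping applied to the question's sample data. It assumes, as the query does, that every date_measured value falls exactly on the hour. Only the flag = 1 rows are numbered, each row's key is its date minus that many hours, and a dict stands in for GROUP BY. The names ROWS, d and tabibitosan_by_date are hypothetical.

```python
from datetime import datetime, timedelta

HOUR = timedelta(hours=1)


def d(day, hour):
    """Hypothetical helper: date_measured in August 2019 (question uses DD/MM/YYYY)."""
    return datetime(2019, 8, day, hour)


# (id, date_measured, out_of_range), copied from the question's table.
ROWS = [
    (3147, d(9, 20), 1), (3147, d(9, 21), 0), (3147, d(9, 22), 0),
    (3147, d(9, 23), 1), (3147, d(10, 0), 1), (3147, d(10, 1), 1),
    (3147, d(10, 2), 0),
    (3125, d(9, 20), 0), (3125, d(9, 21), 1), (3125, d(9, 22), 1),
    (3125, d(9, 23), 0), (3125, d(10, 0), 1), (3125, d(10, 1), 1),
    (3125, d(10, 2), 1),
]


def tabibitosan_by_date(rows):
    """Return (id, start, end, count) for each run of hourly out-of-range rows."""
    # WHERE out_of_range = 1, then walk the surviving rows per id in date order.
    flagged = sorted((r for r in rows if r[2] == 1), key=lambda r: (r[0], r[1]))
    counters, groups = {}, {}
    for id_, ts, _ in flagged:
        counters[id_] = counters.get(id_, 0) + 1       # row_number() per id
        grp = ts - counters[id_] * HOUR                # date - n * 1 hour
        groups.setdefault((id_, grp), []).append(ts)   # GROUP BY id, grp
    # MIN / MAX / COUNT(*) per group, ordered by id, then start date.
    return sorted((id_, min(v), max(v), len(v)) for (id_, _), v in groups.items())


for row in tabibitosan_by_date(ROWS):
    print(row)
```

For the sample data this produces the same four runs as the query output in this answer: for id 3147, for example, the flagged rows at 23:00, 00:00 and 01:00 are numbered 2, 3 and 4, and all three map to the same key, 09/08/2019 21:00.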
EDIT (based on MT0's comment below this answer)
MT0 points out, correctly, that my solution assumes something that is true in the test data posted by the OP but is not stated explicitly: namely, that the date-times in the date_measured column form continuous sequences spaced at one-hour intervals.
In fact, what my solution really does is this. Suppose that from the very beginning the data consisted only of the out-of-range rows (with the flag equal to 1), and that the date-times in the date_measured column were always rounded to the hour, as they are in the OP's test data. The question would then be to identify the sequences of rows whose times are "consecutive" (meaning one hour apart). That's what the query does.
END EDIT
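To make the difference concrete, here is a small pure-Python comparison on hypothetical data (not from the question) in which the 21:00 measurement is missing altogether. The date-based key treats the missing hour as a break in the run, while the difference of two row numbers (the approach in MT0's answer) only looks at row positions and joins the two rows into one run. Which behaviour is correct depends on whether a missing measurement should end a run; all names below are illustrative only.

```python
from datetime import datetime, timedelta

HOUR = timedelta(hours=1)

# Hypothetical rows (id, date_measured, out_of_range): no row at all for 21:00.
ROWS = [
    (1, datetime(2019, 8, 9, 20), 1),
    (1, datetime(2019, 8, 9, 22), 1),
    (1, datetime(2019, 8, 9, 23), 0),
]


def group_runs(keyed):
    """GROUP BY (id, key) over (id, key, ts) triples -> sorted (id, start, end, count)."""
    groups = {}
    for id_, key, ts in keyed:
        groups.setdefault((id_, key), []).append(ts)
    return sorted((id_, min(v), max(v), len(v)) for (id_, _), v in groups.items())


def by_date(rows):
    """Key = date minus row_number() hours, numbering only the flag = 1 rows."""
    keyed, counters = [], {}
    for id_, ts, flag in sorted(rows, key=lambda r: (r[0], r[1])):
        if flag == 1:
            counters[id_] = counters.get(id_, 0) + 1
            keyed.append((id_, ts - counters[id_] * HOUR, ts))
    return group_runs(keyed)


def by_row_numbers(rows):
    """Key = row_number() per id minus row_number() per (id, flag); keep flag = 1."""
    keyed, rn_id, rn_flag = [], {}, {}
    for id_, ts, flag in sorted(rows, key=lambda r: (r[0], r[1])):
        rn_id[id_] = rn_id.get(id_, 0) + 1
        rn_flag[(id_, flag)] = rn_flag.get((id_, flag), 0) + 1
        if flag == 1:
            keyed.append((id_, rn_id[id_] - rn_flag[(id_, flag)], ts))
    return group_runs(keyed)


print(by_date(ROWS))         # two runs of length 1
print(by_row_numbers(ROWS))  # one run of length 2
```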
I used MT0's table from his db fiddle test. Thanks, MT0!
with
  tabibitosan (id, date_measured, grp) as (
    select id, date_measured,
           -- Number only the out-of-range rows (per id, by date) and subtract
           -- that many hours: within a run of hourly rows the date and the row
           -- number advance together, so grp is constant for the whole run.
           date_measured
             - row_number() over (partition by id order by date_measured)
               * interval '1' hour
    from   table_name
    where  out_of_range = 1
  )
select id, min(date_measured) as date_measured_start,
       max(date_measured) as date_measured_end,
       count(*)           as consecutive_out_of_range
from   tabibitosan
group by id, grp
order by id, date_measured_start -- or whatever
;
ID DATE_MEASURED_START DATE_MEASURED_END CONSECUTIVE_OUT_OF_RANGE
---- ------------------- ----------------- ------------------------
3125 2019-08-09 21:00 2019-08-09 22:00 2
3125 2019-08-10 00:00 2019-08-10 02:00 3
3147 2019-08-09 20:00 2019-08-09 20:00 1
3147 2019-08-09 23:00 2019-08-10 01:00 3
Use the ROW_NUMBER analytic function to give each row two incrementing numeric values: one per id and the other per id / out_of_range pair. If you subtract one from the other, the resulting number will be constant within a consecutive set of rows with the same id / out_of_range values, and you can use this to GROUP BY:
Query:
SELECT id,
       MIN( date_measured ) AS date_measured_start,
       MAX( date_measured ) AS date_measured_end,
       COUNT( * )           AS consecutive_out_of_range
FROM   (
  SELECT t.*,
         -- Position of the row within its id minus its position within its
         -- id / out_of_range pair: constant for each consecutive run.
         ROW_NUMBER() OVER ( PARTITION BY id ORDER BY date_measured )
         - ROW_NUMBER() OVER ( PARTITION BY id, out_of_range ORDER BY date_measured )
         AS rn
  FROM   table_name t
)
WHERE  out_of_range = 1   -- keep only the runs of 1s
GROUP BY id, rn
Output:
  ID | DATE_MEASURED_START | DATE_MEASURED_END   | CONSECUTIVE_OUT_OF_RANGE
---: | :------------------ | :------------------ | -----------------------:
3147 | 2019-08-09 20:00:00 | 2019-08-09 20:00:00 |                        1
3147 | 2019-08-09 23:00:00 | 2019-08-10 01:00:00 |                        3
3125 | 2019-08-10 00:00:00 | 2019-08-10 02:00:00 |                        3
3125 | 2019-08-09 21:00:00 | 2019-08-09 22:00:00 |                        2
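To see why the difference of the two ROW_NUMBER values is constant within a run, here is a pure-Python sketch (an illustration, not MT0's code) that computes both numbers and their difference for the question's sample data, then filters and groups the way the outer query does. The names ROWS, d, annotate and consecutive_runs are hypothetical; dict counters stand in for the two PARTITION BY clauses.

```python
from datetime import datetime


def d(day, hour):
    """Hypothetical helper: date_measured in August 2019 (question uses DD/MM/YYYY)."""
    return datetime(2019, 8, day, hour)


# (id, date_measured, out_of_range), copied from the question's table.
ROWS = [
    (3147, d(9, 20), 1), (3147, d(9, 21), 0), (3147, d(9, 22), 0),
    (3147, d(9, 23), 1), (3147, d(10, 0), 1), (3147, d(10, 1), 1),
    (3147, d(10, 2), 0),
    (3125, d(9, 20), 0), (3125, d(9, 21), 1), (3125, d(9, 22), 1),
    (3125, d(9, 23), 0), (3125, d(10, 0), 1), (3125, d(10, 1), 1),
    (3125, d(10, 2), 1),
]


def annotate(rows):
    """Inner query: (id, ts, flag, rn_per_id, rn_per_id_flag, difference) per row."""
    out, rn_id, rn_flag = [], {}, {}
    for id_, ts, flag in sorted(rows, key=lambda r: (r[0], r[1])):
        rn_id[id_] = rn_id.get(id_, 0) + 1                       # PARTITION BY id
        rn_flag[(id_, flag)] = rn_flag.get((id_, flag), 0) + 1   # PARTITION BY id, out_of_range
        a, b = rn_id[id_], rn_flag[(id_, flag)]
        out.append((id_, ts, flag, a, b, a - b))
    return out


def consecutive_runs(rows):
    """Outer query: WHERE out_of_range = 1 GROUP BY id, rn -> (id, start, end, count)."""
    groups = {}
    for id_, ts, flag, _, _, diff in annotate(rows):
        if flag == 1:
            groups.setdefault((id_, diff), []).append(ts)
    return sorted((id_, min(v), max(v), len(v)) for (id_, _), v in groups.items())


for row in annotate(ROWS):
    print(row)
print(consecutive_runs(ROWS))
```

In the annotated rows for id 3125, the out_of_range = 0 row at 23:00 gets the same difference (2) as the run of 1s that follows it. That is why the query filters on out_of_range = 1 before grouping; grouping by id, out_of_range, rn would also keep them apart.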
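Statement: The technical posts on this site are published under the CC BY-SA 4.0 license. If you reprint them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.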