How can I iterate through a SQL table and select the spikes in the data?
Let's say I have the following table:
Id|name|spike|timestamp
1 |John|15 |111
2 |Jim |12 |112
3 |Jeff|13 |113
4 |Joe |4 |114
5 |Jess|0 |115
6 |Jill|0 |116
7 |Jey |13 |117
8 |Joy |15 |118
9 |Jess|14 |119
10|Joe |0 |120
I need to iterate through the table, select the rows where spike > 10, and separate those rows into different sets of data. An acceptable query for the table above should result in:
Id|name|spike|timestamp
1 |John|15 |111
2 |Jim |12 |112
3 |Jeff|13 |113
and
Id|name|spike|timestamp
7 |Jey |13 |117
8 |Joy |15 |118
9 |Jess|14 |119
I need to process all the spikes in the table.
Edit: I don't know the number of islands in the table or how far apart they are.
If you do not know the number of islands, use this dynamic approach. Note that NTILE splits the rows into buckets of (nearly) equal size, so it reproduces the islands only when every island contains the same number of rows. Here is the demo.
with cte as
(
select
*,
id - row_number() over (order by id) as rnk
from myTable
where spike > 10
),
islands as
(
select
*,
(select count(distinct rnk)::int from cte) as total_islands
from cte
),
buckets as
(
select
Id,
name,
spike,
timestamp,
NTILE(total_islands) Over (Order by id) as island
from islands
)
select *
from buckets
where island = 2
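To see what the first CTE computes, here is a minimal sketch that runs it on the sample table. It uses Python's built-in sqlite3 module rather than PostgreSQL, purely so the example is self-contained; that choice is an assumption of this sketch, not part of the answer, and window functions need SQLite 3.25 or newer.

```python
import sqlite3

# Sample rows from the question.
rows = [
    (1, "John", 15, 111), (2, "Jim", 12, 112), (3, "Jeff", 13, 113),
    (4, "Joe", 4, 114), (5, "Jess", 0, 115), (6, "Jill", 0, 116),
    (7, "Jey", 13, 117), (8, "Joy", 15, 118), (9, "Jess", 14, 119),
    (10, "Joe", 0, 120),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE myTable (id INTEGER, name TEXT, spike INTEGER, timestamp INTEGER)"
)
conn.executemany("INSERT INTO myTable VALUES (?, ?, ?, ?)", rows)

# Same expression as the first CTE: id minus the row number among spike rows.
query = """
SELECT id,
       id - ROW_NUMBER() OVER (ORDER BY id) AS rnk
FROM myTable
WHERE spike > 10
ORDER BY id
"""
rnk_by_id = dict(conn.execute(query).fetchall())
print(rnk_by_id)
```

Rows that belong to the same run of consecutive ids share one rnk value, so rnk on its own already separates the spikes, provided the ids are consecutive without gaps.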
For your case, you can directly use the window function NTILE, which will bucket your data into two parts. Here is the demo.
First, create a CTE:
with cte as
(
select
Id,
name,
spike,
timestamp,
NTILE(2) Over (Order by id) as nums
from myTable
where spike > 10
)
Then run the first query, appended to the CTE as part of the same statement, to get the first part:
select
Id,
name,
spike,
timestamp
from cte
where nums = 1;
Output:
| id | name | spike | timestamp |
| --- | ---- | ----- | --------- |
| 1 | John | 15 | 111 |
| 2 | Jim | 12 | 112 |
| 3 | Jeff | 13 | 113 |
Now run the second query, again preceded by the same CTE, to get the second part:
select
Id,
name,
spike,
timestamp
from cte
where nums = 2;
Output:
| id | name | spike | timestamp |
| --- | ---- | ----- | --------- |
| 7 | Jey | 13 | 117 |
| 8 | Joy | 15 | 118 |
| 9 | Jess | 14 | 119 |
You can split your data into islands using a CTE that computes two row numbers ordered by timestamp, one over all rows and one partitioned by spike > 10, and then taking DENSE_RANK() over their difference to compute the island number. You can then select from that based on the island number:
WITH CTE AS (
SELECT *,
ROW_NUMBER() OVER (ORDER BY timestamp) AS rn,
ROW_NUMBER() OVER (PARTITION BY spike > 10 ORDER BY timestamp) AS sn
FROM data
),
islands AS (
SELECT id, name, spike, timestamp,
DENSE_RANK() OVER (ORDER BY rn - sn) AS island
FROM CTE
WHERE spike > 10
)
SELECT *
FROM islands
WHERE island = 2
Output:
id name spike timestamp island
7 Jey 13 117 2
8 Joy 15 118 2
9 Jess 14 119 2
Demo on dbfiddle
Note that if you can have duplicate timestamp values but they do increase with id, you should change the ORDER BY timestamp clauses to ORDER BY id.
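As a quick check that the row-number difference does not depend on the islands having equal sizes, here is a sketch that runs the same query shape on made-up readings with islands of 2, 4 and 1 rows. The data is hypothetical, and Python's built-in sqlite3 module (SQLite 3.25 or newer) stands in for PostgreSQL only to keep the example self-contained.

```python
import sqlite3

# Made-up readings (not from the question): islands of 2, 4 and 1 rows.
spikes = [15, 12, 4, 13, 15, 14, 16, 0, 11]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (id INTEGER, spike INTEGER, timestamp INTEGER)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?)",
    [(i, s, 100 + i) for i, s in enumerate(spikes, start=1)],
)

query = """
WITH cte AS (
    SELECT *,
           -- position among all rows
           ROW_NUMBER() OVER (ORDER BY timestamp) AS rn,
           -- position among rows on the same side of the threshold
           ROW_NUMBER() OVER (PARTITION BY spike > 10 ORDER BY timestamp) AS sn
    FROM data
),
islands AS (
    SELECT id,
           DENSE_RANK() OVER (ORDER BY rn - sn) AS island
    FROM cte
    WHERE spike > 10
)
SELECT id, island FROM islands ORDER BY id
"""
island_by_id = dict(conn.execute(query).fetchall())
print(island_by_id)
```

For a spike row, rn - sn equals the number of non-spike rows that precede it, which stays constant inside an island and grows from one island to the next.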
There are many ways to deal with gaps-and-islands problems. In this case, a cumulative count uniquely identifies each group:
select t.*
from (select t.*,
count(*) filter (where spike <= 10) over (order by timestamp) as island
from t
) t
where spike > 10;
This counts the number of spike values less than or equal to 10 up to and including the current row. The count is constant for each group of consecutive "spikey" numbers.
If you want the islands enumerated, just use dense_rank():
select t.*, dense_rank() over (order by island) as grouping_number
from (select t.*,
count(*) filter (where spike <= 10) over (order by timestamp) as island
from t
) t
where spike > 10;
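The enumerated version can be used to collect each spike into its own set of rows, as the question asks. The sketch below is one way to do that, under a few assumptions: it runs on Python's built-in sqlite3 module (SQLite 3.25 or newer) instead of PostgreSQL, it writes the filtered count as SUM(CASE ...) as a portable equivalent of count(*) filter (...), and the subquery alias s and the sets dictionary are names introduced here for illustration.

```python
import sqlite3
from collections import defaultdict

# Sample rows from the question.
rows = [
    (1, "John", 15, 111), (2, "Jim", 12, 112), (3, "Jeff", 13, 113),
    (4, "Joe", 4, 114), (5, "Jess", 0, 115), (6, "Jill", 0, 116),
    (7, "Jey", 13, 117), (8, "Joy", 15, 118), (9, "Jess", 14, 119),
    (10, "Joe", 0, 120),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER, name TEXT, spike INTEGER, timestamp INTEGER)"
)
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)

query = """
SELECT s.id, s.name, s.spike, s.timestamp,
       DENSE_RANK() OVER (ORDER BY s.island) AS grouping_number
FROM (
    SELECT t.*,
           -- running count of non-spike rows seen so far
           SUM(CASE WHEN spike <= 10 THEN 1 ELSE 0 END)
               OVER (ORDER BY timestamp) AS island
    FROM t
) AS s
WHERE s.spike > 10
ORDER BY s.id
"""

# Collect the rows of each island into its own list.
sets = defaultdict(list)
for id_, name, spike, ts, group in conn.execute(query):
    sets[group].append((id_, name, spike, ts))

for group, members in sets.items():
    print(group, members)
```

Each key of sets is one island number, so the caller can process the spikes one at a time without knowing in advance how many there are.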