How can I iterate through a SQL table and select the spikes in the data?

Let's say I have the following table:

Id|name|spike|timestamp
1 |John|15   |111
2 |Jim |12   |112
3 |Jeff|13   |113
4 |Joe |4    |114
5 |Jess|0    |115
6 |Jill|0    |116
7 |Jey |13   |117
8 |Joy |15   |118
9 |Jess|14   |119
10|Joe |0    |120

I need to iterate through the table, select the rows where spike > 10, and separate each run of consecutive rows into its own set of data. An acceptable query for the table above should result in:

Id|name|spike|timestamp
1 |John|15   |111
2 |Jim |12   |112
3 |Jeff|13   |113

and

Id|name|spike|timestamp
7 |Jey |13   |117
8 |Joy |15   |118
9 |Jess|14   |119

I need to process all the spikes in the table.

Edit: I don't know the number of islands in the table or how far apart they are.

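If you don't know the number of islands, you can use this dynamic approach. Note that NTILE makes buckets of equal (or nearly equal) size, so the final step matches the real islands only when every island has the same number of rows, as in the sample data; the rnk value from the first CTE identifies each island regardless of size, provided the ids have no gaps. Here is the demo.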

with cte as
(
    select
        *,
        id - row_number() over (order by id) as rnk
    from myTable
    where spike > 10
),
islands as
(
  select
      *,
      (select count(distinct rnk)::int from cte) as total_islands
  from cte
),
buckets as
(
  select
    Id,
    name,
    spike,
    timestamp,
    NTILE(total_islands) Over (Order by id) as island
  from islands
)  

select *
from buckets
where island = 2
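
To see how the NTILE step behaves when islands differ in size, here is a small sketch in plain Python rather than SQL. The ids in `spike_ids` are hypothetical (an island of three rows followed by an island of one row), and the `ntile` helper is an illustration of how SQL's NTILE distributes rows, not part of the original answer.

```python
# Hypothetical ids of the rows with spike > 10:
# one island of three rows (1, 2, 3) and one island of a single row (7).
spike_ids = [1, 2, 3, 7]


def ntile(count, buckets):
    """Return the bucket numbers SQL's NTILE(buckets) gives to `count` ordered rows."""
    base, extra = divmod(count, buckets)
    result = []
    for bucket in range(1, buckets + 1):
        # The first `extra` buckets receive one additional row.
        size = base + (1 if bucket <= extra else 0)
        result.extend([bucket] * size)
    return result


# rnk = id - row_number(), as in the first CTE.
# It stays constant within a run of consecutive ids.
rnk = [row_id - position for position, row_id in enumerate(spike_ids, start=1)]
total_islands = len(set(rnk))

# Island numbers produced by NTILE(total_islands).
by_ntile = dict(zip(spike_ids, ntile(len(spike_ids), total_islands)))

# Island numbers produced by ranking the distinct rnk values instead.
island_no = {value: n for n, value in enumerate(sorted(set(rnk)), start=1)}
by_rnk = {row_id: island_no[value] for row_id, value in zip(spike_ids, rnk)}

print(by_ntile)  # {1: 1, 2: 1, 3: 2, 7: 2}
print(by_rnk)    # {1: 1, 2: 1, 3: 1, 7: 2}
```

With these ids, NTILE places id 3 in the second bucket even though it belongs to the first run, while grouping on rnk keeps ids 1 to 3 together. In SQL, the equivalent of the second mapping would be a DENSE_RANK() ordered by rnk.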

For your case, where there are exactly two islands of equal size, you can use the window function NTILE directly, which will bucket the filtered rows into two parts. Here is the demo.

First, create a CTE:

with cte as
(
  select
    Id,
    name,
    spike,
    timestamp,
    NTILE(2) Over (Order by id) as nums
  from myTable
  where spike > 10
)  

Then run the first query to get the first part:

select  
    Id,
    name,
    spike,
    timestamp
from cte
where nums = 1;

Output:

| id  | name | spike | timestamp |
| --- | ---- | ----- | --------- |
| 1   | John | 15    | 111       |
| 2   | Jim  | 12    | 112       |
| 3   | Jeff | 13    | 113       |

Now run the second query to get the second part:

select  
    Id,
    name,
    spike,
    timestamp
from cte
where nums = 2;

Output:

| id  | name | spike | timestamp |
| --- | ---- | ----- | --------- |
| 7   | Jey  | 13    | 117       |
| 8   | Joy  | 15    | 118       |
| 9   | Jess | 14    | 119       |

You can split your data into islands using a CTE that computes two row numbers ordered by timestamp: one over all rows, and one partitioned by spike > 10. The difference between the two is constant within each run of consecutive spike rows, so taking DENSE_RANK() over that difference gives the island number. You can then select from that based on the island number:

WITH CTE AS (
  SELECT *,
         ROW_NUMBER() OVER (ORDER BY timestamp) AS rn,
         ROW_NUMBER() OVER (PARTITION BY spike > 10 ORDER BY timestamp) AS sn
  FROM data
),
islands AS (
  SELECT id, name, spike, timestamp,
         DENSE_RANK() OVER (ORDER BY rn - sn) AS island
  FROM CTE
  WHERE spike > 10
)
SELECT *
FROM islands 
WHERE island = 2

Output:

id  name    spike   timestamp   island
7   Jey     13      117         2
8   Joy     15      118         2
9   Jess    14      119         2

Demo on dbfiddle

Note that if timestamp values can repeat but never decrease as id increases, you should change the ORDER BY timestamp clauses to ORDER BY id.
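
The row-number difference used above can be traced by hand. The following is a minimal sketch in plain Python, not SQL, that mirrors rn, sn and the DENSE_RANK() step on the question's sample rows; the variable names are illustrative only.

```python
# Sample rows from the question: (id, name, spike, timestamp).
rows = [
    (1, "John", 15, 111), (2, "Jim", 12, 112), (3, "Jeff", 13, 113),
    (4, "Joe", 4, 114), (5, "Jess", 0, 115), (6, "Jill", 0, 116),
    (7, "Jey", 13, 117), (8, "Joy", 15, 118), (9, "Jess", 14, 119),
    (10, "Joe", 0, 120),
]

ordered = sorted(rows, key=lambda r: r[3])  # ORDER BY timestamp

sn = 0      # row number within the "spike > 10" partition
diffs = []  # (id, rn - sn) for every spike row
for rn, (row_id, name, spike, timestamp) in enumerate(ordered, start=1):
    if spike > 10:
        sn += 1
        diffs.append((row_id, rn - sn))

# DENSE_RANK() OVER (ORDER BY rn - sn): number the distinct differences from 1.
distinct = sorted({d for _, d in diffs})
rank_of = {d: n for n, d in enumerate(distinct, start=1)}

islands = {}
for row_id, d in diffs:
    islands.setdefault(rank_of[d], []).append(row_id)

print(diffs)    # [(1, 0), (2, 0), (3, 0), (7, 3), (8, 3), (9, 3)]
print(islands)  # {1: [1, 2, 3], 2: [7, 8, 9]}
```

Each non-spike row advances rn without advancing sn, so the difference grows by one for every row skipped between islands. That is why it separates the two runs here.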

There are many ways to deal with gaps-and-islands problems. In this case, a running count uniquely identifies each group:

select t.*
from (select t.*,
             count(*) filter (where spike <= 10) over (order by timestamp) as island
      from t
     ) t
where spike > 10;

This counts the rows so far whose spike value is less than or equal to 10. That count is constant within each group of consecutive "spikey" rows.
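
As a rough illustration of why the running count works as a group key, here is a short plain-Python sketch over the question's sample rows. It assumes unique timestamps, as in the sample, and the names `low_count` and `islands` are made up for this example.

```python
# Sample rows from the question: (id, name, spike, timestamp).
rows = [
    (1, "John", 15, 111), (2, "Jim", 12, 112), (3, "Jeff", 13, 113),
    (4, "Joe", 4, 114), (5, "Jess", 0, 115), (6, "Jill", 0, 116),
    (7, "Jey", 13, 117), (8, "Joy", 15, 118), (9, "Jess", 14, 119),
    (10, "Joe", 0, 120),
]

low_count = 0  # running count of rows with spike <= 10, in timestamp order
islands = {}   # island key -> ids of the spike rows that share it
for row_id, name, spike, timestamp in sorted(rows, key=lambda r: r[3]):
    if spike <= 10:
        low_count += 1
    else:
        # Every spike row in the same run sees the same running count.
        islands.setdefault(low_count, []).append(row_id)

print(islands)  # {0: [1, 2, 3], 3: [7, 8, 9]}
```

The keys 0 and 3 are not consecutive, since three low rows sit between the two runs, which is why a further ranking step is needed if the islands should be numbered 1, 2, and so on.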

If you want the islands enumerated, just use dense_rank():

select t.*, dense_rank() over (order by island) as grouping_number
from (select t.*,
             count(*) filter (where spike <= 10) over (order by timestamp) as island
      from t
     ) t
where spike > 10;

Here is a db<>fiddle.
