SQL query to return a result set for a timeline

I think the best way to describe what I am looking for is to show a table of data and what I want returned from my query. This is a simple data table in SQL Server:

JobNumber TimeOfWeigh 
100       01/01/2014 08:00 
100       01/01/2014 09:00 
100       01/01/2014 10:00 
200       01/01/2014 12:00 
200       01/01/2014 13:00 
300       01/01/2014 15:00 
300       01/01/2014 16:00 
100       02/01/2014 08:00 
100       02/01/2014 09:00 
100       03/01/2014 10:00 

I want a query that groups the rows by job and returns the first and last DateTime from each group. However, as you can see here, there are two separate runs of job number 100. I don't want the second run merged with the first.

Instead, I would like this:

JobNumber   First Weigh         Last Weigh
100         01/01/2014 08:00    01/01/2014 10:00
200         01/01/2014 12:00    01/01/2014 13:00
300         01/01/2014 15:00    01/01/2014 16:00
100         02/01/2014 08:00    03/01/2014 10:00

I have been struggling with this for hours. Any help would be appreciated.

EDIT:

The dates and times are all just random dummy data. The actual data has thousands of weighs within one day. I want the first and last weigh of each job to determine the job's duration, so I can represent that duration on a timeline. But I want to display job 100 twice, indicating it was paused and resumed after 200 and 300 were completed.
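This is the classic "gaps and islands" shape: rows belong together only while the JobNumber stays the same from one weigh to the next. Outside SQL, Python's itertools.groupby behaves exactly this way, because it only merges adjacent equal keys. A minimal sketch, assuming the sample dates are dd/mm/yyyy and the data fits in memory (weighs and job_runs are illustrative names):

```python
from datetime import datetime
from itertools import groupby

# Sample weighs from the question (dd/mm/yyyy HH:MM is an assumption).
weighs = [
    (100, "01/01/2014 08:00"),
    (100, "01/01/2014 09:00"),
    (100, "01/01/2014 10:00"),
    (200, "01/01/2014 12:00"),
    (200, "01/01/2014 13:00"),
    (300, "01/01/2014 15:00"),
    (300, "01/01/2014 16:00"),
    (100, "02/01/2014 08:00"),
    (100, "02/01/2014 09:00"),
    (100, "03/01/2014 10:00"),
]


def job_runs(rows):
    """Return (job, first weigh, last weigh) for each consecutive run of a job."""
    # Order by time first, as the timeline requires.
    parsed = sorted((datetime.strptime(t, "%d/%m/%Y %H:%M"), job) for job, t in rows)
    runs = []
    # groupby only merges *adjacent* equal keys, so job 100 yields two runs.
    for job, group in groupby(parsed, key=lambda pair: pair[1]):
        times = [t for t, _ in group]
        runs.append((job, times[0], times[-1]))
    return runs


for job, first, last in job_runs(weighs):
    print(job, first, last)
```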

Here's my attempt at this, using row_number() with a partition. I've broken it into steps to hopefully make it easy to follow. If your table already has an integer identifier column that is consecutive (no gaps) in TimeOfWeigh order, you can omit the first CTE. You might be able to simplify this further, but it does appear to work.

(Edited to add a flag indicating jobs with multiple ranges, as requested in a comment.)

declare @sampleData table (JobNumber int, TimeOfWeigh datetime);
insert into @sampleData values
    (100, '01/01/2014 08:00'),
    (100, '01/01/2014 09:00'), 
    (100, '01/01/2014 10:00'),
    (200, '01/01/2014 12:00'),
    (200, '01/01/2014 13:00'),
    (300, '01/01/2014 15:00'),
    (300, '01/01/2014 16:00'),
    (100, '02/01/2014 08:00'),
    (100, '02/01/2014 09:00'),
    (100, '03/01/2014 10:00');

-- The first CTE assigns consecutive row numbers to the records in
-- TimeOfWeigh order.
with JobsCTE as
(    
    select 
        row_number() over (order by TimeOfWeigh) as RowNumber, 
        JobNumber,
        TimeOfWeigh
    from @sampleData
),

-- The second CTE orders by the RowNumber we created above, but restarts the
-- ordering every time the JobNumber changes. The difference between RowNumber
-- and this new ordering will be constant within each group.
GroupsCTE as
(
    select
        RowNumber - row_number() over (partition by JobNumber order by RowNumber) as GroupNumber,
        JobNumber,
        TimeOfWeigh
    from JobsCTE
),

-- Group by JobNumber alone to find jobs that appear in more than one range.
DuplicatedJobsCTE as
(
    select JobNumber 
    from GroupsCTE 
    group by JobNumber 
    having count(distinct GroupNumber) > 1
)

-- Finally, we use GroupNumber to get the mins and maxes from contiguous ranges.
select
    G.JobNumber,
    min(G.TimeOfWeigh) as [First Weigh],
    max(G.TimeOfWeigh) as [Last Weigh],
    case when D.JobNumber is null then 0 else 1 end as [Multiple Ranges]
from
    GroupsCTE G
    left join DuplicatedJobsCTE D on G.JobNumber = D.JobNumber
group by
    G.JobNumber,
    G.GroupNumber,
    D.JobNumber
order by
    [First Weigh];
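To see why the difference trick in GroupsCTE works, here is the same arithmetic in Python. It is a sketch that only tracks JobNumber in TimeOfWeigh order (jobs and group_numbers are illustrative names). Within a consecutive run both counters advance by one per row, so their difference stays fixed; when another job interrupts, the global counter keeps advancing while this job's counter pauses, so the difference jumps.

```python
# JobNumber of each row, already in TimeOfWeigh order (data from the question).
jobs = [100, 100, 100, 200, 200, 300, 300, 100, 100, 100]


def group_numbers(job_sequence):
    """Mimic RowNumber - row_number() over (partition by JobNumber order by RowNumber)."""
    per_job_counter = {}
    result = []
    for row_number, job in enumerate(job_sequence, start=1):
        # Per-job counter plays the role of the partitioned row_number().
        per_job_counter[job] = per_job_counter.get(job, 0) + 1
        result.append((job, row_number - per_job_counter[job]))
    return result


for job, group in group_numbers(jobs):
    print(job, group)
```

The same GroupNumber can show up for different jobs: group_numbers([1, 2, 1]) gives GroupNumber 1 to both job 2 and the second run of job 1. That is why the final query groups by JobNumber and GroupNumber together.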

You have to use self-joins to create pseudo-tables that contain the first and last row of each set.

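-- Assumes an integer id column that is consecutive (no gaps) in TimeOfWeigh
-- order; YourTable is a placeholder for the real table name.
Select f.JobNumber, 
   f.TimeOfWeigh FirstWeigh, 
   l.TimeOfWeigh LastWeigh
From YourTable f -- for first record
   join YourTable l -- for last record
       on l.JobNumber = f.JobNumber 
          And l.id >= f.id
          -- f starts a run: no row of the same job immediately before it
          And Not exists
              (Select * from YourTable
               Where JobNumber = f.JobNumber 
                  And id = f.id - 1)
          -- l ends a run: no row of the same job immediately after it
          And Not exists
              (Select * from YourTable
               Where JobNumber = f.JobNumber 
                  And id = l.id + 1)
          -- no other job appears between f and l
          And Not Exists
              (Select * from YourTable
               Where JobNumber <> f.JobNumber 
                  And id Between f.id and l.id)
Order by f.id;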
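The NOT EXISTS conditions are easier to follow as a brute-force loop. This Python sketch is not the query itself: like the query, it assumes consecutive integer ids in TimeOfWeigh order, and it tries every (first, last) pair just as the self-join does (rows and runs_by_self_join are illustrative names). Note the l_id < f_id check: without an ordering condition like that, the BETWEEN range is empty when f comes after l, so the last test passes trivially and the start of job 100's second run gets paired with the end of its first run.

```python
# (id, JobNumber) rows; ids are assumed consecutive in TimeOfWeigh order.
rows = [(1, 100), (2, 100), (3, 100), (4, 200), (5, 200),
        (6, 300), (7, 300), (8, 100), (9, 100), (10, 100)]


def runs_by_self_join(rows):
    """Brute-force version of the self-join plus its NOT EXISTS checks."""
    job_at = dict(rows)
    result = []
    for f_id, f_job in rows:            # candidate first row
        for l_id, l_job in rows:        # candidate last row
            if l_job != f_job or l_id < f_id:
                continue
            # No same-job row directly before f / directly after l.
            is_first = job_at.get(f_id - 1) != f_job
            is_last = job_at.get(l_id + 1) != f_job
            # No other job anywhere in the id range f..l.
            unbroken = all(job_at[i] == f_job for i in range(f_id, l_id + 1))
            if is_first and is_last and unbroken:
                result.append((f_job, f_id, l_id))
    return result


print(runs_by_self_join(rows))
```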

This one fascinated me when I saw it, and I wondered how I would go about solving it. I was too busy to get in with an answer first; I got it working later but have sat on it for a few days since! After a few days I still understand what I devised, which is a good sign :)

I've added some extra data at the end to demonstrate that this works with single-row JobNumber runs, rather than assuming that weighings always come in batches; the first rows in the results match the original solution.

This approach also uses cascading CTEs (one more than the accepted answer here, but I won't let that discourage me!), the first being the test data setup:

With Weighs AS   -- sample data
(
SELECT 100 AS JobNumber, '01/01/2014 08:00' AS TimeOfWeigh UNION ALL 
SELECT 100 AS JobNumber, '01/01/2014 09:00' AS TimeOfWeigh UNION ALL 
SELECT 100 AS JobNumber, '01/01/2014 10:00' AS TimeOfWeigh UNION ALL 
SELECT 200 AS JobNumber, '01/01/2014 12:00' AS TimeOfWeigh UNION ALL 
SELECT 200 AS JobNumber, '01/01/2014 13:00' AS TimeOfWeigh UNION ALL 
SELECT 300 AS JobNumber, '01/01/2014 15:00' AS TimeOfWeigh UNION ALL 
SELECT 300 AS JobNumber, '01/01/2014 16:00' AS TimeOfWeigh UNION ALL 
SELECT 100 AS JobNumber, '02/01/2014 08:00' AS TimeOfWeigh UNION ALL 
SELECT 100 AS JobNumber, '02/01/2014 09:00' AS TimeOfWeigh UNION ALL 
SELECT 100 AS JobNumber, '03/01/2014 10:00' AS TimeOfWeigh UNION ALL
SELECT 400 AS JobNumber, '04/01/2014 14:00' AS TimeOfWeigh UNION ALL
SELECT 300 AS JobNumber, '04/01/2014 14:30' AS TimeOfWeigh
)
,
Numbered AS  -- add on a unique consecutive row number
( SELECT *, ROW_NUMBER() OVER (ORDER BY TimeOfWeigh) AS ID FROM Weighs )
, 
GroupEnds AS  -- add on a 1/0 flag for whether it's the first or last in a run
( SELECT *,
    CASE WHEN -- next row is different JobNumber?
      (SELECT ID FROM Numbered n2 WHERE n2.ID=n1.ID+1 AND n2.JobNumber=n1.JobNumber) IS NULL
    THEN 1 ELSE 0 END AS GroupEnd,
    CASE WHEN -- previous row is different JobNumber?
      (SELECT ID FROM Numbered n2 WHERE n2.ID=n1.ID-1 AND n2.JobNumber=n1.JobNumber) IS NULL
    THEN 1 ELSE 0 END AS GroupBegin
  FROM Numbered n1 
)
,
Begins_and_Ends AS  -- make sure there are always matching pairs
( SELECT * FROM GroupEnds WHERE GroupBegin=1
    UNION ALL
  SELECT * FROM GroupEnds WHERE GroupEnd=1
)
,
Pairs AS  -- give matching pairs the same ID number for GROUPing next..
( SELECT *, (1+Row_Number() OVER (ORDER BY ID))/2 AS PairID
  FROM Begins_and_Ends
)
SELECT
  Min(JobNumber) AS JobNumber,
  Min(TimeOfWeigh) as [First Weigh],
  Max(TimeOfWeigh) as [Last Weigh]
FROM Pairs
GROUP BY PairID
ORDER BY PairID

The Numbered CTE is fairly obvious, giving an ordered ID number to each row.

The GroupEnds CTE adds a pair of booleans (1 or 0) marking whether the row is the last or the first in a run of JobNumbers, by checking whether the next or previous row has the same JobNumber.
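The same flag logic as a Python sketch, using the (ID, JobNumber) pairs from the sample data (numbered and group_flags are illustrative names). A row gets GroupEnd = 1 when no row with the same JobNumber has the next ID, and GroupBegin = 1 when none has the previous ID, so a single-row batch gets both flags:

```python
# (ID, JobNumber) in TimeOfWeigh order, including the extra single-row data.
numbered = [(1, 100), (2, 100), (3, 100), (4, 200), (5, 200), (6, 300),
            (7, 300), (8, 100), (9, 100), (10, 100), (11, 400), (12, 300)]


def group_flags(numbered):
    """Return (ID, JobNumber, GroupEnd, GroupBegin) like the GroupEnds CTE."""
    job_at = dict(numbered)
    flagged = []
    for row_id, job in numbered:
        # Missing neighbour (None) counts as "different job", like IS NULL.
        group_end = 0 if job_at.get(row_id + 1) == job else 1
        group_begin = 0 if job_at.get(row_id - 1) == job else 1
        flagged.append((row_id, job, group_end, group_begin))
    return flagged


for row in group_flags(numbered):
    print(row)
```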

From there I simply needed a way to pair up adjacent GroupBegins and GroupEnds. I experimented with the ranking function NTILE(), passing it the number of GroupEnds (i.e. the row count divided by 2) as its parameter, but this broke when there was an odd number of rows because of single-row batches, where the same row is both the Begin and the End of a batch.

I got around this by guaranteeing an equal number of Begin and End rows: a UNION ALL of the Begin rows and the End rows, even when some are the same row. This is the Begins_and_Ends CTE.
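A tiny Python illustration of why the union keeps the counts even. It uses a few (ID, JobNumber, GroupEnd, GroupBegin) rows from the sample, and list concatenation stands in for UNION ALL:

```python
# A few (ID, JobNumber, GroupEnd, GroupBegin) rows from the sample data.
flags = [(1, 100, 0, 1), (2, 100, 0, 0), (3, 100, 1, 0), (11, 400, 1, 1)]

begins = [row for row in flags if row[3] == 1]
ends = [row for row in flags if row[2] == 1]

# Like UNION ALL: the single-row run (ID 11) is both a begin and an end, so it
# appears once in each list and contributes two rows, keeping the total even.
# A plain UNION would de-duplicate it back to one row.
begins_and_ends = sorted(begins + ends)

print(begins_and_ends)
```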

The Pairs CTE adds pair numbers by taking Row_Number() plus one and dividing by two; the integer result, PairID, is the same for each pair of rows.
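The pairing arithmetic is easy to check in a couple of lines of Python. The IDs are the Begins_and_Ends rows from the sample data in ID order, with the single-row batches (IDs 11 and 12) listed twice:

```python
# Begins_and_Ends IDs in ID order; 11 and 12 are single-row batches.
boundary_ids = [1, 3, 4, 5, 6, 7, 8, 10, 11, 11, 12, 12]

# ROW_NUMBER() starts at 1. SQL Server integer division truncates toward zero,
# which matches Python's // for these positive values.
pair_ids = [(1 + row_number) // 2 for row_number, _ in enumerate(boundary_ids, start=1)]

print(list(zip(boundary_ids, pair_ids)))
```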

This gives us the following (all rows in the middle of JobNumber batches have been filtered out by now):

JobNumber  TimeOfWeigh       ID  GroupEnd  GroupBegin  PairID
100        01/01/2014 08:00   1         0           1       1
100        01/01/2014 10:00   3         1           0       1
200        01/01/2014 12:00   4         0           1       2
200        01/01/2014 13:00   5         1           0       2
300        01/01/2014 15:00   6         0           1       3
300        01/01/2014 16:00   7         1           0       3
100        02/01/2014 08:00   8         0           1       4
100        03/01/2014 10:00  10         1           0       4
400        04/01/2014 14:00  11         1           1       5
400        04/01/2014 14:00  11         1           1       5
300        04/01/2014 14:30  12         1           1       6
300        04/01/2014 14:30  12         1           1       6

From there it's a piece of cake to GROUP BY the PairID and grab the first and last weigh times. I enjoyed the challenge; I wonder if anyone else finds it useful in any weigh!
http://sqlfiddle.com/#!3/b4f39/48
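For anyone without SQL Server handy, the same CTE chain runs almost unchanged in SQLite, so it can be checked from Python's standard sqlite3 module. This is a sketch assuming an SQLite build with window functions (3.25 or later); run_query is an illustrative helper. As in the answer, TimeOfWeigh is a string, which sorts correctly only because these sample dates happen to be in lexicographic order.

```python
import sqlite3

# The answer's CTE chain with its sample data. Needs SQLite 3.25+ for ROW_NUMBER().
QUERY = """
WITH Weighs AS (
    SELECT 100 AS JobNumber, '01/01/2014 08:00' AS TimeOfWeigh UNION ALL
    SELECT 100, '01/01/2014 09:00' UNION ALL
    SELECT 100, '01/01/2014 10:00' UNION ALL
    SELECT 200, '01/01/2014 12:00' UNION ALL
    SELECT 200, '01/01/2014 13:00' UNION ALL
    SELECT 300, '01/01/2014 15:00' UNION ALL
    SELECT 300, '01/01/2014 16:00' UNION ALL
    SELECT 100, '02/01/2014 08:00' UNION ALL
    SELECT 100, '02/01/2014 09:00' UNION ALL
    SELECT 100, '03/01/2014 10:00' UNION ALL
    SELECT 400, '04/01/2014 14:00' UNION ALL
    SELECT 300, '04/01/2014 14:30'
),
Numbered AS (
    SELECT *, ROW_NUMBER() OVER (ORDER BY TimeOfWeigh) AS ID FROM Weighs
),
GroupEnds AS (
    SELECT *,
        CASE WHEN (SELECT ID FROM Numbered n2
                   WHERE n2.ID = n1.ID + 1 AND n2.JobNumber = n1.JobNumber) IS NULL
             THEN 1 ELSE 0 END AS GroupEnd,
        CASE WHEN (SELECT ID FROM Numbered n2
                   WHERE n2.ID = n1.ID - 1 AND n2.JobNumber = n1.JobNumber) IS NULL
             THEN 1 ELSE 0 END AS GroupBegin
    FROM Numbered n1
),
Begins_and_Ends AS (
    SELECT * FROM GroupEnds WHERE GroupBegin = 1
    UNION ALL
    SELECT * FROM GroupEnds WHERE GroupEnd = 1
),
Pairs AS (
    SELECT *, (1 + ROW_NUMBER() OVER (ORDER BY ID)) / 2 AS PairID
    FROM Begins_and_Ends
)
SELECT MIN(JobNumber) AS JobNumber,
       MIN(TimeOfWeigh) AS FirstWeigh,
       MAX(TimeOfWeigh) AS LastWeigh
FROM Pairs
GROUP BY PairID
ORDER BY PairID
"""


def run_query():
    """Run QUERY against an empty in-memory database and return all rows."""
    conn = sqlite3.connect(":memory:")
    try:
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


for row in run_query():
    print(row)
```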

Yep, this is a fascinating puzzle. Thank you for sharing it. I wanted to come up with a solution that does not involve EXISTS or JOINs.

First I created a table with a job id (j_id) and an integer value used for sequencing (j_v). Ints are just easier to type, while the logic is exactly the same as for date-times.

     select * from j order by j_v;
 j_id | j_v 
------+-----
  100 |   1
  100 |   2
  100 |   2
  100 |   2
  100 |   2
  100 |   3
  200 |   4
  200 |   5
  300 |   6
  300 |   6
  300 |   6
  300 |   7
  300 |   7
  100 |   8
  100 |   9
(15 rows)

I used window functions and 3 CTEs:

  • The first one adds the lag and lead job ids to each row of the table
  • The second one filters, keeping only the rows that start or end a job run
  • The third one attaches the next row's value (the run's end) via lead, plus a row_number used to drop every even row.
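The three steps can be mirrored in plain Python, which makes the filtering easy to inspect. A sketch over the example (j_id, j_v) table; rows and runs_via_lag_lead are illustrative names:

```python
# (j_id, j_v) rows from the example table, in j_v order.
rows = [(100, 1), (100, 2), (100, 2), (100, 2), (100, 2), (100, 3),
        (200, 4), (200, 5), (300, 6), (300, 6), (300, 6), (300, 7),
        (300, 7), (100, 8), (100, 9)]


def runs_via_lag_lead(rows):
    # Step 1 (CTE X): previous and next job id, -1 when there is none.
    with_neighbours = []
    for i, (job, value) in enumerate(rows):
        lag = rows[i - 1][0] if i > 0 else -1
        lead = rows[i + 1][0] if i + 1 < len(rows) else -1
        with_neighbours.append((job, value, lag, lead))
    # Step 2 (CTE Y): keep rows whose neighbours differ, i.e. run boundaries.
    boundaries = [(job, value) for job, value, lag, lead in with_neighbours if lag != lead]
    # Step 3 (CTE Z): take the next boundary's value and keep odd positions only.
    result = []
    for pos, (job, value) in enumerate(boundaries, start=1):
        if pos % 2 == 1:
            next_value = boundaries[pos][1] if pos < len(boundaries) else None
            result.append((job, value, next_value))
    return result


print(runs_via_lag_lead(rows))
```

The sketch also makes one limitation visible: a run of a single row breaks the pairing. For [(1, 1), (2, 2), (1, 3)], the lone job 2 row has lag and lead both equal to 1, so it is filtered out and the two job 1 rows end up paired as one range. The SQL filter and pairing work the same way, so the query assumes every job run has at least two rows.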

Here you go:

with X AS (
select j_id, j_v,
       coalesce ( lag(j_id,1) OVER (MY_W), -1)  as j_id_lag,
       lag(j_v,1) over (MY_W) as j_v_lag,
       coalesce ( lead(j_id,1) OVER (MY_W), -1)  as j_id_lead,
       lead(j_v,1) over (MY_W) as j_v_lead
from j
WINDOW MY_W as ( ORDER BY j_v)
order by j_v 
),
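Y AS ( 
-- keep only run boundaries; assumes every job run has at least two rows
select *
from X
where j_id_lag != j_id_lead
),
Z AS ( 
-- OVER () has no ORDER BY, so this relies on Y still being in j_v order,
-- which holds here but is not guaranteed by the SQL standard
select * ,
      lead(j_v) OVER () AS L2,
      row_number() OVER () as my_row
from Y
) 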
SELECT j_id, j_v as job_start ,l2 as job_end
from Z
where my_row %2 = 1
;
 j_id | job_start | job_end
------+-----------+---------
  100 |         1 |       3
  200 |         4 |       5
  300 |         6 |       7
  100 |         8 |       9
(4 rows)

Here is the query plan:

                                                    QUERY PLAN                                                     
--------------------------------------------------------------------------------------------------------------------
 CTE Scan on z  (cost=325.94..379.17 rows=11 width=12) (actual time=0.047..0.071 rows=4 loops=1)
   Filter: ((my_row % 2::bigint) = 1)
   Rows Removed by Filter: 4
   CTE x
     ->  WindowAgg  (cost=149.78..203.28 rows=2140 width=8) (actual time=0.027..0.039 rows=15 loops=1)
           ->  Sort  (cost=149.78..155.13 rows=2140 width=8) (actual time=0.019..0.019 rows=15 loops=1)
                 Sort Key: j.j_v
                 Sort Method: quicksort  Memory: 25kB
                 ->  Seq Scan on j  (cost=0.00..31.40 rows=2140 width=8) (actual time=0.004..0.006 rows=15 loops=1)
   CTE y
     ->  CTE Scan on x  (cost=0.00..48.15 rows=2129 width=24) (actual time=0.031..0.050 rows=8 loops=1)
           Filter: (j_id_lag <> j_id_lead)
           Rows Removed by Filter: 7
   CTE z
     ->  WindowAgg  (cost=0.00..74.51 rows=2129 width=24) (actual time=0.042..0.062 rows=8 loops=1)
           ->  CTE Scan on y  (cost=0.00..42.58 rows=2129 width=24) (actual time=0.031..0.052 rows=8 loops=1)
 Total runtime: 0.122 ms
(17 rows)

As you can see, there is one sort (to order the data by the sequence value, i.e. the time in the original question) and several CTE scans, but no joins. Complexity: O(N log N) for the sort, which is exactly what I was looking for.

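Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.

Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM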