繁体   English   中英

SQL,针对一个“关键”间隔找到多个范围的离散重叠时间间隔并计算“最严格”的常见重叠?

SQL, finding discrete overlapping time intervals of multiple ranges against one “key” interval & calculate “most restrictive” common overlaps?

提示:本站收集StackOverFlow近2千万问答,支持中英文搜索,鼠标放在语句上弹窗显示对应的参考中文或英文, 本站还提供   中文繁体   英文版本   中英对照 版本,有任何建议请联系yoyou2525@163.com。

我在SQL Server Management Studio中工作,所以我想这是Microsoft SQL Server T-SQL问题。

实际情况是这样的:我有多个员工在多个位置工作,并且每个位置都有“ time in”和“ time out”记录。 我已经为每个时间间隔创建了一个唯一的“班次ID”,并根据员工,日期和位置加入了与我的主要员工相匹配的其他员工的班次,或者将我与其他所有人进行比较的班次。

此外,我编写了一个查询,用基石拉出每个“其他雇员”的特定重叠时间间隔。 对于一个班次,时间轴如下所示:

 Key Emp. | 9AM------------------------6PM
 Emp. A   | 9AM------------------------6PM
 Emp. B   |         12PM-------4PM        

因此,可以进行真正的“受控”比较的离散时间段包括:

  • 从12PM-4PM的键,A和B
  • 从上午9点至下午12点的密钥和A
  • 从下午4点至下午6点的密钥和A

最终目标是提取在这些时间段内发生的每个员工的所有活动(在单独的表中组织为带有日期时间戳的事件),并比较每个相关员工的总计。 因此,每个时间范围都将有一个单独的“计数(事件)”总数,仅受如上所述共享时间间隔的员工影响。

目前,我的数据是这样组织的:

键和其他员工的“输入”和“输出”列存储为TIMESTAMP; 在我的示例中,“ 1 / 1,6PM”只是节省空间的糟糕方法。 请在这篇文章的结尾查看我的消耗数据。 SSMS似乎并不在乎我是否拥有超过TIMESTAMP列,并将它们都像DATETIME一样对待:

Key_ShiftID| Key In | Key Out | Oth_Emp_ShiftID | Oth_Emp_In | Oth_Emp_Out
  K          1/1,9AM   1/1,6PM     A                1/1,9AM     1/1,6PM 
  K          1/1,9AM   1/1,6PM     B                1/1,12PM    1/1,4PM 

班次ID(Key_ShiftID和Oth_Emp_ShiftID)是唯一的字符串,并且时间间隔由两列(Key_In和Key_Out + Oth_Emp_In和Oth_Emp_Out)定义为日期时间/时间戳。 我正在寻找可以比较员工活动的离散时间,该时间在单独的表中,每个事件都有一个独特的日期时间,如前所述。 因此,我认为结束数据将如下所示:

Key_ShiftID| Key_In | Key_Out | Oth_Emp_ShiftID | Oth_Emp_In  | Oth_Emp_Out
  K          1/1,9AM   1/1,6PM     A                1/1,12PM    1/1,4PM 
  K          1/1,9AM   1/1,6PM     B                1/1,12PM    1/1,4PM 
  K          1/1,9AM   1/1,6PM     A                1/1,9AM     1/1,12PM
  K          1/1,9AM   1/1,6PM     A                1/1,4PM     1/1,6PM

这样我就可以通过ShiftID将上面的表加入到我的活动表中,并为每个相关员工引入Count(events)

where event_datetime >= Oth_Emp_In and event_datetime <= Oth_Emp_Out

另外,如前所述,我已经编写了一个查询来减少非关键员工的班次,以反映与关键员工重叠的时间间隔,因此Other_Emp_In将始终大于或等于关键输入。时间和Other_Emp_Out将始终小于或等于密钥输出时间。

提前致谢。 我已经研究了2天左右,并为此做出了努力。

以下是一键移动的示例数据(并非上面的确切示例):

另外,SQL Server似乎并不在乎我是否拥有超过TIMESTAMP列,并将它们都视为DATETIME。

CREATE TABLE "sample_data" 
(
    "Employee" INT,
    "Key_ShiftID" TEXT,
    "Key_In" TIMESTAMP,
    "Key_Out" TIMESTAMP,
    "Other_Emp_ShiftID" TEXT,
    "Other_Emp_In" TIMESTAMP,
    "Other_Emp_Out" TIMESTAMP,
    "overlap_min" TIMESTAMP,
    "overlap_max" TIMESTAMP
);

INSERT INTO "sample_data" 
VALUES (900, '545BD826-0C9A-408B-BE9F-4C3D7D307948', '2016-09-27 14:15:00', '2016-09-27 21:45:00', '035FA1C1-B469-44EB-B5B4-5B6948574464', '2016-09-27 08:45:00', '2016-09-27 16:15:00', '2016-09-27 14:15:00', '2016-09-27 16:15:00'),
       (78, '545BD826-0C9A-408B-BE9F-4C3D7D307948', '2016-09-27 14:15:00', '2016-09-27 21:45:00', '74035838-FD07-4F8D-8AC4-F6407AC786D9', '2016-09-27 18:00:00', '2016-09-27 21:15:00', '2016-09-27 18:00:00', '2016-09-27 21:15:00'),
       (900, '545BD826-0C9A-408B-BE9F-4C3D7D307948', '2016-09-27 14:15:00', '2016-09-27 21:45:00', 'D7E9ADCD-8631-476D-B69F-00626F0E4B06', '2016-09-27 16:45:00', '2016-09-27 21:45:00', '2016-09-27 16:45:00', '2016-09-27 21:45:00');
1 个回复

欢迎使用StackOverflow。 将来,请尝试包含一些易消耗的示例数据,例如我在下面的解决方案中包含的数据。

这是一个有趣的小问题。 对于这种事情,我利用了patExtract8K函数,该函数利用了ngrams8K 这是一个如何使用PatExtract的示例; 在这里,我从一个字符串中提取钱:

SELECT p.* 
FROM   dbo.patextract8K('Pay me $50.17 now or $1000 later!','[^$0-9.]') AS p;

结果:

itemNumber  itemIndex  itemLength  item
----------- ---------- ----------- --------
1           8          6           $50.17
2           22         5           $1000

现在解决您的问题:

-- Easily consumable sample data
DECLARE @table TABLE (shiftId VARCHAR(2), empKey VARCHAR(5), workDuration VARCHAR(100));
INSERT @table(shiftId,empKey,workDuration)
VALUES
('K','A','12PM - 4PM'),
('K','B','12PM - 4PM'),
('K','A','9AM - 12PM'),
('K','A','4PM - 6PM');

-- Solution
SELECT 
  shiftId   = f.shiftId, 
  KeyIn     = '1/1,'+REPLACE(CONVERT(VARCHAR(10),
               MIN(CAST(f.c1 AS TIME)) OVER (),100),':00',''),
  KeyOut    = '1/1,'+REPLACE(CONVERT(VARCHAR(10),
               MAX(CAST(f.c2 AS TIME)) OVER (),100),':00',''),
  empShift  = f.empKey,
    othEmpIn  = '1/1,'+f.c1, 
  othEmpOut = '1/1,'+f.c2
FROM
(
  SELECT      t.shiftId, t.empKey, t.workDuration, 
              c1 = MAX(CASE p.itemNumber WHEN 1 THEN p.item END), 
              c2 = MAX(CASE p.itemNumber WHEN 2 THEN p.item END)
  FROM        @table AS t
  CROSS APPLY dbo.patExtract8k(t.workDuration, '[^0-9APM]') AS p
  CROSS APPLY (VALUES(CAST(p.item AS TIME))) AS tm(N)
  GROUP BY    t.shiftId, t.empKey, t.workDuration
) AS f;

结果:

shiftId KeyIn      KeyOut       empShift othEmpIn     othEmpOut
------- ---------- ------------ -------- ------------ ------------
K       1/1,9AM    1/1,6PM      A        1/1,12PM     1/1,4PM
K       1/1,9AM    1/1,6PM      A        1/1,4PM      1/1,6PM
K       1/1,9AM    1/1,6PM      A        1/1,9AM      1/1,12PM
K       1/1,9AM    1/1,6PM      B        1/1,12PM     1/1,4PM

请注意,我不知道“ 1/1”来自何处,因此我只是对其进行了硬编码。

这是我的基本功能。 所有这些都对以很少的代码有效地解决各种SQL问题非常有帮助。

CREATE FUNCTION dbo.rangeAB
(
  @low  bigint, 
  @high bigint, 
  @gap  bigint,
  @row1 bit
)
/****************************************************************************************
[Purpose]:
 Creates up to 531,441,000,000 sequentia1 integers numbers beginning with @low and ending 
 with @high. Used to replace iterative methods such as loops, cursors and recursive CTEs 
 to solve SQL problems. Based on Itzik Ben-Gan's getnums function with some tweeks and 
 enhancements and added functionality. The logic for getting rn to begin at 0 or 1 is 
 based comes from Jeff Moden's fnTally function. 

 The name range because it's similar to clojure's range function. The name "rangeAB" as 
 used because "range" is a reserved SQL keyword.

[Author]: Alan Burstein

[Compatibility]: 
 SQL Server 2008+ and Azure SQL Database

[Syntax]:
 SELECT r.RN, r.OP, r.N1, r.N2
 FROM dbo.rangeAB(@low,@high,@gap,@row1) AS r;

[Parameters]:
 @low  = a bigint that represents the lowest value for n1.
 @high = a bigint that represents the highest value for n1.
 @gap  = a bigint that represents how much n1 and n2 will increase each row; @gap also
         represents the difference between n1 and n2.
 @row1 = a bit that represents the first value of rn. When @row = 0 then rn begins
         at 0, when @row = 1 then rn will begin at 1.

[Returns]:
 Inline Table Valued Function returns:
 rn = bigint; a row number that works just like T-SQL ROW_NUMBER() except that it can 
      start at 0 or 1 which is dictated by @row1.
 op = bigint; returns the "opposite number that relates to rn. When rn begins with 0 and
      ends with 10 then 10 is the opposite of 0, 9 the opposite of 1, etc. When rn begins
      with 1 and ends with 5 then 1 is the opposite of 5, 2 the opposite of 4, etc...
 n1 = bigint; a sequential number starting at the value of @low and incrimentingby the
      value of @gap until it is less than or equal to the value of @high.
 n2 = bigint; a sequential number starting at the value of @low+@gap and  incrimenting 
      by the value of @gap.

[Dependencies]:
N/A

[Developer Notes]:

 1. The lowest and highest possible numbers returned are whatever is allowable by a 
    bigint. The function, however, returns no more than 531,441,000,000 rows (8100^3). 
 2. @gap does not affect rn, rn will begin at @row1 and increase by 1 until the last row
    unless its used in a query where a filter is applied to rn.
 3. @gap must be greater than 0 or the function will not return any rows.
 4. Keep in mind that when @row1 is 0 then the highest row-number will be the number of
    rows returned minus 1
 5. If you only need is a sequential set beginning at 0 or 1 then, for best performance
    use the RN column. Use N1 and/or N2 when you need to begin your sequence at any 
    number other than 0 or 1 or if you need a gap between your sequence of numbers. 
 6. Although @gap is a bigint it must be a positive integer or the function will
    not return any rows.
 7. The function will not return any rows when one of the following conditions are true:
      * any of the input parameters are NULL
      * @high is less than @low 
      * @gap is not greater than 0
    To force the function to return all NULLs instead of not returning anything you can
    add the following code to the end of the query:

      UNION ALL 
      SELECT NULL, NULL, NULL, NULL
      WHERE NOT (@high&@low&@gap&@row1 IS NOT NULL AND @high >= @low AND @gap > 0)

    This code was excluded as it adds a ~5% performance penalty.
 8. There is no performance penalty for sorting by rn ASC; there is a large performance 
    penalty for sorting in descending order WHEN @row1 = 1; WHEN @row1 = 0
    If you need a descending sort the use op in place of rn then sort by rn ASC. 

Best Practices:
--===== 1. Using RN (rownumber)
 -- (1.1) The best way to get the numbers 1,2,3...@high (e.g. 1 to 5):
 SELECT RN FROM dbo.rangeAB(1,5,1,1);
 -- (1.2) The best way to get the numbers 0,1,2...@high-1 (e.g. 0 to 5):
 SELECT RN FROM dbo.rangeAB(0,5,1,0);

--===== 2. Using OP for descending sorts without a performance penalty
 -- (2.1) The best way to get the numbers 5,4,3...@high (e.g. 5 to 1):
 SELECT op FROM dbo.rangeAB(1,5,1,1) ORDER BY rn ASC;
 -- (2.2) The best way to get the numbers 0,1,2...@high-1 (e.g. 5 to 0):
 SELECT op FROM dbo.rangeAB(1,6,1,0) ORDER BY rn ASC;

--===== 3. Using N1
 -- (3.1) To begin with numbers other than 0 or 1 use N1 (e.g. -3 to 3):
 SELECT N1 FROM dbo.rangeAB(-3,3,1,1);
 -- (3.2) ROW_NUMBER() is built in. If you want a ROW_NUMBER() include RN:
 SELECT RN, N1 FROM dbo.rangeAB(-3,3,1,1);
 -- (3.3) If you wanted a ROW_NUMBER() that started at 0 you would do this:
 SELECT RN, N1 FROM dbo.rangeAB(-3,3,1,0);

--===== 4. Using N2 and @gap
 -- (4.1) To get 0,10,20,30...100, set @low to 0, @high to 100 and @gap to 10:
 SELECT N1 FROM dbo.rangeAB(0,100,10,1);
 -- (4.2) Note that N2=N1+@gap; this allows you to create a sequence of ranges.
 --       For example, to get (0,10),(10,20),(20,30).... (90,100):
 SELECT N1, N2 FROM dbo.rangeAB(0,90,10,1);
 -- (4.3) Remember that a rownumber is included and it can begin at 0 or 1:
 SELECT RN, N1, N2 FROM dbo.rangeAB(0,90,10,1);

[Examples]:
--===== 1. Generating Sample data (using rangeAB to create "dummy rows")
 -- The query below will generate 10,000 ids and random numbers between 50,000 and 500,000
 SELECT
   someId    = r.rn,
   someNumer = ABS(CHECKSUM(NEWID())%450000)+50001 
 FROM rangeAB(1,10000,1,1) r;

--===== 2. Create a series of dates; rn is 0 to include the first date in the series
 DECLARE @startdate DATE = '20180101', @enddate DATE = '20180131';

 SELECT r.rn, calDate = DATEADD(dd, r.rn, @startdate)
 FROM dbo.rangeAB(1, DATEDIFF(dd,@startdate,@enddate),1,0) r;
 GO

--===== 3. Splitting (tokenizing) a string with fixed sized items
 -- given a delimited string of identifiers that are always 7 characters long
 DECLARE @string VARCHAR(1000) = 'A601225,B435223,G008081,R678567';

 SELECT
   itemNumber = r.rn, -- item's ordinal position 
   itemIndex  = r.n1, -- item's position in the string (it's CHARINDEX value)
   item       = SUBSTRING(@string, r.n1, 7) -- item (token)
 FROM dbo.rangeAB(1, LEN(@string), 8,1)  r;
 GO

--===== 4. Splitting (tokenizing) a string with random delimiters
 DECLARE @string VARCHAR(1000) = 'ABC123,999F,XX,9994443335';

 SELECT
   itemNumber = ROW_NUMBER() OVER (ORDER BY r.rn), -- item's ordinal position 
   itemIndex  = r.n1+1, -- item's position in the string (it's CHARINDEX value)
   item       = SUBSTRING
               (
                 @string,
                 r.n1+1,
                 ISNULL(NULLIF(CHARINDEX(',',@string,r.n1+1),0)-r.n1-1, 8000)
               ) -- item (token)
 FROM dbo.rangeAB(0,DATALENGTH(@string),1,1) r
 WHERE SUBSTRING(@string,r.n1,1) = ',' OR r.n1 = 0;
 -- logic borrowed from: http://www.sqlservercentral.com/articles/Tally+Table/72993/

--===== 5. Grouping by a weekly intervals
 -- 5.1. how to create a series of start/end dates between @startDate & @endDate
 DECLARE @startDate DATE = '1/1/2015', @endDate DATE = '2/1/2015';
 SELECT 
   WeekNbr   = r.RN,
   WeekStart = DATEADD(DAY,r.N1,@StartDate), 
   WeekEnd   = DATEADD(DAY,r.N2-1,@StartDate)
 FROM dbo.rangeAB(0,datediff(DAY,@StartDate,@EndDate),7,1) r;
 GO

 -- 5.2. LEFT JOIN to the weekly interval table
 BEGIN
  DECLARE @startDate datetime = '1/1/2015', @endDate datetime = '2/1/2015';
  -- sample data 
  DECLARE @loans TABLE (loID INT, lockDate DATE);
  INSERT @loans SELECT r.rn, DATEADD(dd, ABS(CHECKSUM(NEWID())%32), @startDate)
  FROM dbo.rangeAB(1,50,1,1) r;

  -- solution 
  SELECT 
    WeekNbr   = r.RN,
    WeekStart = dt.WeekStart, 
    WeekEnd   = dt.WeekEnd,
    total     = COUNT(l.lockDate)
  FROM dbo.rangeAB(0,datediff(DAY,@StartDate,@EndDate),7,1) r
  CROSS APPLY (VALUES (
    CAST(DATEADD(DAY,r.N1,@StartDate) AS DATE), 
    CAST(DATEADD(DAY,r.N2-1,@StartDate) AS DATE))) dt(WeekStart,WeekEnd)
  LEFT JOIN @loans l ON l.lockDate BETWEEN  dt.WeekStart AND dt.WeekEnd
  GROUP BY r.RN, dt.WeekStart, dt.WeekEnd ;
 END;

--===== 6. Identify the first vowel and last vowel in a along with their positions
 DECLARE @string VARCHAR(200) = 'This string has vowels';

 SELECT TOP(1) position = r.rn, letter = SUBSTRING(@string,r.rn,1)
 FROM dbo.rangeAB(1,LEN(@string),1,1) r
 WHERE SUBSTRING(@string,r.rn,1) LIKE '%[aeiou]%'
 ORDER BY r.rn;

 -- To avoid a sort in the execution plan we'll use op instead of rn
 SELECT TOP(1) position = r.op, letter = SUBSTRING(@string,r.op,1)
 FROM dbo.rangeAB(1,LEN(@string),1,1) r
 WHERE SUBSTRING(@string,r.rn,1) LIKE '%[aeiou]%'
 ORDER BY r.rn;

---------------------------------------------------------------------------------------
[Revision History]:
 Rev 00 - 20140518 - Initial Development - Alan Burstein
 Rev 01 - 20151029 - Added 65 rows to make L1=465; 465^3=100.5M. Updated comment section
                   - Alan Burstein
 Rev 02 - 20180613 - Complete re-design including opposite number column (op)
 Rev 03 - 20180920 - Added additional CROSS JOIN to L2 for 530B rows max - Alan Burstein
****************************************************************************************/
RETURNS TABLE WITH SCHEMABINDING AS RETURN
WITH L1(N) AS 
(
  SELECT 1
  FROM (VALUES
   (0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),
   (0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),
   (0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),
   (0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),
   (0),(0)) T(N) -- 90 values 
),
L2(N)  AS (SELECT 1 FROM L1 a CROSS JOIN L1 b CROSS JOIN L1 c),
iTally AS (SELECT rn = ROW_NUMBER() OVER (ORDER BY (SELECT 1)) FROM L2 a CROSS JOIN L2 b)
SELECT  
  r.RN,
  r.OP,
  r.N1,
  r.N2
FROM
(
  SELECT
    RN = 0,
    OP = (@high-@low)/@gap,
    N1 = @low,
    N2 = @gap+@low
  WHERE @row1 = 0
  UNION ALL -- COALESCE required in the TOP statement below for error handling purposes
  SELECT TOP (ABS((COALESCE(@high,0)-COALESCE(@low,0))/COALESCE(@gap,0)+COALESCE(@row1,1)))
    RN = i.rn,
    OP = (@high-@low)/@gap+(2*@row1)-i.rn,
    N1 = (i.rn-@row1)*@gap+@low,
    N2 = (i.rn-(@row1-1))*@gap+@low
  FROM iTally AS i
  ORDER BY rn
) AS r
WHERE @high&@low&@gap&@row1 IS NOT NULL AND @high >= @low AND @gap > 0;
GO

IF OBJECT_ID('dbo.NGrams8k', 'IF') IS NOT NULL DROP FUNCTION dbo.NGrams8k;
GO
CREATE FUNCTION dbo.NGrams8k
(
  @string VARCHAR(8000), -- Input string
  @N      INT            -- requested token size
)
/*****************************************************************************************
[Purpose]:
 A character-level N-Grams function that outputs a contiguous stream of @N-sized tokens
 based on an input string (@string). Accepts strings up to 8000 varchar characters long.
 For more information about N-Grams see: http://en.wikipedia.org/wiki/N-gram.

[Author]: 
 Alan Burstein

[Compatibility]:
 SQL Server 2008+, Azure SQL Database

[Syntax]:
--===== Autonomous
 SELECT ng.position, ng.token 
 FROM   dbo.NGrams8k(@string,@N) AS ng;

--===== Against a table using APPLY
 SELECT      s.SomeID, ng.position, ng.token
 FROM        dbo.SomeTable AS s
 CROSS APPLY dbo.NGrams8K(s.SomeValue,@N) AS ng;

[Parameters]:
 @string  = The input string to split into tokens.
 @N       = The size of each token returned.

[Returns]:
 Position = BIGINT; the position of the token in the input string
 token    = VARCHAR(8000); a @N-sized character-level N-Gram token

[Dependencies]:
 1. dbo.rangeAB (iTVF)

[Developer Notes]:
 1. NGrams8k is not case sensitive;

 2. Many functions that use NGrams8k will see a huge performance gain when the optimizer
    creates a parallel execution plan. One way to get a parallel query plan (if the
    optimizer does not choose one) is to use make_parallel by Adam Machanic which can be
    found here:
 sqlblog.com/blogs/adam_machanic/archive/2013/07/11/next-level-parallel-plan-porcing.aspx

3. When @N is less than 1 or greater than the datalength of the input string then no
    tokens (rows) are returned. If either @string or @N are NULL no rows are returned.
    This is a debatable topic but the thinking behind this decision is that: because you
    can't split 'xxx' into 4-grams, you can't split a NULL value into unigrams and you
    can't turn anything into NULL-grams, no rows should be returned.

    For people who would prefer that a NULL input forces the function to return a single
    NULL output you could add this code to the end of the function:

    UNION ALL
    SELECT 1, NULL
    WHERE NOT(@N > 0 AND @N <= DATALENGTH(@string)) OR (@N IS NULL OR @string IS NULL)

 4. NGrams8k is deterministic. For more about deterministic functions see:
    https://msdn.microsoft.com/en-us/library/ms178091.aspx

[Examples]:
--===== 1. Split the string, "abcd" into unigrams, bigrams and trigrams
 SELECT ng.position, ng.token FROM dbo.NGrams8k('abcd',1) AS ng; -- unigrams (@N=1)
 SELECT ng.position, ng.token FROM dbo.NGrams8k('abcd',2) AS ng; -- bigrams  (@N=2)
 SELECT ng.position, ng.token FROM dbo.NGrams8k('abcd',3) AS ng; -- trigrams (@N=3)

--===== How many times the substring "AB" appears in each record
 DECLARE @table TABLE(stringID int identity primary key, string varchar(100));
 INSERT @table(string) VALUES ('AB123AB'),('123ABABAB'),('!AB!AB!'),('AB-AB-AB-AB-AB');

 SELECT      string, occurances = COUNT(*)
 FROM        @table t
 CROSS APPLY dbo.NGrams8k(t.string,2) AS ng
 WHERE       ng.token = 'AB'
 GROUP BY    string;

[Revision History]:
------------------------------------------------------------------------------------------
 Rev 00 - 20140310 - Initial Development - Alan Burstein
 Rev 01 - 20150522 - Removed DQS N-Grams functionality, improved iTally logic. Also Added
                     conversion to bigint in the TOP logic to remove implicit conversion
                     to bigint - Alan Burstein
 Rev 03 - 20150909 - Added logic to only return values if @N is greater than 0 and less
                     than the length of @string. Updated comment section. - Alan Burstein
 Rev 04 - 20151029 - Added ISNULL logic to the TOP clause for the @string and @N
                     parameters to prevent a NULL string or NULL @N from causing "an
                     improper value" being passed to the TOP clause. - Alan Burstein
 Rev 05 - 20171228 - Small simplification; changed: 
                (ABS(CONVERT(BIGINT,(DATALENGTH(ISNULL(@string,''))-(ISNULL(@N,1)-1)),0)))
                                           to:
                (ABS(CONVERT(BIGINT,(DATALENGTH(ISNULL(@string,''))+1-ISNULL(@N,1)),0)))
 Rev 06 - 20180612 - Using CHECKSUM(N) in the to convert N in the token output instead of
                     using (CAST N as int). CHECKSUM removes the need to convert to int.
 Rev 07 - 20180612 - re-designed to: (1) use dbo.rangeAB - Alan Burstein
****************************************************************************************/
RETURNS TABLE WITH SCHEMABINDING AS RETURN
SELECT
  position   = r.RN,
  token      = SUBSTRING(@string, CHECKSUM(r.RN), @N)
FROM  dbo.rangeAB(1, LEN(@string)+1-@N,1,1) AS r
WHERE @N > 0 AND @N <= LEN(@string);
GO

CREATE FUNCTION dbo.patExtract8K
(
  @string  VARCHAR(8000),
  @pattern VARCHAR(50)
)
/*****************************************************************************************
[Description]:
 This can be considered a T-SQL inline table valued function (iTVF) equivalent of 
 Microsoft's mdq.RegexExtract except:

 1. It includes each matching substring's position in the string

 2. It accepts varchar(8000) instead of nvarchar(4000) for the input string, varchar(50)
    instead of nvarchar(4000) for the pattern

 3. The mask parameter is not required and therefore does not exist.

 4. You have specify what text we're searching for as an exclusion; e.g. for numeric 
    characters you should search for '[^0-9]' instead of '[0-9]'. 

 5. There is is no parameter for naming a "capture group". Using the variable below, both 
    the following queries will return the same result:

     DECLARE @string nvarchar(4000) = N'123 Main Street';

   SELECT item FROM dbo.patExtract8K(@string, '[^0-9]');
   SELECT clr.RegexExtract(@string, N'(?<number>(\d+))(?<street>(.*))', N'number', 1);

 Alternatively, you can think of patExtract8K as Chris Morris' PatternSplitCM (found here:
 http://www.sqlservercentral.com/articles/String+Manipulation/94365/) but only returns the
 rows where [matched]=0. The key benefit of is that it performs substantially better 
 because you are only returning the number of rows required instead of returning twice as
 many rows then filtering out half of them.  Furthermore, because we're 

 The following two sets of queries return the same result:

 DECLARE @string varchar(100) = 'xx123xx555xx999';
 BEGIN
 -- QUERY #1
   -- patExtract8K
   SELECT ps.itemNumber, ps.item 
   FROM dbo.patExtract8K(@string, '[^0-9]') ps;

   -- patternSplitCM   
   SELECT itemNumber = row_number() over (order by ps.itemNumber), ps.item 
   FROM dbo.patternSplitCM(@string, '[^0-9]') ps
   WHERE [matched] = 0;

 -- QUERY #2
   SELECT ps.itemNumber, ps.item 
   FROM dbo.patExtract8K(@string, '[0-9]') ps;

   SELECT itemNumber = row_number() over (order by itemNumber), item 
   FROM dbo.patternSplitCM(@string, '[0-9]')
   WHERE [matched] = 0;
 END;

[Compatibility]:
 SQL Server 2008+

[Syntax]:
--===== Autonomous
 SELECT pe.ItemNumber, pe.ItemIndex, pe.ItemLength, pe.Item
 FROM dbo.patExtract8K(@string,@pattern) pe;

--===== Against a table using APPLY
 SELECT t.someString, pe.ItemIndex, pe.ItemLength, pe.Item
 FROM dbo.SomeTable t
 CROSS APPLY dbo.patExtract8K(t.someString, @pattern) pe;

[Parameters]:
 @string        = varchar(8000); the input string
 @searchString  = varchar(50); pattern to search for

[Returns]:
 itemNumber = bigint; the instance or ordinal position of the matched substring
 itemIndex  = bigint; the location of the matched substring inside the input string
 itemLength = int; the length of the matched substring
 item       = varchar(8000); the returned text

[Developer Notes]:
 1. Requires NGrams8k 

 2. patExtract8K does not return any rows on NULL or empty strings. Consider using 
    OUTER APPLY or append the function with the code below to force the function to return 
    a row on emply or NULL inputs:

    UNION ALL SELECT 1, 0, NULL, @string WHERE nullif(@string,'') IS NULL;

 3. patExtract8K is not case sensitive; use a case sensitive collation for 
    case-sensitive comparisons

 4. patExtract8K is deterministic. For more about deterministic functions see:
    https://msdn.microsoft.com/en-us/library/ms178091.aspx

 5. patExtract8K performs substantially better with a parallel execution plan, often
    2-3 times faster. For queries that leverage patextract8K that are not getting a 
    parallel exeution plan you should consider performance testing using Traceflag 8649 
    in Development environments and Adam Machanic's make_parallel in production. 

[Examples]:
--===== (1) Basic extact all groups of numbers:
  WITH temp(id, txt) as
 (
   SELECT * FROM (values
   (1, 'hello 123 fff 1234567 and today;""o999999999 tester 44444444444444 done'),
   (2, 'syat 123 ff tyui( 1234567 and today 999999999 tester 777777 done'),
   (3, '&**OOOOO=+ + + // ==?76543// and today !!222222\\\tester{}))22222444 done'))t(x,xx)
 )
 SELECT
   [temp.id] = t.id,
   pe.itemNumber,
   pe.itemIndex,
   pe.itemLength,
   pe.item
 FROM        temp AS t
 CROSS APPLY dbo.patExtract8K(t.txt, '[^0-9]') AS pe;
-----------------------------------------------------------------------------------------
Revision History:
 Rev 00 - 20170801 - Initial Development - Alan Burstein
 Rev 01 - 20180619 - Complete re-write   - Alan Burstein
*****************************************************************************************/
RETURNS TABLE WITH SCHEMABINDING AS RETURN
SELECT itemNumber = ROW_NUMBER() OVER (ORDER BY f.position),
       itemIndex  = f.position,
       itemLength = itemLen.l,
       item       = SUBSTRING(f.token, 1, itemLen.l)
FROM
(
 SELECT ng.position, SUBSTRING(@string,ng.position,DATALENGTH(@string))
 FROM   dbo.NGrams8k(@string, 1) AS ng
 WHERE  PATINDEX(@pattern, ng.token) <  --<< this token does NOT match the pattern
        ABS(SIGN(ng.position-1)-1) +    --<< are you the first row?  OR
        PATINDEX(@pattern,SUBSTRING(@string,ng.position-1,1)) --<< always 0 for 1st row
) AS f(position, token)
CROSS APPLY (VALUES(ISNULL(NULLIF(PATINDEX('%'+@pattern+'%',f.token),0),
  DATALENGTH(@string)+2-f.position)-1)) AS itemLen(l);
GO
1 计算重叠间隔的处理时间

我从下面的 salesforce 导出并转换了一个原始数据; 使用下面的这些代码 我把它变成: 但我的最终目标是处理重叠问题,我想不出如何实现它。 结果应该如下所示: 我想这看起来很简单,但实际上并非如此。 我将不胜感激任何帮助。 ...

2 计算重叠时间间隔的长度

理想输出: 如果每次相同位置内只有两个记录重叠,那么我可以通过lag来做到这一点 但问题是,我在同一位置有很多(未知)重叠时间间隔。 我该如何实现? ...

2015-07-01 08:26:13 1 664   sas
3 将多个时间间隔相加而不计算 lubridate 中的重叠时间

我需要在相同的观察中对多个间隔的天数求和。 我在 StackOverflow 中看到了许多关于此任务的不同示例。 尽管如此,我还是无法使用我的数据重现它们,因为我必须在两次以上重叠的时间和多个时间间隔内进行。 我可以总结间隔之间的总时间,但包括重叠的时间: ...

4 计算几个用户之间在SQL中的重叠时间间隔

在网上花了很多时间试图找到我的问题的答案之后,我决定在这里发布它。 我有一个表,其中包含一些资源不可用的时间间隔。 我想提取所有资源都不可用的所有时间间隔。 在下面的示例中,我将有:开始:2018-11-16 12:30:00,结束:2018-11-16 12:45:00 ...

5 通过多个时间间隔找到重叠的更好方法

这里的目标是获得多个用户的议程。 并匹配共享相同空闲时间的用户。 出于性能原因,我希望 Mysql(或任何替代方案)能够完成这项工作。 因此好友匹配的任务将通过触发器或服务器的 Cron 作业来执行。 我的想法是假设所有空闲时间范围都是每周重复发生的事件。 任何异常都会以不同的方式处理。 我 ...

6 用于查找从多个重叠间隔开始经过的时间的SQL

不使用MSSQL或DB2或Oracle。 没有CTE。 没有OVERLAP谓词。 没有INTERVAL数据类型。 情况:在要修理的车辆上工作不能开始,直到收到所有订购的部件。 零件可在维修开始前多次订购。 我们需要提取车辆在“零件保持”的时间 因此,对于标识为id = 1的车辆 ...

7 计算日期范围间隔的部分重叠数

我需要在具有 ~150K 行的数据帧上运行循环。 但是,循环需要检查每一行并检查一个条件,该条件检查数据集中每隔一行。 我的代码适用于玩具数据集,它产生正确的值,但对于我的实际数据集来说太慢了。 我让它运行了几个小时,它仍然没有完成。 所以我希望有人有更好的想法如何解决这个问题。 我想知道的是对 ...

8 如何找到重叠的间隔

我有两个data.frame,每个都有六列:样本,染色体,开始,结束,Num_Probes,Segment_Mean。 对于df_T_CNA的每个间隔,我正在寻找df_CF中的哪些间隔重叠。 最后,我打算检查在给定的重叠间隔内,来自df_CF的相应Segment_Mean值是否与来自 ...

9 如何找到没有日期部分的时间间隔之间的重叠

我必须找出2个时间范围之间的重叠,并且比较中不包括日期部分,而只有时间,例如range_1是9 AM-6PM,range_2是5 PM-8AM。 实际时间为24小时制。 我编写了一个解决方案,该解决方案可以找到重叠部分,但是当任何时间在跨天/午夜后的任何时间(例如10 PM-2AM)时,它都 ...

10 通过ID BigQuery计算重叠间隔

我想根据ID计算我有多少个重叠间隔 在这种情况下,期望的结果应该是: 每当有重叠(甚至部分重叠)时,我都需要这样计算。 如何在BigQuery中实现? ...

暂无
暂无

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2022 STACKOOM.COM