SQL - How to compare changing column values without multiple sub-selects

Question

I'm writing a TSQL query.

I have the following table, where columns A and B occasionally change. I'm interested in every row where A or B has changed compared with the previous (chronologically earlier) row, plus the first row, which has no previous row. Each date is always unique.

Date                    A   B       SysId
2015-02-01 00:00:00.000 2   1201    949410
2015-01-01 00:00:00.000 3   1201    949410
2014-01-01 00:00:00.000 2   1201    949410
2013-01-01 00:00:00.000 2   1200    949410
2012-01-01 00:00:00.000 2   1200    949410
2011-01-01 00:00:00.000 2   1200    949410
2010-01-01 00:00:00.000 2   1200    949410
2009-01-01 00:00:00.000 2   1200    949410
2008-01-01 00:00:00.000 2   1200    949410
2007-01-01 00:00:00.000 2   1200    949410
2006-01-01 00:00:00.000 2   1200    949410
2005-01-01 00:00:00.000 2   1200    949410
2004-01-01 00:00:00.000 2   1200    949410
2003-01-01 00:00:00.000 2   1200    949410
2002-01-01 00:00:00.000 3   1200    949410
2001-01-01 00:00:00.000 2   1200    949410
2000-01-01 00:00:00.000 3   1200    949410
1999-01-01 00:00:00.000 3   1200    949410
1998-01-01 00:00:00.000 3   1200    949410
1997-01-01 00:00:00.000 3   1200    949410
1996-01-01 00:00:00.000 3   1200    949410
1995-01-01 00:00:00.000 3   1200    949410
1994-01-01 00:00:00.000 3   1200    949410
1993-01-01 00:00:00.000 3   1200    949410
1992-01-01 00:00:00.000 3   1200    949410
1991-01-01 00:00:00.000 3   1200    949410
1990-01-01 00:00:00.000 3   1200    949410
1989-01-01 00:00:00.000 3   1200    949410
1988-01-01 00:00:00.000 3   1200    949410
1987-01-01 00:00:00.000 3   1200    949410
1986-01-01 00:00:00.000 3   1200    949410
1985-01-01 00:00:00.000 3   1200    949410
1984-01-01 00:00:00.000 2   1200    949410

In this case, the result should be:

Date                    A   B       SysId
2015-02-01 00:00:00.000 2   1201    949410
2015-01-01 00:00:00.000 3   1201    949410
2014-01-01 00:00:00.000 2   1201    949410
2003-01-01 00:00:00.000 2   1200    949410
2002-01-01 00:00:00.000 3   1200    949410
2001-01-01 00:00:00.000 2   1200    949410
1985-01-01 00:00:00.000 3   1200    949410
1984-01-01 00:00:00.000 2   1200    949410

That is, we want every row at which A or B takes a different value from the row before it, plus the earliest row.
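To make the selection rule concrete, here is a minimal pure-Python sketch. It is an illustration rather than part of the original question, and it assumes a single SysId and ISO-formatted date strings (which sort chronologically); `sample_rows` and `changed_rows` are hypothetical helper names.

```python
def sample_rows():
    """Rebuild the question's sample data as (Date, A, B, SysId) tuples."""
    rows = []
    for year in range(1984, 2015):
        a = 3 if 1985 <= year <= 2000 or year == 2002 else 2
        b = 1201 if year >= 2014 else 1200
        rows.append((f"{year}-01-01", a, b, 949410))
    rows.append(("2015-01-01", 3, 1201, 949410))
    rows.append(("2015-02-01", 2, 1201, 949410))
    return rows


def changed_rows(rows):
    """Keep the earliest row plus every row whose (A, B) differs from the
    chronologically previous row. Assumes all rows share one SysId."""
    kept = []
    previous = None  # the earliest row has no predecessor, so it is always kept
    for row in sorted(rows):  # ISO date strings sort chronologically
        _, a, b, _ = row
        if previous is None or (a, b) != (previous[1], previous[2]):
            kept.append(row)
        previous = row
    return sorted(kept, reverse=True)  # newest first, like the question


for row in changed_rows(sample_rows()):
    print(*row)
```

This prints the eight rows listed above, newest first.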

I have an extremely ugly and expensive SELECT that does this for me:

SELECT Date, A, B, SysId
FROM SysHistory fb1
WHERE fb1.SysId = 949410
AND 
(
    (
        ((
            SELECT TOP 1 fb2b.A
            FROM SysHistory fb2b
            WHERE fb2b.Date < fb1.Date 
            AND fb2b.SysId = 949410
            order by Date DESC
        )) <> fb1.A
        OR 
        ((
            SELECT TOP 1 fb2a.A
            FROM SysHistory fb2a
            WHERE fb2a.Date < fb1.Date 
            AND fb2a.SysId= 949410
            order by Date  DESC
        )) IS NULL
    )
    OR
    (
        ((
            SELECT TOP 1 fb3b.B
            FROM SysHistory fb3b
            WHERE fb3b.Date < fb3b.Date 
            AND fb3b.SysId= 949410
            order by Date DESC
        )) <> fb1.B
        OR 
        ((
            SELECT TOP 1 fb3a.B
            FROM SysHistory fb3a
            WHERE fb3a.Date < fb1.Date 
            AND fb3a.SysId = 949410
            order by Date DESC
        )) IS NULL
    )
)
order by Date DESC

Notice that for each column I fetch A or B from the previous row with a TOP 1 subselect. Since the previous row might not exist (when we are on the first row in the table), I also have an OR condition for A and for B that checks for NULL.

I feel like there must be a better way to do this.

Is it possible, in T-SQL, to compare multiple columns in the same subselect? Or, more generally, how would you improve this query? Is there any way to make it more compact or potentially faster?

I guess my question borders on best practice, but I feel that this is technically a syntax question.
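One way to compare both columns against a single lookup is to locate the previous row's date once and join to that row, so A and B are read from the same predecessor. This sketch is an illustration, not taken from the thread: it runs the query on SQLite through Python's `sqlite3` module so it can be executed without SQL Server, and `sample_rows` is a hypothetical helper that rebuilds the sample data. The SQL avoids `TOP`, so it should also run on SQL Server, where `OUTER APPLY (SELECT TOP 1 ...)` is another way to fetch several columns of the previous row in one subquery.

```python
import sqlite3


def sample_rows():
    """Rebuild the question's sample data as (Date, A, B, SysId) tuples."""
    rows = []
    for year in range(1984, 2015):
        a = 3 if 1985 <= year <= 2000 or year == 2002 else 2
        b = 1201 if year >= 2014 else 1200
        rows.append((f"{year}-01-01", a, b, 949410))
    rows.append(("2015-01-01", 3, 1201, 949410))
    rows.append(("2015-02-01", 2, 1201, 949410))
    return rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SysHistory (Date TEXT, A INTEGER, B INTEGER, SysId INTEGER)")
conn.executemany("INSERT INTO SysHistory VALUES (?, ?, ?, ?)", sample_rows())

# The correlated subquery finds the previous row's Date once; the LEFT JOIN
# then exposes that row's A and B, so both columns come from the same row.
QUERY = """
SELECT cur.Date, cur.A, cur.B, cur.SysId
FROM SysHistory AS cur
LEFT JOIN SysHistory AS prev
  ON prev.SysId = cur.SysId
 AND prev.Date = (SELECT MAX(p.Date)
                  FROM SysHistory AS p
                  WHERE p.SysId = cur.SysId
                    AND p.Date < cur.Date)
WHERE cur.SysId = ?
  AND (prev.Date IS NULL  -- earliest row: no predecessor to join
       OR prev.A <> cur.A
       OR prev.B <> cur.B)
ORDER BY cur.Date DESC
"""

rows = conn.execute(QUERY, (949410,)).fetchall()
for row in rows:
    print(*row)
```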

Important update: I've now noticed that the query above doesn't actually give me the results I want. The result in this case should be:

Date                    A   B       SysId
2015-02-01 00:00:00.000 2   1201    949410
2015-01-01 00:00:00.000 3   1201    949410
2014-01-01 00:00:00.000 2   1201    949410
2003-01-01 00:00:00.000 2   1200    949410
2002-01-01 00:00:00.000 3   1200    949410
2001-01-01 00:00:00.000 2   1200    949410
1985-01-01 00:00:00.000 3   1200    949410
1984-01-01 00:00:00.000 2   1200    949410

Instead, the result is:

Date                    A   B       SysId
2015-02-01 00:00:00.000 2   1201    949410
2015-01-01 00:00:00.000 3   1201    949410
2003-01-01 00:00:00.000 2   1200    949410
2002-01-01 00:00:00.000 3   1200    949410
2001-01-01 00:00:00.000 2   1200    949410
1985-01-01 00:00:00.000 3   1200    949410
1984-01-01 00:00:00.000 2   1200    949410
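Reading the query, a likely cause of the missing 2014-01-01 row (the only row where B changes while A stays the same) is the B lookup: `WHERE fb3b.Date < fb3b.Date` compares the inner row with itself, so it never matches, the scalar subquery evaluates to NULL, and `NULL <> value` is never true. The sketch below checks this on SQLite via Python's `sqlite3`, an assumption made so it runs without SQL Server (`TOP 1` becomes `LIMIT 1`), using only the 2013 and 2014 rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SysHistory (Date TEXT, A INTEGER, B INTEGER, SysId INTEGER)")
conn.executemany(
    "INSERT INTO SysHistory VALUES (?, ?, ?, ?)",
    [("2013-01-01", 2, 1200, 949410), ("2014-01-01", 2, 1201, 949410)],
)

# The B lookup as written: fb3b.Date is compared with itself, so no row
# qualifies and the scalar subquery evaluates to NULL.
buggy_prev_b = conn.execute(
    """SELECT (SELECT fb3b.B FROM SysHistory fb3b
               WHERE fb3b.Date < fb3b.Date AND fb3b.SysId = 949410
               ORDER BY fb3b.Date DESC LIMIT 1)
       FROM SysHistory fb1 WHERE fb1.Date = '2014-01-01'"""
).fetchone()[0]

# Correlating with the outer row (fb1.Date) finds the 2013 value instead.
fixed_prev_b = conn.execute(
    """SELECT (SELECT fb3b.B FROM SysHistory fb3b
               WHERE fb3b.Date < fb1.Date AND fb3b.SysId = 949410
               ORDER BY fb3b.Date DESC LIMIT 1)
       FROM SysHistory fb1 WHERE fb1.Date = '2014-01-01'"""
).fetchone()[0]

# NULL <> 1201 is unknown (returned as None), so that OR branch never fires.
null_comparison = conn.execute("SELECT NULL <> 1201").fetchone()[0]

print(buggy_prev_b, fixed_prev_b, null_comparison)  # None 1200 None
```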

Answer 1

You can apply ROW_NUMBER() against the data so that you can perform a self-join to find previous rows:

;WITH Numbered as (
  SELECT Date, A, B, SysId,
    ROW_NUMBER() OVER (ORDER BY Date desc) as rn -- rn = 1 is the newest row
  FROM SysHistory fb1
  WHERE fb1.SysId = 949410
)
select n1.*
from Numbered n1
   left join
     Numbered n2
        on n1.rn = n2.rn - 1 -- n2 is the next-older row
where
  n2.Date is null or --If you want to include the earliest row
  n1.A <> n2.A or
  n1.B <> n2.B
order by n1.rn -- without ORDER BY, the output order is not guaranteed
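To try this approach without SQL Server, here is a sketch that runs the same CTE and self-join on SQLite through Python's `sqlite3` module. It assumes SQLite 3.25 or later (needed for `ROW_NUMBER()`), and `sample_rows` is a hypothetical helper standing in for the question's table. It returns the same eight rows, with `rn` values 1, 2, 3, 14, 15, 16, 32 and 33.

```python
import sqlite3


def sample_rows():
    """Rebuild the question's sample data as (Date, A, B, SysId) tuples."""
    rows = []
    for year in range(1984, 2015):
        a = 3 if 1985 <= year <= 2000 or year == 2002 else 2
        b = 1201 if year >= 2014 else 1200
        rows.append((f"{year}-01-01", a, b, 949410))
    rows.append(("2015-01-01", 3, 1201, 949410))
    rows.append(("2015-02-01", 2, 1201, 949410))
    return rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SysHistory (Date TEXT, A INTEGER, B INTEGER, SysId INTEGER)")
conn.executemany("INSERT INTO SysHistory VALUES (?, ?, ?, ?)", sample_rows())

# Same shape as the answer's query: number rows newest-first, then join each
# row to the one numbered directly after it (the next-older row).
QUERY = """
WITH Numbered AS (
  SELECT Date, A, B, SysId,
         ROW_NUMBER() OVER (ORDER BY Date DESC) AS rn
  FROM SysHistory
  WHERE SysId = ?
)
SELECT n1.*
FROM Numbered n1
LEFT JOIN Numbered n2
  ON n1.rn = n2.rn - 1
WHERE n2.Date IS NULL   -- earliest row has no older row
   OR n1.A <> n2.A
   OR n1.B <> n2.B
ORDER BY n1.rn
"""

rows = conn.execute(QUERY, (949410,)).fetchall()
for row in rows:
    print(*row)
```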

Results (having put your sample data in a table variable called @SysHistory, changed the query above to reference it, and escaped the Date column as [Date], since using type names as column names is usually a bad idea):

Date                    A           B           SysId       rn
----------------------- ----------- ----------- ----------- --------------------
2015-02-01 00:00:00.000 2           1201        949410      1
2015-01-01 00:00:00.000 3           1201        949410      2
2014-01-01 00:00:00.000 2           1201        949410      3
2003-01-01 00:00:00.000 2           1200        949410      14
2002-01-01 00:00:00.000 3           1200        949410      15
2001-01-01 00:00:00.000 2           1200        949410      16
1985-01-01 00:00:00.000 3           1200        949410      32
1984-01-01 00:00:00.000 2           1200        949410      33

This matches your expected result, apart from the extra rn column.
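On SQL Server 2012 and later, the `LAG()` window function can read the previous row's values directly, which removes the self-join. This is an alternative not mentioned in the answer above; the sketch runs it on SQLite (3.25 or later) through Python's `sqlite3`, again with a hypothetical `sample_rows` helper. `PrevDate IS NULL` identifies the earliest row, which has no predecessor.

```python
import sqlite3


def sample_rows():
    """Rebuild the question's sample data as (Date, A, B, SysId) tuples."""
    rows = []
    for year in range(1984, 2015):
        a = 3 if 1985 <= year <= 2000 or year == 2002 else 2
        b = 1201 if year >= 2014 else 1200
        rows.append((f"{year}-01-01", a, b, 949410))
    rows.append(("2015-01-01", 3, 1201, 949410))
    rows.append(("2015-02-01", 2, 1201, 949410))
    return rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SysHistory (Date TEXT, A INTEGER, B INTEGER, SysId INTEGER)")
conn.executemany("INSERT INTO SysHistory VALUES (?, ?, ?, ?)", sample_rows())

# LAG(x) OVER (... ORDER BY Date) returns x from the chronologically previous
# row, or NULL when there is none (the earliest row).
QUERY = """
WITH WithPrevious AS (
  SELECT Date, A, B, SysId,
         LAG(Date) OVER (PARTITION BY SysId ORDER BY Date) AS PrevDate,
         LAG(A)    OVER (PARTITION BY SysId ORDER BY Date) AS PrevA,
         LAG(B)    OVER (PARTITION BY SysId ORDER BY Date) AS PrevB
  FROM SysHistory
  WHERE SysId = ?
)
SELECT Date, A, B, SysId
FROM WithPrevious
WHERE PrevDate IS NULL OR A <> PrevA OR B <> PrevB
ORDER BY Date DESC
"""

rows = conn.execute(QUERY, (949410,)).fetchall()
for row in rows:
    print(*row)
```

Each row is read once, and the previous values come from the window function rather than from extra lookups.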

Accepted answer (score 2), 2015-06-24 09:02:22