I'm writing a TSQL query.
I have the following table, where columns A and B occasionally change. I'm interested in every row where either A or B has changed compared to the previous row, plus the first row (which has no previous row). Each date is always unique.
Date A B SysId
2015-02-01 00:00:00.000 2 1201 949410
2015-01-01 00:00:00.000 3 1201 949410
2014-01-01 00:00:00.000 2 1201 949410
2013-01-01 00:00:00.000 2 1200 949410
2012-01-01 00:00:00.000 2 1200 949410
2011-01-01 00:00:00.000 2 1200 949410
2010-01-01 00:00:00.000 2 1200 949410
2009-01-01 00:00:00.000 2 1200 949410
2008-01-01 00:00:00.000 2 1200 949410
2007-01-01 00:00:00.000 2 1200 949410
2006-01-01 00:00:00.000 2 1200 949410
2005-01-01 00:00:00.000 2 1200 949410
2004-01-01 00:00:00.000 2 1200 949410
2003-01-01 00:00:00.000 2 1200 949410
2002-01-01 00:00:00.000 3 1200 949410
2001-01-01 00:00:00.000 2 1200 949410
2000-01-01 00:00:00.000 3 1200 949410
1999-01-01 00:00:00.000 3 1200 949410
1998-01-01 00:00:00.000 3 1200 949410
1997-01-01 00:00:00.000 3 1200 949410
1996-01-01 00:00:00.000 3 1200 949410
1995-01-01 00:00:00.000 3 1200 949410
1994-01-01 00:00:00.000 3 1200 949410
1993-01-01 00:00:00.000 3 1200 949410
1992-01-01 00:00:00.000 3 1200 949410
1991-01-01 00:00:00.000 3 1200 949410
1990-01-01 00:00:00.000 3 1200 949410
1989-01-01 00:00:00.000 3 1200 949410
1988-01-01 00:00:00.000 3 1200 949410
1987-01-01 00:00:00.000 3 1200 949410
1986-01-01 00:00:00.000 3 1200 949410
1985-01-01 00:00:00.000 3 1200 949410
1984-01-01 00:00:00.000 2 1200 949410
In this case, the result should be:
Date A B SysId
2015-02-01 00:00:00.000 2 1201 949410
2015-01-01 00:00:00.000 3 1201 949410
2014-01-01 00:00:00.000 2 1201 949410
2003-01-01 00:00:00.000 2 1200 949410
2002-01-01 00:00:00.000 3 1200 949410
2001-01-01 00:00:00.000 2 1200 949410
1985-01-01 00:00:00.000 3 1200 949410
1984-01-01 00:00:00.000 2 1200 949410
These are the rows in which A or B differs from the previous row, plus the first row.
I have an extremely ugly and expensive select which does this for me:
SELECT Date, A, B, SysId
FROM SysHistory fb1
WHERE fb1.SysId = 949410
AND
(
(
((
SELECT TOP 1 fb2b.A
FROM SysHistory fb2b
WHERE fb2b.Date < fb1.Date
AND fb2b.SysId = 949410
order by Date DESC
)) <> fb1.StatusId
OR
((
SELECT TOP 1 fb2a.A
FROM SysHistory fb2a
WHERE fb2a.Date < fb1.Date
AND fb2a.SysId= 949410
order by Date DESC
)) IS NULL
)
OR
(
((
SELECT TOP 1 fb3b.B
FROM SysHistory fb3b
WHERE fb3b.Date < fb3b.Date
AND fb3b.SysId= 949410
order by Date DESC
)) <> fb1.StatusId
OR
((
SELECT TOP 1 fb3a.B
FROM SysHistory fb3a
WHERE fb3a.Date < fb1.Date
AND fb3a.SysId = 949410
order by Date DESC
)) IS NULL
)
)
order by Date DESC
Notice that for each row I fetch the A or B value of the previous row with a TOP 1 subquery. Since the previous row might not exist (when we are on the first row in the table), I also have an OR condition for A and for B which checks for NULL.
I feel like there must be a better way to do this.
Is it possible, in TSQL, to compare multiple columns in the same subselect? Or just generally, how would you improve this query? Is there any way to make it more compact or potentially faster?
I guess my question is bordering on best practice but I feel that this is technically a syntax question.
Important update: I've now noticed that my query doesn't actually give me the results I want, so the SQL query above doesn't work. The result in this case should be:
Date A B SysId
2015-02-01 00:00:00.000 2 1201 949410
2015-01-01 00:00:00.000 3 1201 949410
2014-01-01 00:00:00.000 2 1201 949410
2003-01-01 00:00:00.000 2 1200 949410
2002-01-01 00:00:00.000 3 1200 949410
2001-01-01 00:00:00.000 2 1200 949410
1985-01-01 00:00:00.000 3 1200 949410
1984-01-01 00:00:00.000 2 1200 949410
Instead, the result is:
Date A B SysId
2015-02-01 00:00:00.000 2 1201 949410
2015-01-01 00:00:00.000 3 1201 949410
2003-01-01 00:00:00.000 2 1200 949410
2002-01-01 00:00:00.000 3 1200 949410
2001-01-01 00:00:00.000 2 1200 949410
1985-01-01 00:00:00.000 3 1200 949410
1984-01-01 00:00:00.000 2 1200 949410
You can apply ROW_NUMBER() against the data so that you can perform a self-join to find previous rows:
;WITH Numbered as (
    SELECT Date, A, B, SysId,
           ROW_NUMBER() OVER (ORDER BY Date desc) as rn
    FROM SysHistory fb1
    WHERE fb1.SysId = 949410
)
select n1.*
from Numbered n1
    left join Numbered n2
        on n1.rn = n2.rn - 1   --n2 is the next-older row
where
    n2.Date is null or --If you want to include the earliest row
    n1.A <> n2.A or
    n1.B <> n2.B
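To try the query above without the real table, one option is to load the question's sample rows into a table variable first. The following is only a sketch: the column types (datetime, int) are assumptions, since the question does not state them, and the multi-row VALUES syntax assumes SQL Server 2008 or later.

```sql
-- Assumed column types; the question does not give the real schema.
DECLARE @SysHistory TABLE (
    [Date] datetime NOT NULL,
    A      int      NOT NULL,
    B      int      NOT NULL,
    SysId  int      NOT NULL
);

-- The 33 sample rows from the question, newest first.
INSERT INTO @SysHistory ([Date], A, B, SysId) VALUES
('20150201', 2, 1201, 949410),
('20150101', 3, 1201, 949410),
('20140101', 2, 1201, 949410),
('20130101', 2, 1200, 949410),
('20120101', 2, 1200, 949410),
('20110101', 2, 1200, 949410),
('20100101', 2, 1200, 949410),
('20090101', 2, 1200, 949410),
('20080101', 2, 1200, 949410),
('20070101', 2, 1200, 949410),
('20060101', 2, 1200, 949410),
('20050101', 2, 1200, 949410),
('20040101', 2, 1200, 949410),
('20030101', 2, 1200, 949410),
('20020101', 3, 1200, 949410),
('20010101', 2, 1200, 949410),
('20000101', 3, 1200, 949410),
('19990101', 3, 1200, 949410),
('19980101', 3, 1200, 949410),
('19970101', 3, 1200, 949410),
('19960101', 3, 1200, 949410),
('19950101', 3, 1200, 949410),
('19940101', 3, 1200, 949410),
('19930101', 3, 1200, 949410),
('19920101', 3, 1200, 949410),
('19910101', 3, 1200, 949410),
('19900101', 3, 1200, 949410),
('19890101', 3, 1200, 949410),
('19880101', 3, 1200, 949410),
('19870101', 3, 1200, 949410),
('19860101', 3, 1200, 949410),
('19850101', 3, 1200, 949410),
('19840101', 2, 1200, 949410);
```

A table variable only lives for the duration of the batch, so the SELECT has to run in the same batch, with SysHistory replaced by @SysHistory.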
Results (having put your sample data in a table variable called @SysHistory, changed the above query to reference it, and escaped the Date column as [Date], since using type names as column names is usually a bad idea):
Date A B SysId rn
----------------------- ----------- ----------- ----------- --------------------
2015-02-01 00:00:00.000 2 1201 949410 1
2015-01-01 00:00:00.000 3 1201 949410 2
2014-01-01 00:00:00.000 2 1201 949410 3
2003-01-01 00:00:00.000 2 1200 949410 14
2002-01-01 00:00:00.000 3 1200 949410 15
2001-01-01 00:00:00.000 2 1200 949410 16
1985-01-01 00:00:00.000 3 1200 949410 32
1984-01-01 00:00:00.000 2 1200 949410 33
This seems to match your expected result (except for my extra rn column).
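If the server is SQL Server 2012 or later, the same comparison can probably be written without a self-join by using LAG, which reads a value from the preceding row of the window. This is a sketch of that alternative rather than part of the approach above; the names WithPrev, PrevA and PrevB are made up for illustration, and it assumes A and B are never NULL.

```sql
-- Assumes SQL Server 2012+ (LAG) and non-NULL A and B.
WITH WithPrev AS (
    SELECT [Date], A, B, SysId,
           LAG(A) OVER (ORDER BY [Date]) AS PrevA,   -- A from the next-older row
           LAG(B) OVER (ORDER BY [Date]) AS PrevB,   -- B from the next-older row
           ROW_NUMBER() OVER (ORDER BY [Date]) AS rn -- 1 = earliest row
    FROM SysHistory
    WHERE SysId = 949410
)
SELECT [Date], A, B, SysId
FROM WithPrev
WHERE rn = 1          -- the earliest row has no predecessor, so keep it
   OR A <> PrevA      -- A changed compared to the previous row
   OR B <> PrevB      -- B changed compared to the previous row
ORDER BY [Date] DESC;
```

Both columns come from one pass over the data, which also addresses the question of comparing multiple columns at once. If A or B could be NULL, the <> tests would evaluate to unknown for those rows, so they would need an explicit NULL check.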