SQL: Fill empty cells with previous row value on basis of condition?
I need to produce the column "OXY_ID_NEW" in the following table using SQL. Is this possible in SQL Server 2008 R2, SQL Server 2012, or Amazon Redshift?
SQL table image: http://i.imgur.com/P4UOiMz.jpg
Basically, I want to forward-fill the empty "OXY_ID" cells with the last known OXY_ID for that ID, as shown in the "OXY_ID_NEW" column.
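Maybe something like:
coalesce(oxy_id, lag(oxy_id) over (partition by id order by number))
This assumes the number column actually increases within each id. In the screenshot the values look like they repeat, in which case you'll need to provide the whole table definition. Note also that lag looks only one row back, so this fills a NULL only when the row immediately before it has a value.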
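Because LAG looks only one row back, a run of consecutive NULLs needs a different window trick. The following is a minimal sketch, assuming SQL Server 2012 or later (for ordered window aggregates) and a hypothetical seq column that gives the rows a definite order, since id and number alone do not. A running COUNT(oxy_id) puts each non-NULL row and the NULL rows that follow it into the same group, and MAX over that group returns the value to carry forward. The table name #demo_seq and the column seq are assumptions for illustration, not part of the original table.

```sql
-- Hypothetical sample table; seq is an assumed column that records row order.
CREATE TABLE #demo_seq (
    seq    INT PRIMARY KEY,
    id     VARCHAR(10),
    number INT,
    oxy_id VARCHAR(2)
);

INSERT INTO #demo_seq (seq, id, number, oxy_id) VALUES
    (1, '308_2123', 36, 'ZY'),
    (2, '308_2123', 36, NULL),
    (3, '308_2123', 37, NULL),
    (4, '308_2123', 38, 'WY'),
    (5, '311_6789', 1,  NULL),
    (6, '311_6789', 2,  'EF');

WITH grouped AS (
    SELECT seq, id, number, oxy_id,
           -- COUNT(oxy_id) skips NULLs, so the running count only increases
           -- on rows that have a value; NULL rows inherit the previous count.
           COUNT(oxy_id) OVER (
               PARTITION BY id
               ORDER BY seq
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS grp
    FROM #demo_seq
)
SELECT seq, id, number, oxy_id,
       -- Each (id, grp) group holds at most one non-NULL oxy_id.
       MAX(oxy_id) OVER (PARTITION BY id, grp) AS oxy_id_new
FROM grouped
ORDER BY seq;
-- oxy_id_new for seq 1..6: ZY, ZY, ZY, WY, NULL, EF
```

Rows that come before the first known value of an id (seq 5 here) fall into group 0 and stay NULL, which matches a forward fill.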
The LAG example given by gordy is the simplest, as long as your version has it (SQL Server 2012 or later). Note, though, that you cannot use it directly to update your table. Window functions can only appear in the SELECT or ORDER BY clause, so you would need to compute the value first, for example in a CTE, a derived table, or a temporary table.
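One way to get a window function's result into an UPDATE on SQL Server 2012 or later is to compute it in a CTE and update through the CTE. This is a minimal sketch, assuming a hypothetical table #lag_demo with an assumed seq column that fixes the row order. Neither name comes from the original table.

```sql
-- Hypothetical sample table; seq is an assumed ordering column.
CREATE TABLE #lag_demo (
    seq    INT PRIMARY KEY,
    id     VARCHAR(10),
    number INT,
    oxy_id VARCHAR(2)
);

INSERT INTO #lag_demo (seq, id, number, oxy_id) VALUES
    (1, '308_2123', 36, 'ZY'),
    (2, '308_2123', 36, NULL),
    (3, '308_2123', 37, NULL),
    (4, '308_2123', 38, 'WY');

-- The window function lives in the CTE's SELECT list;
-- the UPDATE then only assigns one CTE column to another.
WITH filled AS (
    SELECT oxy_id,
           LAG(oxy_id) OVER (PARTITION BY id ORDER BY seq) AS prev_oxy_id
    FROM #lag_demo
)
UPDATE filled
SET oxy_id = prev_oxy_id
WHERE oxy_id IS NULL
  AND prev_oxy_id IS NOT NULL;

SELECT seq, id, number, oxy_id
FROM #lag_demo
ORDER BY seq;
-- seq 2 becomes 'ZY'; seq 3 stays NULL because LAG only sees
-- the row directly before it, which was NULL when the statement ran.
```

The leftover NULL at seq 3 shows why a single LAG is not a full forward fill when several empty rows follow each other.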
For older versions you can use a cursor. Something like:
declare @demo table
(
id varchar (10),
number int,
oxy_id varchar(2)
)
INSERT INTO @demo VALUES ('308_2123', 36, 'ZY')
INSERT INTO @demo VALUES ('308_2123', 36, NULL)
INSERT INTO @demo VALUES ('308_2123', 37, NULL)
INSERT INTO @demo VALUES ('308_2123', 37, NULL)
INSERT INTO @demo VALUES ('308_2123', 38, 'WY')
INSERT INTO @demo VALUES ('308_2123', 38, 'WY')
INSERT INTO @demo VALUES ('308_2123', 38, NULL)
INSERT INTO @demo VALUES ('308_2123', 39, NULL)
INSERT INTO @demo VALUES ('309_5647', 30, 'AB')
INSERT INTO @demo VALUES ('309_5647', 30, NULL)
INSERT INTO @demo VALUES ('309_5647', 31, NULL)
INSERT INTO @demo VALUES ('309_5647', 32, 'BC')
INSERT INTO @demo VALUES ('310_8897', 20, 'CD')
INSERT INTO @demo VALUES ('310_8897', 21, 'DC')
INSERT INTO @demo VALUES ('310_8897', 22, NULL)
INSERT INTO @demo VALUES ('310_8897', 23, NULL)
INSERT INTO @demo VALUES ('310_8897', 23, NULL)
INSERT INTO @demo VALUES ('311_6789', 1, NULL)
INSERT INTO @demo VALUES ('311_6789', 1, NULL)
INSERT INTO @demo VALUES ('311_6789', 2, 'EF')
INSERT INTO @demo VALUES ('311_6789', 3, 'GH')
INSERT INTO @demo VALUES ('311_6789', 3, NULL)
INSERT INTO @demo VALUES ('312_9874', 1, 'HK')
INSERT INTO @demo VALUES ('312_9874', 1, 'KY')
INSERT INTO @demo VALUES ('312_9874', 1, NULL)
INSERT INTO @demo VALUES ('312_9874', 1, 'YY')
DECLARE @id varchar(10)
DECLARE @number int
DECLARE @oxy_id varchar(2)
DECLARE @previd varchar(10) = NULL     -- id of the previous row
DECLARE @prevOxyID varchar(2) = NULL   -- last non-NULL oxy_id seen
-- No ORDER BY here: rows are read in whatever order the engine returns them.
DECLARE cur CURSOR FOR
    SELECT d.id, d.number, d.oxy_id FROM @demo d
OPEN cur
FETCH NEXT FROM cur INTO @id, @number, @oxy_id
WHILE @@FETCH_STATUS = 0
BEGIN
    IF @oxy_id IS NULL
    BEGIN
        IF @prevOxyID IS NOT NULL
        BEGIN
            IF @id = @previd
            BEGIN
                -- Same id as the previous row: carry the last known value forward.
                UPDATE @demo SET oxy_id = @prevOxyID
                WHERE id = @id AND number = @number AND oxy_id IS NULL
            END
            ELSE
            BEGIN
                -- A new id starts with a NULL: forget the old value.
                SET @prevOxyID = NULL
            END
        END
        SET @previd = @id
    END
    ELSE
    BEGIN
        SET @previd = @id
        SET @prevOxyID = @oxy_id
    END
    FETCH NEXT FROM cur INTO @id, @number, @oxy_id
END
CLOSE cur
DEALLOCATE cur
SELECT * FROM @demo
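Please note that the cursor above has no ORDER BY, so it relies on the rows coming back in the order shown in your image, which SQL Server does not guarantee for a query without ORDER BY. A cursor's SELECT statement can include an ORDER BY, but id and number alone do not order these rows uniquely, because the numbers repeat. If the table has no column that defines the order, you will need to load the records into a temporary table in the right order together with a sequence column, run the cursor over the temporary table ordered by that column, and finally update the original table from the temporary one.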
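To make the ordering point concrete, here is a minimal sketch of the same forward fill with a cursor that reads rows in a defined order. It assumes a hypothetical seq column that uniquely orders the rows. The table variable @ordered and its sample rows are illustrative, not the original data.

```sql
-- Hypothetical table variable; seq is an assumed unique ordering column.
DECLARE @ordered TABLE (
    seq    INT PRIMARY KEY,
    id     VARCHAR(10),
    number INT,
    oxy_id VARCHAR(2)
);

INSERT INTO @ordered (seq, id, number, oxy_id) VALUES
    (1, '308_2123', 36, 'ZY'),
    (2, '308_2123', 36, NULL),
    (3, '308_2123', 37, NULL),
    (4, '309_5647', 30, NULL),
    (5, '309_5647', 31, 'AB');

DECLARE @seq INT, @id VARCHAR(10), @oxy_id VARCHAR(2);
DECLARE @prev_id VARCHAR(10) = NULL, @prev_oxy_id VARCHAR(2) = NULL;

-- STATIC reads from a snapshot, so the updates below do not affect the scan.
DECLARE cur CURSOR LOCAL STATIC FOR
    SELECT seq, id, oxy_id
    FROM @ordered
    ORDER BY seq;

OPEN cur;
FETCH NEXT FROM cur INTO @seq, @id, @oxy_id;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- Forget the carried value whenever a new id starts.
    IF @prev_id IS NULL OR @id <> @prev_id
        SET @prev_oxy_id = NULL;

    IF @oxy_id IS NOT NULL
        SET @prev_oxy_id = @oxy_id;      -- remember the latest known value
    ELSE IF @prev_oxy_id IS NOT NULL
        UPDATE @ordered                  -- seq identifies exactly one row
        SET oxy_id = @prev_oxy_id
        WHERE seq = @seq;

    SET @prev_id = @id;
    FETCH NEXT FROM cur INTO @seq, @id, @oxy_id;
END

CLOSE cur;
DEALLOCATE cur;

SELECT seq, id, number, oxy_id
FROM @ordered
ORDER BY seq;
-- oxy_id for seq 1..5: ZY, ZY, ZY, NULL, AB
```

With a unique seq, each UPDATE touches a single row, so rows that share the same id and number no longer need to be updated as a group.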
EDIT
OK, so here is a version without a cursor. As mentioned by gordy, the problem with LAG is the repeated numbers. The same problem restricts the use of UPDATE, since there is no unique identifier for a row. Instead, I insert the results into a temporary table, delete the originals, and then re-insert from the temporary table. If you do in fact have a unique key, please replace this delete and insert with an UPDATE. The following solution, whilst a bit long-winded, gets around these problems. According to my research it should work on Amazon Redshift (apart from the table variable, which is T-SQL syntax and would have to be a regular table there), but I do not have access to test it. Note that ROW_NUMBER() OVER (ORDER BY id, number) numbers rows that share the same id and number in an arbitrary order, so the result for those rows can vary. I will not repeat the inserts; please copy them from above.
declare @demo table
(
id varchar (10),
number int,
oxy_id varchar(2)
)
create table allrownums
(
id varchar (10),
number int,
oxy_id varchar(2),
rownum int
)
-- Give every row a number so it can be identified later.
INSERT INTO allrownums
SELECT id, number, oxy_id,
       ROW_NUMBER() OVER (ORDER BY id, number) AS rownum
FROM @demo;
create table allnotnullrows
(
id varchar (10),
number int,
oxy_id varchar(2),
rownum int
)
-- Keep only the rows that already have an oxy_id.
INSERT INTO allnotnullrows
SELECT id, number, oxy_id, rownum
FROM allrownums
WHERE oxy_id IS NOT NULL;
create table maxrownums
(
id varchar (10),
rownum int,
maxrownum int
)
-- For each row, find the closest row at or before it (same id) that has an oxy_id.
INSERT INTO maxrownums
SELECT a.id, a.rownum, MAX(n.rownum)
FROM allrownums a
INNER JOIN allnotnullrows n
    ON n.id = a.id
WHERE a.rownum >= n.rownum
GROUP BY a.id, a.rownum;
create table tempresults
(
id varchar (10),
number int,
oxy_id varchar(2)
)
-- Use the row's own oxy_id if present, otherwise the one from the matched earlier row.
INSERT INTO tempresults
SELECT a.id, a.number, COALESCE(a.oxy_id, n.oxy_id) AS oxy_id
FROM allrownums a
LEFT JOIN maxrownums m
    ON m.rownum = a.rownum
LEFT JOIN allnotnullrows n
    ON n.id = a.id
   AND n.rownum = m.maxrownum;
DELETE FROM @demo;
INSERT INTO @demo SELECT * FROM tempresults;
DROP TABLE tempresults;
DROP TABLE allrownums;
DROP TABLE allnotnullrows;
DROP TABLE maxrownums;
SELECT * FROM @demo;
This should work in SQL Server 2008 (I have no way to test it, but CROSS APPLY is supported in SQL Server 2008):
CREATE TABLE #table1 (ID INT, Number INT, OXY_ID VARCHAR(2))
INSERT INTO #table1
VALUES
    (1, 23, 'AD'),
    (2, 23, 'XY'),
    (3, 23, ''),
    (4, 23, ''),
    (5, 23, 'MY'),
    (6, 23, ''),
    (7, 23, 'ZY')
CREATE INDEX IX_table1__ID ON #table1 (ID, OXY_ID)
-- For each row, pick the latest row at or before it that has a non-empty OXY_ID.
-- This sample uses empty strings rather than NULLs, and ID is unique and increasing.
SELECT a.*, c.OXY_ID AS OXY_ID_New
FROM #table1 AS a
CROSS APPLY
    ( SELECT TOP 1 ID, OXY_ID
      FROM #table1 AS b
      WHERE OXY_ID <> '' AND a.ID >= b.ID
      ORDER BY ID DESC ) AS c
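The query above treats the whole table as one sequence and uses empty strings. A minimal sketch of how the same idea could be adapted to the question's shape, with NULLs and a separate fill per id, is shown here. It assumes a hypothetical seq column that uniquely orders the rows. The table #apply_demo and its rows are illustrative only. OUTER APPLY is used instead of CROSS APPLY so that rows with no earlier value are kept with a NULL rather than dropped.

```sql
-- Hypothetical sample table; seq is an assumed unique ordering column.
CREATE TABLE #apply_demo (
    seq    INT PRIMARY KEY,
    id     VARCHAR(10),
    number INT,
    oxy_id VARCHAR(2)
);

INSERT INTO #apply_demo (seq, id, number, oxy_id) VALUES
    (1, '310_8897', 20, 'CD'),
    (2, '310_8897', 21, 'DC'),
    (3, '310_8897', 22, NULL),
    (4, '311_6789', 1,  NULL),
    (5, '311_6789', 2,  'EF'),
    (6, '311_6789', 3,  NULL);

SELECT a.seq, a.id, a.number, a.oxy_id,
       c.oxy_id AS oxy_id_new
FROM #apply_demo AS a
OUTER APPLY (
    -- Latest row with a value, at or before the current row, within the same id.
    SELECT TOP 1 b.oxy_id
    FROM #apply_demo AS b
    WHERE b.id = a.id
      AND b.seq <= a.seq
      AND b.oxy_id IS NOT NULL
    ORDER BY b.seq DESC
) AS c
ORDER BY a.seq;
-- oxy_id_new for seq 1..6: CD, DC, DC, NULL, EF, EF
```

With CROSS APPLY, the row at seq 4 would disappear from the result, because the inner query returns no row for it.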
Comments:
Should be a lot faster than a cursor.
The LAG solution is more elegant than this one.