简体   繁体   中英

How to backfill missing values in Sybase (outer apply and derived tables work in MSSQL but not Sybase)

The problem

I have some tables with dates and numerical values; some numerical values are missing. I want those missing values to be filled in with the value available at the last date.

I need to do it:

  1. In a Sybase database, to which I only have read access; I can create temporary tables but not any permanent tables nor permanent views
  2. In a Microsoft SQL Server 2019 (Version 15) database, to which I have full access

I have found how to do it in Microsoft SQL Server, but not in Sybase.

UPDATE : If I run select @@version I get

Adaptive Server Enterprise/16.0 SP02 PL08 Instrumented

Eg I need to go from

+-------+-------+
| date  | value |
+-------+-------+
| 1-Nov |   100 |
| 2-Nov |       |
| 3-Nov |       |
| 4-Nov |   110 |
| 5-Nov |       |
| 6-Nov |   105 |
+-------+-------+

to (see the asterisks):

+-------+--------+
| date  | value  |
+-------+--------+
| 1-Nov |   100  |
| 2-Nov |   *100 |
| 3-Nov |   *100 |
| 4-Nov |   110  |
| 5-Nov |   *110 |
| 6-Nov |   105  |
+-------+--------+

Attempt #1

I found an example here

In SQL Server, it runs without errors but it doesn't update anything. In Sybase, it doesn't run at all. I get:

incorrect syntax near the keyword top

This is a reproducible example

CREATE TABLE #my_test (my_date datetime, my_value float NULL )
go

INSERT INTO #my_test SELECT '1-Nov-2021',100
INSERT INTO #my_test SELECT '2-Nov-2021',NULL
INSERT INTO #my_test SELECT '3-Nov-2021',NULL
INSERT INTO #my_test SELECT '4-Nov-2021',110
INSERT INTO #my_test SELECT '5-Nov-2021',NULL
INSERT INTO #my_test SELECT '6-Nov-2021',105

go


UPDATE #my_test
set my_value = (
                select top 1 b.my_value
                from #my_test b
                where b.my_date < a.my_date and b.my_date = a.my_date and b.my_value is not null
                order by b.my_date desc
                
                )
                
from #my_test a
where a.my_value is null

go

Attempt #2

Outer apply works with Microsoft SQL but it doesn't seem supported in Sybase: in Sybase I get

Incorrect syntax near 'outer'

update #my_test
set my_value = coalesce(pr.my_value, nx.my_value)

from #my_test m

outer apply --next non-null value
(
select top 1 *
from #my_test x
where x.my_value is not null
and x.my_date > m.my_date
order by my_date 
) nx

outer apply -- previous non-null
(select top 1 *
from #my_test x
where x.my_value is not null
and x.my_date < m.my_date
order by my_date desc
) pr

where m.my_value is null

Attempt #3

This was in the same link as before. This code works in SQL Sevrer but Sybase tells me:

You cannot use a derived table in the FROM clause of an UPDATE or DELETE statement

My code:

update #my_test  set #my_test.my_value = tt.NewAmount 
from #my_test t
inner join (
    select my_date, coalesce(min(my_value) over (order by my_date desc ROWS BETWEEN 1 PRECEDING AND CURRENT ROW),
                           min(my_value) over (order by my_date asc ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)) NewAmount
    from #my_test t
) tt on tt.my_date = t.my_date
where t.my_value is null

One idea using a select to fill the gap (eg, OP using a select into ):

select  mt1.*,isnull(mt1.my_value,
                        (select mt2.my_value
                         from   #my_test mt2
                         where  mt2.my_date = (select   max(my_date)
                                                from    #my_test mt3
                                                where   mt3.my_date < mt1.my_date
                                                and     mt3.my_value is not NULL))) as new_value
from    #my_test mt1
order by mt1.my_date           -- for display purposes
go

 my_date                         my_value    new_value
 ------------------------------- ----------- -----------
             Nov  1 2021 12:00AM         100         100
             Nov  2 2021 12:00AM        NULL         100
             Nov  3 2021 12:00AM        NULL         100
             Nov  4 2021 12:00AM         110         110
             Nov  5 2021 12:00AM        NULL         110
             Nov  6 2021 12:00AM         105         105

Assuming data is already in the #temp table and OP wants to run an update :

update  #my_test
set     my_value = (select      mt2.my_value
                        from    #my_test mt2
                        where   mt2.my_date = (select   max(mt3.my_date)
                                                from    #my_test mt3
                                                where   mt3.my_date < mt1.my_date
                                                and     mt3.my_value is not NULL))

from    #my_test mt1
where   my_value is NULL
go

select * from #my_test
order by my_date
go

 my_date                         my_value
 ------------------------------- -----------
             Nov  1 2021 12:00AM         100
             Nov  2 2021 12:00AM         100
             Nov  3 2021 12:00AM         100
             Nov  4 2021 12:00AM         110
             Nov  5 2021 12:00AM         110
             Nov  6 2021 12:00AM         105

NOTES:

  • it's not clear (to me) what we're supposed to do if the 'first' n rows in the table have my_value=NULL (ie, there's no 'previous' my_value to copy); if this is an issue OP can probably wrap the sub-select in a isnull(<sub-select>,@default_value)
  • above code tested in ASE 16.0 SP04 GA (though this should work on most (all?) versions as we're just using standard sub-query syntax)
  • for largish data volumes the queries will benefit from an index on (my_date [,my_value]) ; while OP may not be able to add an index to a permanent table they should be able to add an index to #my_test before performing an update (said index should be created after the initial population of the table so as to provide usable stats for my_date )
  • whether an index on #my_test will be used will depend on server and session level settings for statement_cache , literal_autoparam and deferred_name_resolution as well as volume of data in #my_test

Depending on the number of NULLS you need to backfill, you will end up to around 50% of the rows requiring an update.

In all databases I encountered, this situation calls for a CREATE TABLE ... AS SELECT , rather than a mass update. And - in my example, I avoided the use of keywords like DATE or VALUE .

Hence - here's a complete example - using LAST_VALUE( ... IGNORE NULLS) .

DROP TABLE IF EXISTS indata;
CREATE TABLE
indata(dt,val) AS (
          SELECT DATE '1-Nov-2021',100
UNION ALL SELECT DATE '2-Nov-2021',NULL
UNION ALL SELECT DATE '3-Nov-2021',NULL
UNION ALL SELECT DATE '4-Nov-2021',110
UNION ALL SELECT DATE '5-Nov-2021',NULL
UNION ALL SELECT DATE '6-Nov-2021',105
);

DROP TABLE IF EXISTS outdata;
CREATE TABLE outdata AS
SELECT
  dt
, LAST_VALUE(val IGNORE NULLS) OVER(ORDER BY dt) AS val
FROM indata
;

ALTER TABLE indata RENAME TO indata_old;
-- sp_rename indata, indata_old;
ALTER TABLE outdata  RENAME TO indata;
-- sp_rename indata, indata; -- do you rename tables like this in Sybase?
SELECT * FROM indata;
-- out      dt     | val 
-- out ------------+-----
-- out  2021-11-01 | 100
-- out  2021-11-02 | 100
-- out  2021-11-03 | 100
-- out  2021-11-04 | 110
-- out  2021-11-05 | 110
-- out  2021-11-06 | 105

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM