简体   繁体   中英

How to use first_value window function to get last non null value in Postgresql

I want to perform a fill-down activity in PgSQL

DDL:

create table brands 
(
id int,
category varchar(20),
brand_name varchar(20)
);
insert into brands values
(1,'chocolates','5-star')
,(2,null,'dairy milk')
,(3,null,'perk')
,(4,null,'eclair')
,(5,'Biscuits','britannia')
,(6,null,'good day')
,(7,null,'boost')
,(8,'shampoo','h&s')
,(9,null,'dove');

Expected output is:

category brand_name
chocolates 5-star
chocolates dairy milk
chocolates perk
chocolates eclair
Biscuits britannia
Biscuits good day
Biscuits boost
Shampoo h&s
Shampoo dove

I tried using the following script but it doesn't seem to work.

select id,
      first_value(category)
      over(order by case when category is not null then id end desc nulls last) as category,
      brand_name
from brands

Can someone suggest a fix.

In MS SQL the following snippet seems to work fine.

select id,
       first_value (category) IGNORE NULLS
       over(order by id desc
        rows between current row and unbounded following) as category,
       brand_name
FROM brands
ORDER BY id
with cte as (
select id,
       category,
       count(category) over (order by id) as category_id,
       brand_name
  from brands)
select id,
       first_value(category) over (partition by category_id order by id) as category,
       brand_name
  from cte;

UPDATE: added query without CTE per request:

select id,
       (array_agg(category) over (order by id))[max(case when category is null then 0 else id end) over (order by id)] as category,
       brand_name
  from brands;

I think there's nothing wrong with using a CTE (see JHH's answer) and I would prefer that.

Postgres DB's don't provide this IGNORE NULLS concept of SQLServer DB's, so I guess you should stop thinking you will get a nearly identic query for Postgres DB's like in MS SQL.

Anyway, if you don't want to use a CTE or complex subquery, you can define your own function and aggregation and run this.

Function creation:

-- CREATE your function
CREATE FUNCTION yourFunction(STATE anyelement, VALUE anyelement)
    RETURNS anyelement
    IMMUTABLE PARALLEL safe
AS
$$
SELECT COALESCE(VALUE, STATE); -- Replace NULL values here
$$ LANGUAGE SQL;

Aggregate creation using the function:

-- CREATE your aggregate
CREATE AGGREGATE yourAggregate(ANYELEMENT) (
    sfunc = yourFunction, -- Call your function here
    stype = ANYELEMENT
);

Your query using this aggregate:

SELECT id, 
  yourAggregate(category) -- Call your aggregate here
  OVER (ORDER BY id, category), 
  brand_name
FROM brands
ORDER BY id;

Of course, you should rename both function and aggregate and use more meaningful names.

This will produce the same outcome like the CTE version.

Try out: db<>fiddle

If you are keen on defining and using own functions and you will use that often, you could do that.

Otherwise, just use a CTE, that's quite fine. There is no reason to do not use CTE's.

Always be aware that while using own functions, you are on risk of bad performance , so you should check whether this query is too slow.

I am afraid that's not implemented in Postgres (at least up to Postgres 15). The manual about window functions :

The SQL standard defines a RESPECT NULLS or IGNORE NULLS option for lead , lag , first_value , last_value , and nth_value . This is not implemented in PostgreSQL: the behavior is always the same as the standard's default, namely RESPECT NULLS .

So you have to use a workaround with a CTE or subquery like JHH suggested, or roll your own window function (which will be comparatively slow).

See (answer to similar question on dba.SE):

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM