SQL to select a new column based on certain conditions

Please help me write a SQL query that generates column3 based on the following conditions:

  • The value of column3 must start at 1.

  • Whenever the value of column2 is 'N', the value of column3 must be the value of column3 in the previous row plus 1.

  • Whenever the value of column2 is 'Y', the value of column3 must be the same as the value of column3 in the previous row.

  • Whenever the value of column1 changes, the value of column3 must reset to 1.

Data sample:

[Image: data sample, not available]

Thanks, Teresa

Solution

DDL

create table so (
  id int
  ,column1 int
  ,column2 varchar(1)
);

insert into so values
  (1, 1, 'Y')
  ,(2, 1, 'Y')
  ,(3, 1, 'N')
  ,(4, 1, 'N')
  ,(5, 1, 'Y')
  ,(6, 1, 'Y')
  ,(7, 1, 'Y')
  ,(8, 1, 'Y')
  ,(9, 2, 'Y')
  ,(10, 2, 'Y')
  ,(11, 2, 'N');

Column3 Build

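with str as (
  select
    *
    -- first id in each column1 partition: the row where column3 resets to 1
    ,min(id) over (partition by column1) as id_start
    -- number the 'N' rows 1, 2, ... within each partition; 'N' sorts before
    -- 'Y', so ordering by column2 first puts the 'N' rows at the front
    ,case column2
      when 'N'
        then row_number() over (
          partition by column1
          order by column2, id
        )
      else null
    end as n_value
  from
    so
), cls as (
  select
    *
    ,case
      -- the first row of a partition is always 1
      when id_start = id
        then 1
      else
        -- running max of n_value = count of 'N' rows seen so far;
        -- coalesce covers rows with no 'N' before them
        coalesce(max(n_value) over (
          partition by column1
          order by id
          rows between unbounded preceding and current row
        ) + 1, 1)
    end as column3
  from
    str
)
select
  id
  ,column1
  ,column2
  ,column3
from
  cls
order by
  id;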

Explanation

You need an ordering key for this to work, as noted in other comments and answers. I created an artificial one (id) in the DDL, though you could certainly build one yourself using row_number() over a different ordering key.
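If your table has no id column, the row_number() approach mentioned above could look roughly like this. It is only a minimal sketch: the table name so_raw and the column loaded_at are hypothetical stand-ins for whatever source table and column actually define your row order.

```sql
-- Hypothetical names: so_raw and loaded_at are assumptions, not from the question.
-- loaded_at stands for any column that reflects the intended row order.
select
  row_number() over (order by loaded_at) as id
  ,column1
  ,column2
from
  so_raw;
```

If loaded_at contains ties, the numbering of the tied rows is not deterministic, so the ordering column should be unique for the result to be repeatable.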

The str CTE in the answer provides two critical columns that extract implicit data from the ordering: id_start and n_value.

id_start: Provides the ordering key value, id, at which each column1 partition starts. In your definition of column3, this corresponds to your last bullet (the reset to 1).

n_value: We need to know the number of times that the value of column3 changes. By your definition, this only happens when column2 = 'N', so this column numbers the 'N' rows 1, 2, ... within a column1 partition and is null for 'Y' rows.
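To see what these two columns contain, you can run the body of the str CTE on its own against the sample table so from the DDL above. The result shown in the comments is what that sample data should produce.

```sql
-- Same expressions as the str CTE, selected directly for inspection.
select
  id
  ,column1
  ,column2
  ,min(id) over (partition by column1) as id_start
  ,case column2
    when 'N'
      then row_number() over (partition by column1 order by column2, id)
    else null
  end as n_value
from
  so
order by
  id;

-- id | column1 | column2 | id_start | n_value
--  1 |       1 | Y       |        1 | null
--  2 |       1 | Y       |        1 | null
--  3 |       1 | N       |        1 | 1
--  4 |       1 | N       |        1 | 2
--  5 |       1 | Y       |        1 | null
--  6 |       1 | Y       |        1 | null
--  7 |       1 | Y       |        1 | null
--  8 |       1 | Y       |        1 | null
--  9 |       2 | Y       |        9 | null
-- 10 |       2 | Y       |        9 | null
-- 11 |       2 | N       |        9 | 1
```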

Once we have these two pieces of data, avoiding iteration for this problem is pretty simple: column3 is the maximum of all n_value values up to and including the current row, plus one. The one exception is when only Y rows have appeared since the start of a partition: there is no n_value yet, so the maximum is null and column3 must be 1. (This is solved with a coalesce.)
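Because the running maximum of n_value is simply the count of 'N' rows seen so far, the same idea can also be written as a running sum. This is a different formulation from the query above, offered only as a sketch, and it assumes the same so table and id ordering key.

```sql
-- Alternative sketch: 1 plus a running count of 'N' rows per column1 partition.
select
  id
  ,column1
  ,column2
  ,1 + sum(case when column2 = 'N' then 1 else 0 end) over (
    partition by column1
    order by id
    rows between unbounded preceding and current row
  ) as column3
from
  so
order by
  id;

-- column3 for ids 1..11 of the sample data: 1, 1, 2, 3, 3, 3, 3, 3, 1, 1, 2
```

The two formulations agree on the sample data but differ when a partition starts with an 'N' row: the query above forces that first row to 1, while this sketch gives it 2. Which is right depends on how you read the "must start with 1" rule.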

Here's a SqlFiddle using PostgreSQL. Netezza is derived from PostgreSQL, so the same syntax should work there.
