SQL to select a new column based on certain conditions

Please help me write a SQL query that generates column3 based on the following conditions:

  • The value of column3 must start at 1.

  • Whenever the value of column2 is 'N', the value of column3 must be the value of column3 in the previous row plus 1.

  • Whenever the value of column2 is 'Y', the value of column3 must be the same as the value of column3 in the previous row.

  • Whenever the value of column1 changes, the value of column3 must reset to 1.

Data sample:

[Image: sample data table, not reproduced]

Thanks,
Teresa

Solution

DDL

create table so (
  id int
  ,column1 int
  ,column2 varchar(1)
);

insert into so values
  (1, 1, 'Y')
  ,(2, 1, 'Y')
  ,(3, 1, 'N')
  ,(4, 1, 'N')
  ,(5, 1, 'Y')
  ,(6, 1, 'Y')
  ,(7, 1, 'Y')
  ,(8, 1, 'Y')
  ,(9, 2, 'Y')
  ,(10, 2, 'Y')
  ,(11, 2, 'N');

Column3 Build

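with str as (
  select
    *
    -- first id of each column1 partition: the row where column3 resets
    ,min(id) over (partition by column1) id_start
    -- number the 'N' rows 1, 2, ... per partition; 'Y' rows stay null
    ,case column2
      when 'N'
        then row_number() over (
          partition by column1
          -- 'N' sorts before 'Y', so the N rows are counted first, in id order
          order by column2, id
        )
      else null
    end n_value
  from
    so
), cls as (
  select
    *
    ,case
      when id_start = id
        then 1
      else
        -- running max of n_value plus one; null (no 'N' seen yet) becomes 1
        coalesce(max(n_value) over (
          partition by column1
          order by id
          rows between unbounded preceding and current row
        ) + 1 ,1)
    end column3
  from
    str
)
select
  id
  ,column1
  ,column2
  ,column3
from
  cls
order by
  id;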
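
The query can be tried locally without a Netezza or PostgreSQL instance. What follows is a minimal sketch that loads the DDL above into an in-memory SQLite database through Python's sqlite3 module and runs the same two-CTE query. It assumes the bundled SQLite is version 3.25 or newer (the first release with window functions), and the helper name build_column3 is made up for this illustration.

```python
import sqlite3

# Same table and rows as the DDL section.
DDL = """
create table so (
  id int
  ,column1 int
  ,column2 varchar(1)
);
insert into so values
  (1, 1, 'Y'), (2, 1, 'Y'), (3, 1, 'N'), (4, 1, 'N')
  ,(5, 1, 'Y'), (6, 1, 'Y'), (7, 1, 'Y'), (8, 1, 'Y')
  ,(9, 2, 'Y'), (10, 2, 'Y'), (11, 2, 'N');
"""

# The answer's query, unchanged apart from layout.
QUERY = """
with str as (
  select
    *
    ,min(id) over (partition by column1) id_start
    ,case column2
      when 'N'
        then row_number() over (partition by column1 order by column2, id)
      else null
    end n_value
  from so
), cls as (
  select
    *
    ,case
      when id_start = id then 1
      else coalesce(max(n_value) over (
        partition by column1
        order by id
        rows between unbounded preceding and current row
      ) + 1, 1)
    end column3
  from str
)
select id, column1, column2, column3
from cls
order by id
"""


def build_column3():
    """Create the sample table in memory and return (id, column1, column2, column3) rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(DDL)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


rows = build_column3()
for row in rows:
    print(row)
```

SQLite is only a stand-in here; results on Netezza itself should be confirmed there.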

Explanation

You need an ordering key for this to work, as other comments and answers point out. I artificially created one (id) in the DDL, though you could certainly build one yourself using row_number() over whichever columns define the row order in your data.
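
As a rough illustration of building such a key, the sketch below numbers the rows of a table that has no id. Everything about the source table is an assumption: raw_so and its loaded_at column are hypothetical names, the timestamps are invented, and SQLite (3.25+) via Python's sqlite3 stands in for the real database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical source table without an id; loaded_at is an assumed column
# that records the order in which rows arrived.
conn.execute(
    "create table raw_so (column1 int, column2 varchar(1), loaded_at text)"
)
conn.executemany(
    "insert into raw_so values (?, ?, ?)",
    [
        (2, "N", "2024-01-01 09:30"),
        (1, "Y", "2024-01-01 10:00"),
        (2, "Y", "2024-01-01 09:00"),
        (1, "N", "2024-01-01 10:05"),
    ],
)

# row_number() turns the chosen ordering into a dense 1..n key named id.
keyed = conn.execute("""
    select
      row_number() over (order by column1, loaded_at) id
      ,column1
      ,column2
    from raw_so
    order by id
""").fetchall()
conn.close()

for row in keyed:
    print(row)
```

The result has the same shape as the so table, so it could feed the column3 query in place of the artificial id.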

The str CTE in the answer provides two critical columns that extract implicit data from the ordering: id_start and n_value.

id_start: Provides the ordering key value, id, at which each column1 partition begins. In your definition of column3, this is essentially your last bullet (the reset rule).

n_value: We need to know the number of times that the value of column3 changes. By your definition, this only happens when column2 = 'N', so on each 'N' row this column returns how many times that has happened so far within the column1 partition; on 'Y' rows it is null.
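
To see what these two columns contain, the sketch below runs only the body of the str CTE against the sample rows. It uses SQLite via Python's sqlite3 as a stand-in for Netezza (an assumption; window functions need SQLite 3.25+), and SAMPLE and str_rows are illustrative names.

```python
import sqlite3

# The sample rows from the DDL section: (id, column1, column2).
SAMPLE = [
    (1, 1, "Y"), (2, 1, "Y"), (3, 1, "N"), (4, 1, "N"),
    (5, 1, "Y"), (6, 1, "Y"), (7, 1, "Y"), (8, 1, "Y"),
    (9, 2, "Y"), (10, 2, "Y"), (11, 2, "N"),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table so (id int, column1 int, column2 varchar(1))")
conn.executemany("insert into so values (?, ?, ?)", SAMPLE)

# Only the two helper columns, so the intermediate step is visible.
str_rows = conn.execute("""
    select
      id
      ,column2
      ,min(id) over (partition by column1) id_start
      ,case column2
        when 'N'
          then row_number() over (partition by column1 order by column2, id)
        else null
      end n_value
    from so
    order by id
""").fetchall()
conn.close()

for row in str_rows:
    print(row)  # (id, column2, id_start, n_value)
```

Ordering the window by column2 first matters: 'N' sorts before 'Y', so the 'N' rows receive row numbers 1, 2, ... in id order regardless of how many 'Y' rows sit between them.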

Once we have these two pieces of data, avoiding iteration for this problem is pretty simple: column3 is the maximum n_value seen so far in the partition (up to and including the current row) plus one. The one exception is when Y rows immediately follow the start of a partition: no n_value exists yet, so the maximum is null, and column3 must be 1. (This is solved with a coalesce.)
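
The running-maximum idea can be cross-checked against a literal, row-by-row reading of the rules. The sketch below does that in plain Python. Both function names are made up for this illustration, the second function is only an approximation of what the SQL computes, and both assume the rows arrive sorted by id with each column1 value contiguous.

```python
from itertools import groupby

# The sample rows from the DDL section: (id, column1, column2).
SAMPLE = [
    (1, 1, "Y"), (2, 1, "Y"), (3, 1, "N"), (4, 1, "N"),
    (5, 1, "Y"), (6, 1, "Y"), (7, 1, "Y"), (8, 1, "Y"),
    (9, 2, "Y"), (10, 2, "Y"), (11, 2, "N"),
]


def column3_by_rules(rows):
    """Apply the question's rules one row at a time."""
    out = []
    for _, group in groupby(rows, key=lambda r: r[1]):  # column1 change -> reset
        value = 1
        for i, (_id, _c1, c2) in enumerate(group):
            if i > 0 and c2 == "N":  # 'N' adds 1 to the previous row's value
                value += 1
            out.append(value)  # 'Y' (or the first row) keeps the value
    return out


def column3_by_running_max(rows):
    """Approximate the set-based logic: 1 + running max of n_value."""
    out = []
    for _, group in groupby(rows, key=lambda r: r[1]):
        n_seen = 0  # n_value of the latest 'N' row, i.e. 'N' rows so far
        for i, (_id, _c1, c2) in enumerate(group):
            if c2 == "N":
                n_seen += 1
            # first row of the partition is forced to 1 (id_start = id);
            # n_seen == 0 plays the role of the coalesce fallback
            out.append(1 if i == 0 else n_seen + 1)
    return out


print(column3_by_rules(SAMPLE))
print(column3_by_running_max(SAMPLE))
```

On the sample data both functions produce the same column3 values. They differ for a partition whose first row is 'N', which the sample data does not contain: the row-by-row reading keeps a following 'Y' at 1, while the running-maximum version returns 2 there, so that case is worth checking against your real data.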

Here's a SqlFiddle using PostgreSQL. Netezza is derived from PostgreSQL, so the syntax should still work there.
