
SQL: selecting unique values based on conditions

I have a table with five columns. The first column contains an ID, two columns contain parameters for that ID with the value 0 or 1, a fourth column contains the parameter I need as output, and the last column contains a date. The same ID can appear in several rows with different parameters:

ID        parameter1      parameter2        parameter3       date

001       0               1                 A                01.01.2010
001       0               1                 B                02.01.2010
001       1               0                 C                01.01.2010
001       1               1                 D                01.01.2010
002       0               1                 A                01.01.2010

For each unique ID I want to return the value in parameter3. Which row supplies this value depends on the values in parameter1 and parameter2 and on the date:

  • If there is a row where both parameters are 0, I want the value from that row.
  • If there is no such row, I want the value from the row where parameter1 is 0 and parameter2 is 1.
  • If there is no such row either, I want the value from the row where parameter1 is 1 and parameter2 is 0.
  • Finally, if there is no such row, I want the value from the row where both parameters are 1.

If more than one row matches the required condition, I want the row with the most recent date.

E.g., for the table above and the ID 001, I want the second row, with the value B in parameter3.
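The selection rule above can be restated as a single sort key, which makes it easy to check candidate queries against the sample data. The following is only a sketch in plain Python, not a solution for Informix: the names ROWS and pick_parameter3 are made up for illustration, and the dates are assumed to be in day.month.year format.

```python
from datetime import date

# Sample rows from the question: (ID, parameter1, parameter2, parameter3, date).
# Assumption: the dates in the question are written as day.month.year.
ROWS = [
    ("001", 0, 1, "A", date(2010, 1, 1)),
    ("001", 0, 1, "B", date(2010, 1, 2)),
    ("001", 1, 0, "C", date(2010, 1, 1)),
    ("001", 1, 1, "D", date(2010, 1, 1)),
    ("002", 0, 1, "A", date(2010, 1, 1)),
]


def pick_parameter3(rows):
    """Return {ID: parameter3} following the priority rules of the question."""
    best = {}  # ID -> (sort key, parameter3)
    for row_id, p1, p2, p3, day in rows:
        # Lower key wins: (0,0) < (0,1) < (1,0) < (1,1). Negating the date
        # ordinal makes the most recent date sort first within a combination.
        key = (p1, p2, -day.toordinal())
        if row_id not in best or key < best[row_id][0]:
            best[row_id] = (key, p3)
    return {row_id: p3 for row_id, (key, p3) in best.items()}


print(pick_parameter3(ROWS))
```

For the sample rows this prints {'001': 'B', '002': 'A'}, which matches the expected result for ID 001. The point of the sketch is that the four cases are simply ascending order on (parameter1, parameter2), with the date as a descending tie-breaker.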

What would be the most efficient / fastest way to accomplish this? I have tried two approaches so far:

The first was to select all distinct IDs and loop over them, using a select statement with the ID in the where clause to loop over all rows matching that ID while storing the necessary values in variables:

foreach
    select distinct ID into i_ID from table1

    -- reset the best case for each ID; 5 is worse than any real case (1 to 4)
    let o_case = 5;
    foreach
        select case
            when parameter1 = 0 and parameter2 = 0 then 1
            when parameter1 = 0 and parameter2 = 1 then 2
            when parameter1 = 1 and parameter2 = 0 then 3
            when parameter1 = 1 and parameter2 = 1 then 4
            end, parameter3, date
            into i_case, i_p3, i_date
            from table1
            where table1.ID = i_ID

        if i_case < o_case then
            -- a higher-priority combination always wins
            let o_p3, o_case, o_date = i_p3, i_case, i_date;
        elif i_case = o_case and i_date > o_date then
            -- same combination: keep the more recent date
            let o_p3, o_date = i_p3, i_date;
        end if;
    end foreach;
    insert into table_output values (i_ID, o_p3);
end foreach;
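The nested loop above runs one extra select per ID. One possible variation, sketched here and not benchmarked, is a single ordered scan: sort by ID, parameter1, parameter2 and date descending, then keep only the first row seen for each ID. The sketch runs against SQLite through Python's sqlite3 module purely so that it can be executed; the in-memory table, the ISO date strings and the function names are assumptions, and the same idea would have to be written as one foreach with an order by in Informix SPL.

```python
import sqlite3

# Sample rows from the question; dates rewritten as ISO strings (an
# assumption) so that plain string ordering matches date ordering in SQLite.
SAMPLE_ROWS = [
    ("001", 0, 1, "A", "2010-01-01"),
    ("001", 0, 1, "B", "2010-01-02"),
    ("001", 1, 0, "C", "2010-01-01"),
    ("001", 1, 1, "D", "2010-01-01"),
    ("002", 0, 1, "A", "2010-01-01"),
]


def make_demo_table():
    """Build an in-memory copy of the question's sample table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table table1 (id text, parameter1 int, parameter2 int,"
        " parameter3 text, date text)"
    )
    conn.executemany("insert into table1 values (?, ?, ?, ?, ?)", SAMPLE_ROWS)
    return conn


def first_row_per_id(conn):
    """One ordered scan; the first row seen for each ID is the winner."""
    cursor = conn.execute(
        "select id, parameter3 from table1"
        " order by id, parameter1, parameter2, date desc"
    )
    output = []
    last_id = None
    for row_id, p3 in cursor:
        if row_id != last_id:  # ID changed, so this row is its best row
            output.append((row_id, p3))
            last_id = row_id
    return output


print(first_row_per_id(make_demo_table()))
```

Whether this is faster on a really long table depends on how cheaply the database can produce the sorted order, for example through an index on (ID, parameter1, parameter2, date).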

The second approach was to left join the table with itself four times on the ID, applying the different combinations of parameter1 and parameter2 described above in the left joins, and then to select the output via nested nvl calls:

select distinct t0.ID,
    nvl(t1.parameter3,
        nvl(t2.parameter3,
            nvl(t3.parameter3, t4.parameter3))) parameter3
from table1 t0
    left join table1 t1
        on t0.ID = t1.ID and t1.parameter1 = 0 and t1.parameter2 = 0
        -- latest date among the rows with this parameter combination only
        and t1.date = (select max(date) from table1 t1a where t1a.ID = t1.ID
                       and t1a.parameter1 = 0 and t1a.parameter2 = 0)
    left join table1 t2
        on t0.ID = t2.ID and t2.parameter1 = 0 and t2.parameter2 = 1
        and t2.date = (select max(date) from table1 t2a where t2a.ID = t2.ID
                       and t2a.parameter1 = 0 and t2a.parameter2 = 1)
    left join table1 t3
        on t0.ID = t3.ID and t3.parameter1 = 1 and t3.parameter2 = 0
        and t3.date = (select max(date) from table1 t3a where t3a.ID = t3.ID
                       and t3a.parameter1 = 1 and t3a.parameter2 = 0)
    left join table1 t4
        on t0.ID = t4.ID and t4.parameter1 = 1 and t4.parameter2 = 1
        and t4.date = (select max(date) from table1 t4a where t4a.ID = t4.ID
                       and t4a.parameter1 = 1 and t4a.parameter2 = 1)
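A set-based alternative to the four self-joins is to keep a row only if no better row exists for the same ID, where "better" means a higher-priority parameter combination or the same combination with a later date. The sketch below encodes the priority as 2 * parameter1 + parameter2 (0 for the best case, 3 for the worst) and uses only a correlated not exists, with no analytic functions. It is run on SQLite through Python's sqlite3 module only so that it can be checked; the ISO date strings and function names are assumptions, and it has not been tried on Informix 10.

```python
import sqlite3

SAMPLE_ROWS = [
    ("001", 0, 1, "A", "2010-01-01"),
    ("001", 0, 1, "B", "2010-01-02"),
    ("001", 1, 0, "C", "2010-01-01"),
    ("001", 1, 1, "D", "2010-01-01"),
    ("002", 0, 1, "A", "2010-01-01"),
]

# A row survives only if no other row of the same ID beats it.
BEST_ROW_SQL = """
select t.id, t.parameter3
from table1 t
where not exists (
    select 1
    from table1 b
    where b.id = t.id
      and (2 * b.parameter1 + b.parameter2 < 2 * t.parameter1 + t.parameter2
           or (b.parameter1 = t.parameter1
               and b.parameter2 = t.parameter2
               and b.date > t.date))
)
order by t.id
"""


def make_demo_table():
    """Build an in-memory copy of the question's sample table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table table1 (id text, parameter1 int, parameter2 int,"
        " parameter3 text, date text)"
    )
    conn.executemany("insert into table1 values (?, ?, ?, ?, ?)", SAMPLE_ROWS)
    return conn


def best_rows(conn):
    """Return (id, parameter3) for the winning row of each ID."""
    return conn.execute(BEST_ROW_SQL).fetchall()


print(best_rows(make_demo_table()))
```

If two rows of one ID share the same parameters and the same date, both survive the filter, so the query can return more than one row for that ID.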

Both approaches basically worked; however, as the table is really long, they were much too slow. What would you recommend?

PS: The DBMS is IBM Informix 10, which unfortunately restricts the range of available functions a lot.

I'm not sure if this is what you wanted, but this could work:

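SELECT id, parameter3
FROM (
    -- rank the rows of each id: best parameter combination first,
    -- most recent date first within a combination
    SELECT id, parameter3, RANK() OVER (
            PARTITION BY id
            ORDER BY parameter1 ASC, parameter2 ASC, date DESC
        ) AS rnk
    FROM tab
) AS x
WHERE x.rnk = 1;
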
ID        parameter1      parameter2        parameter3       date

001       0               1                 A                01.01.2010
001       0               1                 B                02.01.2010

The two rows above have the same ID, parameter1 and parameter2 but different parameter3 values, which can create trouble for you.
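In the sample data the two rows differ in their date, so the "most recent date" rule from the question still picks one of them. The ambiguity would only remain if rows shared the ID, both parameters and the date. A quick way to look for such cases is a group by with a having clause; the sketch below runs it on SQLite through Python's sqlite3 module, and the table layout, ISO date strings and function names are assumptions made for the demonstration.

```python
import sqlite3

SAMPLE_ROWS = [
    ("001", 0, 1, "A", "2010-01-01"),
    ("001", 0, 1, "B", "2010-01-02"),
    ("001", 1, 0, "C", "2010-01-01"),
    ("001", 1, 1, "D", "2010-01-01"),
    ("002", 0, 1, "A", "2010-01-01"),
]


def make_demo_table():
    """Build an in-memory copy of the question's sample table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table table1 (id text, parameter1 int, parameter2 int,"
        " parameter3 text, date text)"
    )
    conn.executemany("insert into table1 values (?, ?, ?, ?, ?)", SAMPLE_ROWS)
    return conn


def find_unresolved_ties(conn):
    """List (id, parameter1, parameter2, date, count) groups with >1 row."""
    return conn.execute(
        "select id, parameter1, parameter2, date, count(*)"
        " from table1"
        " group by id, parameter1, parameter2, date"
        " having count(*) > 1"
        " order by id, parameter1, parameter2, date"
    ).fetchall()


conn = make_demo_table()
print(find_unresolved_ties(conn))  # the sample data has no such groups

# Hypothetical extra row that ties with row B on every deciding column.
conn.execute("insert into table1 values ('001', 0, 1, 'E', '2010-01-02')")
print(find_unresolved_ties(conn))
```

If this query returns rows on the real table, an additional tie-breaker (for example the smallest parameter3) would be needed to get exactly one row per ID.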
