
Identify distinct values across two columns that are not present in another column

Background (and the requirement): There is a table "T_CLASS" with multiple columns, two of which hold student names, e.g. Student1 and Student2. There is another table, "T_STUDENT", that I want to check T_CLASS against, to get the distinct students from those two T_CLASS columns who do not exist in T_STUDENT.

I should also mention that each table contains circa 600M records.

Sample query (and my attempt):

;with t_class(id, student1, student2) as (
    select 1, 'Tom', 'Rahul' union all
    select 2, 'Rahul', 'Nick' union all
    select 3, 'David', 'Mark' union all
    select 4, 'Rahul', 'Mark' union all
    select 5, 'Rick', 'David'
)
, t_student (c_student) as (
        select 'David' union all
        select 'Nick' union all
        select 'Mark' union all
        select 'Rick' 
)
-- Below is what I've tried --
select student1
from t_class crt
where not exists
    (
        select 1 from t_student djt
        where lower(trim(crt.student1)) = lower(trim(djt.c_student))
    )
union
select student2
from t_class crt
where not exists
    (
        select 1 from t_student djt
        where lower(trim(crt.student2)) = lower(trim(djt.c_student))
    )

Expected output:

Rahul
Tom

Note: I don't want any specific query as a solution; I want to understand it conceptually.

Is this a good technique, or is there a more optimal approach? I know I have to try out alternatives and check the execution plan, but I can't think of any.

Please advise. Thanks in advance. :)

P.S. I got this execution plan from SQL Server 2016 (although the query actually runs on AWS Redshift):

[Image: execution plan]


Edit 2: Meanwhile I've made another attempt...

;with t_class(id, student1, student2) as (
    select 1, 'Tom', 'Rahul' union all
    select 2, 'Rahul', 'Nick' union all
    select 3, 'David', 'Mark' union all
    select 4, 'Rahul', 'Mark' union all
    select 5, 'Rick', 'David'
)
, t_student (c_student) as (
        select 'David' union all
        select 'Nick' union all
        select 'Mark' union all
        select 'Rick' 
)
select * from
(
select student1 studs from t_class
union 
select student2 from t_class
) x
where not exists (select 1 from t_student ts where ts.c_student = x.studs)

Is this any better? Execution plan:

[Image: execution plan]
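One way to sanity-check that the two attempts agree on the sample data is to run both against the same rows. The sketch below uses Python's built-in `sqlite3` module with an in-memory database purely as a stand-in (an assumption for illustration; it is not Redshift and says nothing about performance at 600M rows). The variable names `first_attempt` and `second_attempt` are made up here.

```python
import sqlite3

# Sample data from the question, loaded into an in-memory SQLite database.
# SQLite is only a stand-in; it does not reflect Redshift's planner or speed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table t_class (id integer, student1 text, student2 text);
    create table t_student (c_student text);
    insert into t_class values
        (1, 'Tom', 'Rahul'), (2, 'Rahul', 'Nick'), (3, 'David', 'Mark'),
        (4, 'Rahul', 'Mark'), (5, 'Rick', 'David');
    insert into t_student values ('David'), ('Nick'), ('Mark'), ('Rick');
""")

# First attempt: filter each column with NOT EXISTS, then UNION the results.
first_attempt = """
    select student1 from t_class crt
    where not exists (select 1 from t_student djt
                      where lower(trim(crt.student1)) = lower(trim(djt.c_student)))
    union
    select student2 from t_class crt
    where not exists (select 1 from t_student djt
                      where lower(trim(crt.student2)) = lower(trim(djt.c_student)))
"""

# Second attempt: UNION the two columns first, then filter once with NOT EXISTS.
second_attempt = """
    select studs from (
        select student1 as studs from t_class
        union
        select student2 from t_class
    ) x
    where not exists (select 1 from t_student ts where ts.c_student = x.studs)
"""

result_first = sorted(row[0] for row in conn.execute(first_attempt))
result_second = sorted(row[0] for row in conn.execute(second_attempt))
print(result_first, result_second)
```

Note that the two attempts are not strictly equivalent: the first compares `lower(trim(...))` on both sides while the second compares raw values, so they could diverge on real data that has case or whitespace differences.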

SELECT Z.STUDENT_NAME
FROM
(
 SELECT C.STUDENT1 AS STUDENT_NAME
   FROM T_CLASS AS C
  UNION
 SELECT X.STUDENT2
   FROM T_CLASS AS X
) AS Z
EXCEPT
SELECT T.c_student
FROM T_STUDENT AS T

I hope you can try this approach. It can be simplified, but I don't exactly remember the precedence of the UNION/EXCEPT/INTERSECT operators.
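On the precedence question: as far as I know, in standard SQL (and in PostgreSQL, SQL Server and Redshift) INTERSECT binds tighter than UNION and EXCEPT, while UNION and EXCEPT are evaluated left to right. Under that assumption the derived table can be dropped. Here is a minimal sketch of the simplified form, run through Python's built-in `sqlite3` module as a stand-in engine (SQLite also evaluates UNION followed by EXCEPT left to right); the names `simplified` and `missing` are made up for illustration.

```python
import sqlite3

# In-memory SQLite database holding the question's sample rows.
# This is a stand-in for illustration, not the actual Redshift setup.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table t_class (id integer, student1 text, student2 text);
    create table t_student (c_student text);
    insert into t_class values
        (1, 'Tom', 'Rahul'), (2, 'Rahul', 'Nick'), (3, 'David', 'Mark'),
        (4, 'Rahul', 'Mark'), (5, 'Rick', 'David');
    insert into t_student values ('David'), ('Nick'), ('Mark'), ('Rick');
""")

# Evaluated left to right: (student1 UNION student2) EXCEPT c_student.
# Both UNION and EXCEPT remove duplicates, so no extra DISTINCT is needed.
simplified = """
    select student1 from t_class
    union
    select student2 from t_class
    except
    select c_student from t_student
"""

missing = sorted(row[0] for row in conn.execute(simplified))
print(missing)
```

If in doubt about how a given engine groups the operators, wrapping the UNION part in parentheses or keeping the derived table, as in the query above, makes the intended order explicit.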

In PostgreSQL you can do this to get students that are in Students and do not exist in Class:

SELECT name
        FROM Students S
        WHERE NOT EXISTS ( SELECT *
                           FROM Class C  
                           WHERE S.sid = C.sid)
