Identify distinct values across two columns that are not present in another column
Background (and the requirement): There is a table "T_CLASS" with multiple columns, two of which hold student names, e.g. Student1 and Student2. There is another table, "T_STUDENT", that I want to check T_CLASS against, to get the distinct students from the two T_CLASS columns that don't exist in T_STUDENT.
Also, I should mention that the tables contain circa 600M records each.
Sample query (and my attempt):
;with t_class(id, student1, student2) as (
select 1, 'Tom', 'Rahul' union all
select 2, 'Rahul', 'Nick' union all
select 3, 'David', 'Mark' union all
select 4, 'Rahul', 'Mark' union all
select 5, 'Rick', 'David'
)
, t_student (c_student) as (
select 'David' union all
select 'Nick' union all
select 'Mark' union all
select 'Rick'
)
-- Below is what I've tried --
select student1
from t_class crt
where not exists
(
select 1 from t_student djt
where lower(trim(crt.student1)) = lower(trim(djt.c_student))
)
union
select student2
from t_class crt
where not exists
(
select 1 from t_student djt
where lower(trim(crt.student2)) = lower(trim(djt.c_student))
)
Expected output:
Rahul
Tom
Note: I don't want a specific query as the solution; I want to understand this conceptually.
Is this a good technique, or is there a more optimal approach? I know I have to try out different ways and check the execution plan, but I can't think of any.
Please advise. Thanks in advance. :)
P.S. I got this execution plan generated from SQL Server 2016 (however, this query actually runs on AWS Redshift).
Edit 2 - Meanwhile, I've made another attempt...
;with t_class(id, student1, student2) as (
select 1, 'Tom', 'Rahul' union all
select 2, 'Rahul', 'Nick' union all
select 3, 'David', 'Mark' union all
select 4, 'Rahul', 'Mark' union all
select 5, 'Rick', 'David'
)
, t_student (c_student) as (
select 'David' union all
select 'Nick' union all
select 'Mark' union all
select 'Rick'
)
select * from
(
select student1 studs from t_class
union
select student2 from t_class
) x
where not exists (select 1 from t_student ts where ts.c_student = x.studs)
SELECT Z.STUDENT_NAME
FROM
(
SELECT C.STUDENT1 AS STUDENT_NAME
FROM T_CLASS AS C
UNION
SELECT X.STUDENT2
FROM T_CLASS AS X
)AS Z
EXCEPT
SELECT T.c_student
FROM T_STUDENT AS T
I hope you can try this approach. It can be simplified, but I don't exactly remember the precedence of the UNION/EXCEPT/INTERSECT operators.
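On the precedence point: as far as I know, PostgreSQL, Redshift, and SQL Server all evaluate UNION and EXCEPT left to right at equal precedence, with only INTERSECT binding more tightly. Assuming that holds on your engine, the derived table can be dropped. The sketch below reuses the sample CTEs from the question so it can be run on its own. Note that EXCEPT compares names exactly, without the lower(trim(...)) normalization used in the question's first attempt.

```sql
-- Sample data copied from the question.
with t_class(id, student1, student2) as (
    select 1, 'Tom', 'Rahul' union all
    select 2, 'Rahul', 'Nick' union all
    select 3, 'David', 'Mark' union all
    select 4, 'Rahul', 'Mark' union all
    select 5, 'Rick', 'David'
)
, t_student(c_student) as (
    select 'David' union all
    select 'Nick' union all
    select 'Mark' union all
    select 'Rick'
)
-- Evaluated as (student1 UNION student2) EXCEPT c_student,
-- assuming left-to-right evaluation of UNION and EXCEPT.
select student1 as student_name from t_class
union
select student2 from t_class
except
select c_student from t_student;
```

On the sample data this should leave Tom and Rahul (row order is not guaranteed). If you are unsure about precedence on your engine, wrapping the UNION part in parentheses makes the intent explicit.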
In PostgreSQL, you can do this to get students that are in Students and do not exist in Class:
SELECT name
FROM Students S
WHERE NOT EXISTS ( SELECT *
FROM Class C
WHERE S.sid = C.sid)
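Note that the query above checks Students against Class, while the question asks for the opposite direction: names in the two T_CLASS columns that are missing from T_STUDENT. A rough sketch of the same NOT EXISTS idea adapted to the question's tables follows. The CTE name class_names and the column name_key are made up for illustration, and the lower(trim(...)) normalization is carried over from the question's first attempt on the assumption that casing and whitespace may differ between the tables.

```sql
-- Sample data copied from the question.
with t_class(id, student1, student2) as (
    select 1, 'Tom', 'Rahul' union all
    select 2, 'Rahul', 'Nick' union all
    select 3, 'David', 'Mark' union all
    select 4, 'Rahul', 'Mark' union all
    select 5, 'Rick', 'David'
)
, t_student(c_student) as (
    select 'David' union all
    select 'Nick' union all
    select 'Mark' union all
    select 'Rick'
)
-- Hypothetical helper CTE: collapse both name columns into one
-- de-duplicated, normalized list before probing t_student.
, class_names(name_key) as (
    select lower(trim(student1)) from t_class
    union
    select lower(trim(student2)) from t_class
)
select cn.name_key
from class_names cn
where not exists (
    select 1
    from t_student ts
    where lower(trim(ts.c_student)) = cn.name_key
);
```

The names come back in their normalized (lower-cased, trimmed) form. De-duplicating first means each distinct name is checked against T_STUDENT only once, instead of once per T_CLASS row. Whether that actually helps at 600M rows is something only the execution plan on Redshift can tell you.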