SQL "select where not in subquery" returns no results

Disclaimer: I have figured out the problem (I think), but I wanted to add this issue to Stack Overflow since I couldn't (easily) find it anywhere. Also, someone might have a better answer than I do.

I have a database where one table, "Common", is referenced by several other tables. I wanted to see which records in the Common table were orphaned (i.e., had no references from any of the other tables).

I ran this query:

select *
from Common
where common_id not in (select common_id from Table1)
and common_id not in (select common_id from Table2)

I know that there are orphaned records, but no records were returned. Why not?

(This is SQL Server, if it matters.)

Update:

These articles in my blog describe the differences between the methods in more detail:


There are three ways to do such a query:

  • LEFT JOIN / IS NULL:

     SELECT  *
     FROM    common
     LEFT JOIN
             table1 t1
     ON      t1.common_id = common.common_id
     WHERE   t1.common_id IS NULL

  • NOT EXISTS:

     SELECT  *
     FROM    common
     WHERE   NOT EXISTS
             (
             SELECT  NULL
             FROM    table1 t1
             WHERE   t1.common_id = common.common_id
             )

  • NOT IN:

     SELECT  *
     FROM    common
     WHERE   common_id NOT IN
             (
             SELECT  common_id
             FROM    table1 t1
             )

When table1.common_id is not nullable, all these queries are semantically the same.

When it is nullable, NOT IN is different, since IN (and, therefore, NOT IN) returns NULL when a value does not match anything in a list containing a NULL.

This may be confusing, but it may become more obvious if we recall the alternate syntax for this:

common_id = ANY
(
SELECT  common_id
FROM    table1 t1
)

The result of this condition is a boolean sum (an OR) of all comparisons within the list. If none of the comparisons is TRUE, a single NULL value yields a NULL comparison, which renders the whole result NULL too, and negating NULL with NOT still gives NULL.

We can never say definitely that common_id is not equal to anything from this list, since at least one of the values is NULL.
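To make the "OR of comparisons" reasoning concrete, here is a small Python model of SQL's three-valued logic. It is only a sketch: None stands in for both NULL and UNKNOWN, and the helper names (sql_eq, sql_or, sql_not, sql_in, sql_not_in) are made up for illustration, not part of any library.

```python
from typing import Iterable, Optional

# None plays the role of SQL's NULL / UNKNOWN truth value in this sketch.
Truth = Optional[bool]


def sql_eq(a, b) -> Truth:
    """Equality with SQL semantics: any comparison with NULL is UNKNOWN."""
    if a is None or b is None:
        return None
    return a == b


def sql_or(a: Truth, b: Truth) -> Truth:
    """Three-valued OR: TRUE wins, otherwise UNKNOWN beats FALSE."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def sql_not(a: Truth) -> Truth:
    """Three-valued NOT: NOT UNKNOWN is still UNKNOWN."""
    return None if a is None else not a


def sql_in(x, values: Iterable) -> Truth:
    """x IN (v1, v2, ...) is x = v1 OR x = v2 OR ..., i.e. x = ANY (...)."""
    result: Truth = False
    for v in values:
        result = sql_or(result, sql_eq(x, v))
    return result


def sql_not_in(x, values: Iterable) -> Truth:
    return sql_not(sql_in(x, values))


if __name__ == "__main__":
    print(sql_not_in(3, [1, 2]))        # True
    print(sql_not_in(3, [None, 1, 2]))  # None (UNKNOWN), so a WHERE drops the row
    print(sql_not_in(1, [None, 1, 2]))  # False
```

Since a WHERE clause keeps only rows whose condition is TRUE, no value can survive NOT IN once the list contains a NULL.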

Suppose we have this data:

common

--
1
3

table1

--
NULL
1
2

LEFT JOIN / IS NULL and NOT EXISTS will return 3; NOT IN will return nothing (since it will always evaluate to either FALSE or NULL).
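A quick way to check this without a SQL Server instance is Python's built-in sqlite3 module. This assumes SQLite handles NULL in IN/NOT IN with the same standard three-valued rules as SQL Server (it does for this case); the tables and data are the ones from the example above.

```python
import sqlite3


def run(sql: str, conn: sqlite3.Connection) -> list:
    """Return the first column of every result row."""
    return [row[0] for row in conn.execute(sql)]


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE common (common_id INTEGER);
    CREATE TABLE table1 (common_id INTEGER);  -- nullable on purpose
    INSERT INTO common VALUES (1), (3);
    INSERT INTO table1 VALUES (NULL), (1), (2);
""")

left_join = run("""
    SELECT common.common_id FROM common
    LEFT JOIN table1 t1 ON t1.common_id = common.common_id
    WHERE t1.common_id IS NULL""", conn)

not_exists = run("""
    SELECT common_id FROM common
    WHERE NOT EXISTS (SELECT NULL FROM table1 t1
                      WHERE t1.common_id = common.common_id)""", conn)

not_in = run("""
    SELECT common_id FROM common
    WHERE common_id NOT IN (SELECT common_id FROM table1 t1)""", conn)

print(left_join, not_exists, not_in)  # [3] [3] []
```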

In MySQL, on a non-nullable column, LEFT JOIN / IS NULL and NOT IN are a little bit (several percent) more efficient than NOT EXISTS. If the column is nullable, NOT EXISTS is the most efficient (again, not by much).

In Oracle, all three queries yield the same plan (an ANTI JOIN).

In SQL Server, NOT IN / NOT EXISTS are more efficient, since LEFT JOIN / IS NULL cannot be optimized to an ANTI JOIN by its optimizer.

In PostgreSQL, LEFT JOIN / IS NULL and NOT EXISTS are more efficient than NOT IN, since they are optimized to an Anti Join, while NOT IN uses a hashed subplan (or even a plain subplan if the subquery is too large to hash).

If you want the world to be a two-valued boolean place, you must prevent the null (third value) case yourself.

Don't write IN clauses that allow nulls in the list side. Filter them out!

common_id not in
(
  select common_id from Table1
  where common_id is not null
)
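Here is the filter applied to the sample data from the first answer (common = 1, 3; table1 = NULL, 1, 2), again using SQLite through Python's sqlite3 as a stand-in for SQL Server. Removing the NULL from the subquery brings the orphaned row back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Common (common_id INTEGER);
    CREATE TABLE Table1 (common_id INTEGER);
    INSERT INTO Common VALUES (1), (3);
    INSERT INTO Table1 VALUES (NULL), (1), (2);
""")

# Without the filter, the NULL in Table1 makes NOT IN UNKNOWN for id 3.
unfiltered = conn.execute("""
    SELECT common_id FROM Common
    WHERE common_id NOT IN (SELECT common_id FROM Table1)""").fetchall()

# With the filter, the list is (1, 2), so 3 NOT IN (1, 2) is TRUE.
filtered = conn.execute("""
    SELECT common_id FROM Common
    WHERE common_id NOT IN (SELECT common_id FROM Table1
                            WHERE common_id IS NOT NULL)""").fetchall()

print(unfiltered)  # []
print(filtered)    # [(3,)]
```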

Table1 or Table2 has some null values for common_id. Use this query instead:

select *
from Common
where common_id not in (select common_id from Table1 where common_id is not null)
and common_id not in (select common_id from Table2 where common_id is not null)

The short answer:

There is a NULL within the collection returned by your subquery. You can solve the problem by removing that NULL value inside the subquery, or by using the NOT EXISTS predicate instead of NOT IN, since NOT EXISTS excludes NULLs implicitly.

The long answer (from T-SQL Fundamentals, Third Edition, by Itzik Ben-Gan):

Here is an example: imagine there is an order with a NULL custid in the Sales.Orders table, so the subquery returns some integers and a NULL value.

SELECT custid, companyname
FROM Sales.Customers
WHERE custid NOT IN(SELECT O.custid
             FROM Sales.Orders AS O);

The explanation of why the query above returns an empty set:

Obviously, the culprit here is the NULL customer ID you added to the Orders table. The NULL is one of the elements returned by the subquery. Let's start with the part that does behave like you expect it to. The IN predicate returns TRUE for a customer who placed orders (for example, customer 85), because such a customer is returned by the subquery. The NOT operator negates the IN predicate; hence, the NOT TRUE becomes FALSE, and the customer is discarded. The expected behavior here is that if a customer ID is known to appear in the Orders table, you know with certainty that you do not want to return it.

However (take a deep breath), if a customer ID from Customers doesn't appear in the set of non-NULL customer IDs in Orders, and there's also a NULL customer ID in Orders, you can't tell with certainty that the customer is there, and similarly you can't tell with certainty that it's not there. Confused? I hope I can clarify this explanation with an example.

The IN predicate returns UNKNOWN for a customer such as 22 that does not appear in the set of known customer IDs in Orders. That's because when you compare it with known customer IDs you get FALSE, and when you compare it with a NULL you get UNKNOWN. FALSE OR UNKNOWN yields UNKNOWN. Consider the expression 22 NOT IN (1, 2, <other non-22 values>, NULL). This expression can be rephrased as NOT 22 IN (1, 2, …, NULL). You can expand this expression to NOT (22 = 1 OR 22 = 2 OR … OR 22 = NULL). Evaluate each individual expression in the parentheses to its truth value and you get NOT (FALSE OR FALSE OR … OR UNKNOWN), which translates to NOT UNKNOWN, which evaluates to UNKNOWN.
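The expansion can be checked directly, assuming SQLite (used here through Python's sqlite3 instead of SQL Server) evaluates these predicates with the same three-valued logic. sqlite3 returns TRUE/FALSE as 1/0 and UNKNOWN as None.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def evaluate(expr: str):
    """Evaluate a standalone SQL predicate; UNKNOWN (NULL) comes back as None."""
    return conn.execute(f"SELECT {expr}").fetchone()[0]


print(evaluate("22 NOT IN (1, 2, NULL)"))               # None
print(evaluate("NOT (22 = 1 OR 22 = 2 OR 22 = NULL)"))  # None
print(evaluate("85 NOT IN (1, 85, NULL)"))              # 0 (a known match gives FALSE)
print(evaluate("22 NOT IN (1, 2)"))                     # 1 (no NULL, so TRUE)
```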

The logical meaning of UNKNOWN here, before you apply the NOT operator, is that it can't be determined whether the customer ID appears in the set, because the NULL could represent that customer ID. The tricky part here is that negating the UNKNOWN with the NOT operator still yields UNKNOWN. This means that in a case where it is unknown whether a customer ID appears in a set, it is also unknown whether it doesn't appear in the set. Remember that a query filter discards rows that get UNKNOWN in the result of the predicate.

In short, when you use the NOT IN predicate against a subquery that returns at least one NULL, the query always returns an empty set. So, what practices can you follow to avoid such trouble? First, when a column is not supposed to allow NULLs, be sure to define it as NOT NULL. Second, in all queries you write, you should consider NULLs and the three-valued logic. Think explicitly about whether the query might process NULLs, and if so, whether SQL's treatment of NULLs is correct for you. When it isn't, you need to intervene. For example, our query returns an empty set because of the comparison with the NULL. If you want to check whether a customer ID appears only in the set of known values, you should exclude the NULLs, either explicitly or implicitly. To exclude them explicitly, add the predicate O.custid IS NOT NULL to the subquery, like this:

SELECT custid, companyname
FROM Sales.Customers
WHERE custid NOT IN(SELECT O.custid
                    FROM Sales.Orders AS O
                    WHERE O.custid IS NOT NULL);

You can also exclude the NULLs implicitly by using the NOT EXISTS predicate instead of NOT IN, like this:

SELECT custid, companyname
FROM Sales.Customers AS C
WHERE NOT EXISTS
   (SELECT *
    FROM Sales.Orders AS O
    WHERE O.custid = C.custid);

Recall that unlike IN, EXISTS uses two-valued predicate logic. EXISTS always returns TRUE or FALSE and never UNKNOWN. When the subquery stumbles into a NULL in O.custid, the comparison O.custid = C.custid evaluates to UNKNOWN and that row is filtered out of the subquery. As far as the EXISTS predicate is concerned, the NULL cases are eliminated naturally, as though they weren't there. So EXISTS ends up handling only known customer IDs. Therefore, it's safer to use NOT EXISTS than NOT IN.
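A small reproduction of the book's scenario, run in SQLite via Python's sqlite3 instead of SQL Server. The two customers and the NULL order are hypothetical stand-ins for the Sales sample database (SQLite has no Sales schema, so plain table names are used). It compares the original NOT IN, the explicit IS NOT NULL filter and NOT EXISTS, and shows that EXISTS yields FALSE (0), not UNKNOWN, when the only other order has a NULL custid.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (custid INTEGER PRIMARY KEY, companyname TEXT);
    CREATE TABLE Orders (orderid INTEGER PRIMARY KEY, custid INTEGER);
    INSERT INTO Customers VALUES (22, 'Customer 22'), (85, 'Customer 85');
    INSERT INTO Orders (custid) VALUES (85), (NULL);  -- one order with a NULL custid
""")

queries = {
    "NOT IN": """SELECT custid FROM Customers
                 WHERE custid NOT IN (SELECT O.custid FROM Orders AS O)""",
    "NOT IN + IS NOT NULL": """SELECT custid FROM Customers
                 WHERE custid NOT IN (SELECT O.custid FROM Orders AS O
                                      WHERE O.custid IS NOT NULL)""",
    "NOT EXISTS": """SELECT custid FROM Customers AS C
                 WHERE NOT EXISTS (SELECT * FROM Orders AS O
                                   WHERE O.custid = C.custid)""",
}
results = {name: [r[0] for r in conn.execute(sql)] for name, sql in queries.items()}
for name, ids in results.items():
    print(f"{name}: {ids}")

# EXISTS is two-valued: the NULL row simply fails its WHERE, leaving no rows.
exists_value = conn.execute(
    "SELECT EXISTS (SELECT 1 FROM Orders WHERE custid = 22)").fetchone()[0]
print(exists_value)  # 0
```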

The information above is taken from Chapter 4 - Subqueries, T-SQL Fundamentals, Third Edition.

Just off the top of my head...

select c.commonID, t1.commonID, t2.commonID
from Common c
     left outer join Table1 t1 on t1.commonID = c.commonID
     left outer join Table2 t2 on t2.commonID = c.commonID
where t1.commonID is null 
     and t2.commonID is null
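To see the LEFT JOIN version cope with a NULL on the lookup side, here is a sketch in SQLite via Python's sqlite3; the three Common rows and the NULL in Table2 are hypothetical data chosen for illustration. The NULL row in Table2 never satisfies the join condition, so it cannot hide an orphan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Common (commonID INTEGER);
    CREATE TABLE Table1 (commonID INTEGER);
    CREATE TABLE Table2 (commonID INTEGER);
    INSERT INTO Common VALUES (1), (2), (3);
    INSERT INTO Table1 VALUES (2);
    INSERT INTO Table2 VALUES (3), (NULL);
""")

rows = conn.execute("""
    select c.commonID, t1.commonID, t2.commonID
    from Common c
         left outer join Table1 t1 on t1.commonID = c.commonID
         left outer join Table2 t2 on t2.commonID = c.commonID
    where t1.commonID is null
         and t2.commonID is null
""").fetchall()
print(rows)  # [(1, None, None)]
```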

I ran a few tests, and here are my results with respect to @patmortech's answer and @rexem's comments.

If either Table1 or Table2 is not indexed on commonID, you get a table scan, but @patmortech's query is still twice as fast (for a 100K-row master table).

If neither is indexed on commonID, you get two table scans and the difference is negligible.

If both are indexed on commonID, the "not exists" query runs in 1/3 of the time.

select *
from Common c
where not exists (select t1.commonid from table1 t1 where t1.commonid = c.commonid)
and not exists (select t2.commonid from table2 t2 where t2.commonid = c.commonid)

SELECT T.common_id
  FROM Common T
       LEFT JOIN Table1 T1 ON T.common_id = T1.common_id
       LEFT JOIN Table2 T2 ON T.common_id = T2.common_id
 WHERE T1.common_id IS NULL
   AND T2.common_id IS NULL

Let's suppose these values for common_id:

Common - 1
Table1 - 2
Table2 - 3, null

We want the row in Common to be returned, because it doesn't exist in any of the other tables. However, the null throws a monkey wrench into the works.

With those values, the query is equivalent to:

select *
from Common
where 1 not in (2)
and 1 not in (3, null)

That is equivalent to:

select *
from Common
where not (1=2)
and not (1=3 or 1=null)

This is where the problem starts. When comparing with a null, the answer is unknown. So the query reduces to:

select *
from Common
where not (false)
and not (false or unknown)

false or unknown is unknown:

select *
from Common
where true
and not (unknown)

true and not unknown is also unknown:

select *
from Common
where unknown

The where condition does not return records where the result is unknown, so we get no records back.
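Each step of this reduction can be evaluated directly. This sketch uses SQLite through Python's sqlite3 (assuming it follows the same three-valued logic as SQL Server here); NULL/unknown comes back as None, and a one-row Common table shows the WHERE clause dropping the row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("CREATE TABLE Common (common_id INTEGER); INSERT INTO Common VALUES (1);")

# The successive forms of the WHERE condition from the reduction above.
steps = [
    "1 NOT IN (2) AND 1 NOT IN (3, NULL)",
    "NOT (1 = 2) AND NOT (1 = 3 OR 1 = NULL)",
    "NOT (1 = 3 OR 1 = NULL)",
    "1 = NULL",
]
values = {expr: conn.execute(f"SELECT {expr}").fetchone()[0] for expr in steps}
for expr, value in values.items():
    print(f"{expr} -> {value}")  # every step prints None (unknown)

# WHERE keeps only rows whose condition is TRUE, so unknown drops the row.
rows = conn.execute(f"SELECT * FROM Common WHERE {steps[0]}").fetchall()
print(rows)  # []
```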

One way to deal with this is to use the exists operator rather than in. Exists never returns unknown because it operates on rows rather than columns. (A row either exists or it doesn't; none of this null ambiguity at the row level!)

select *
from Common
where not exists (select common_id from Table1 where common_id = Common.common_id)
and not exists (select common_id from Table2 where common_id = Common.common_id)
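The same data (Common = 1, Table1 = 2, Table2 = 3 and null) can be loaded into SQLite via Python's sqlite3 to compare the original NOT IN query with this NOT EXISTS version; SQLite is only a stand-in for SQL Server here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Common (common_id INTEGER);
    CREATE TABLE Table1 (common_id INTEGER);
    CREATE TABLE Table2 (common_id INTEGER);
    INSERT INTO Common VALUES (1);
    INSERT INTO Table1 VALUES (2);
    INSERT INTO Table2 VALUES (3), (NULL);
""")


def orphans(where: str) -> list:
    """Run 'select * from Common' with the given WHERE condition."""
    return conn.execute(f"select * from Common where {where}").fetchall()


with_in = orphans(
    "common_id not in (select common_id from Table1) "
    "and common_id not in (select common_id from Table2)")
with_exists = orphans(
    "not exists (select common_id from Table1 where common_id = Common.common_id) "
    "and not exists (select common_id from Table2 where common_id = Common.common_id)")
print(with_in, with_exists)  # [] [(1,)]
```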

This worked for me :)

select * from Common
where
common_id not in (select ISNULL(common_id,'dummy-data') from Table1)
and common_id not in (select ISNULL(common_id,'dummy-data') from Table2)
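A sketch of the sentinel idea in SQLite via Python's sqlite3, which spells SQL Server's two-argument ISNULL as ifnull. Note that the 'dummy-data' sentinel only suits a character column: SQL Server's ISNULL returns the type of its first argument, so with an integer common_id the string would have to be converted to int and the query would fail. The sentinel must also never be a real id. The tables and values below are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Common (common_id TEXT);
    CREATE TABLE Table1 (common_id TEXT);
    CREATE TABLE Table2 (common_id TEXT);
    INSERT INTO Common VALUES ('a'), ('b'), ('c');
    INSERT INTO Table1 VALUES ('a'), (NULL);
    INSERT INTO Table2 VALUES ('b');
""")

# ifnull replaces the NULL with a value that never matches, so NOT IN stays two-valued.
rows = conn.execute("""
    select * from Common
    where common_id not in (select ifnull(common_id, 'dummy-data') from Table1)
      and common_id not in (select ifnull(common_id, 'dummy-data') from Table2)
""").fetchall()
print(rows)  # [('c',)]
```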

Please follow the example below to understand the topic above:

You can also visit the following link to learn about anti-joins.

select department_name,department_id from hr.departments dep
where not exists 
    (select 1 from hr.employees emp
    where emp.department_id=dep.department_id
    )
order by dep.department_name;
DEPARTMENT_NAME DEPARTMENT_ID
Benefits    160
Construction    180
Contracting 190
.......

But if we use NOT IN in that case, we do not get any data.

select Department_name,department_id from hr.departments dep 
where department_id not in (select department_id from hr.employees );

no data found

This happens because (select department_id from hr.employees) returns a NULL value, so the NOT IN predicate evaluates to UNKNOWN rather than TRUE for every department, and all rows are filtered out. We can see this if we change the SQL slightly, as below, and handle NULL values with the NVL function.

select Department_name,department_id from hr.departments dep 
where department_id not in (select NVL(department_id,0) from hr.employees )

Now we are getting data:

DEPARTMENT_NAME DEPARTMENT_ID
Treasury    120
Corporate Tax   130
Control And Credit  140
Shareholder Services    150
Benefits    160
....

This time we get data because we have handled the NULL value with the NVL function.
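A miniature of the HR example, run in SQLite via Python's sqlite3, where COALESCE plays the role of Oracle's NVL. The two departments and two employees are hypothetical. One caveat of this approach: the replacement value (0 here) must not be a real department_id, or that department would be hidden.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (department_id INTEGER, department_name TEXT);
    CREATE TABLE employees (employee_id INTEGER, department_id INTEGER);
    INSERT INTO departments VALUES (10, 'Administration'), (120, 'Treasury');
    INSERT INTO employees VALUES (1, 10), (2, NULL);  -- employee 2 has no department
""")


def names(sql: str) -> list:
    return [row[0] for row in conn.execute(sql)]


plain = names("""select department_name from departments
                 where department_id not in (select department_id from employees)""")
with_coalesce = names("""select department_name from departments
                 where department_id not in
                       (select coalesce(department_id, 0) from employees)""")
print(plain, with_coalesce)  # [] ['Treasury']
```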

select *,
(select COUNT(ID)  from ProductMaster where ProductMaster.CatID = CategoryMaster.ID) as coun 
from CategoryMaster
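This answer does not say how the counts are used; presumably, categories whose coun is 0 are the orphaned ones. Here is a sketch under that assumption, in SQLite via Python's sqlite3, with hypothetical CategoryMaster/ProductMaster rows (an ORDER BY is added only to make the output order deterministic). A product with a NULL CatID never satisfies the correlation predicate, so it cannot hide an orphan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CategoryMaster (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE ProductMaster (ID INTEGER PRIMARY KEY, CatID INTEGER);
    INSERT INTO CategoryMaster VALUES (1, 'Books'), (2, 'Toys');
    INSERT INTO ProductMaster VALUES (1, 1), (2, 1), (3, NULL);
""")

rows = conn.execute("""
    select *,
    (select COUNT(ID) from ProductMaster where ProductMaster.CatID = CategoryMaster.ID) as coun
    from CategoryMaster
    order by ID
""").fetchall()

# Categories referenced by no product have a count of zero.
orphans = [name for (_id, name, coun) in rows if coun == 0]
print(rows)     # [(1, 'Books', 2), (2, 'Toys', 0)]
print(orphans)  # ['Toys']
```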

I had a case where I was looking up values, and because one table held the value as a double and the other as a string, they would not match without a cast. But only NOT IN was affected; SELECT ... IN ... worked. Weird, but I thought I would share in case anyone else runs into this; a cast is the simple fix.
