简体   繁体   中英

Difference between these two SQL queries

So I'm testing 2 queries and I'm getting different results. I want to correct/patch up my understanding. Here's two generic SQL queries that to my understanding are the same but when executed get different results . Note this is not a question about diff between ANSI and non-ANSI SQL .

Query 1 (using LEFT JOIN ):

SELECT * FROM person p LEFT JOIN person_log pl
ON p.person_id = pl.person_id
WHERE pl.person_id IS NULL
AND p.is_active = 1;

Query 2 (using 2 queries):

SELECT * FROM person
WHERE person.is_active = 1
AND person_id NOT IN (SELECT person_id FROM person_log);

To my understanding, both represent this in venn diagram form . Also, is one more efficient than the other? A query on JOIN results vs 2 queries?

EDIT: Changed = to IS in query 1. Thanks to @Justin Samuel for spotting the = error that's causing different results!

There is one bug in the above query 1. You cannot use "=" to check whether it is NULL

SELECT * FROM person p LEFT JOIN person_log pl
ON p.person_id = pl.person_id
WHERE pl.person_id = NULL
AND p.is_active = 1;

Ideally you should be using IS NULL

SELECT * FROM person p LEFT JOIN person_log pl
ON p.person_id = pl.person_id
WHERE pl.person_id IS NULL
AND p.is_active = 1;

You can review the NULL checks in the https://www.simple-talk.com/sql/t-sql-programming/how-to-get-nulls-horribly-wrong-in-sql-server/

Both queries get you the same data.

The second query is the straight-forward way to the problem; get all persons that have no entry in person_log. You can do the same with a NOT EXISTS clause instead of a NOT IN clause. ( NOT IN is a bit leaner, but the values you select in the subquery must not be null, for otherwise you see no data at all. I usually perfer IN / NOT IN over EXISTS / NOT EXISTS for their simplicity, but that's a matter of personal preference.

The first query is called an anti join. It is a trick to achieve the same as a NOT EXISTS or NOT IN query on weak database systems that don't implement these methods well. (The reason is that when a new database system is written, the programmers usually put all their effort in joins for they are so important and neglect EXISTS and IN for some time.)

It depends on the DBMS which gets executed fastest, NOT IN , NOT EXISTS or the anti join. The ideal DBMS would get to the same execution plan, no matter which syntax you choose.

The anti join can produce large intermediate results. With a mature DBMS you shouldn't use anti joins for this reason and for mere readability.

如果要在第二张表中查找不匹配的行,特别是如果列不可为空,则使用NOT IN。

The first one gets all the persons that have no log entries and then filters out the inactive ones.

The second query gets all the persons. Then filters out the inactive. Then gets all the log entries. Then filters out the persons that have no log entries.

They will both return the same info. However, for performance reasons, queries like the second one should be avoided if using a JOIN is possible. The main benefit of the JOINs is indexes. Only one index in the WHERE will be used, but each JOIN will use one.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM