
SQL WHERE IN ( ID1, ID2, ID3 ) vs WHERE UNIQUE_ID = IDX

I have created two tables with some dummy data. The ARTICLES table consists of id, author_id, title, description, and the AUTHOR table consists of author_id, name, article_list.

In the application flow, I first fetch the list of authors, which gives me each author's name, id and article list. When the user navigates into an author, I can get the list of all of that author's articles in two ways.

The first way:

SELECT * FROM articles WHERE author_id = 100;

The second way: if I keep the ids of all of an author's articles as a list inside my author table, then I can use

SELECT * 
  FROM  articles 
 WHERE id IN (100, 1100, 2100, 3100, 4100, 5100, 6100, 
            7100, 8100, 9100, 10100, 11100, 12100, 13100, 
            14100, 15100, 16100, 17100, 18100, 19100, 20100,
            21100, 22100, 23100, 24100, 25100, 26100, 27100,
            28100, 29100, 30100, 31100, 32100, 33100, 34100);

The first query took 0.0329 sec while the second query took 0.0017 sec.

I do not understand how the first query can take more time than the second.

As far as I know, the second query is executed like this:

SELECT * 
  FROM articles 
 WHERE id = 100 
    OR id = 1100 
    OR id = 2100... and so on

Caching.

If you start up the server, then run a query, nothing is yet in the buffer_pool (or table cache or ...). So several files need opening and several blocks need reading. 32.9ms could indicate that you needed to hit the disk (if HDD) 3 times at about 10ms each.

If you run the identical query a second time, everything will be cached, and it will take only a few milliseconds, typically under 10ms.

Since the first query primed the cache, the second query found most, maybe all, of the blocks it needed already in memory. So it was probably CPU-only, with no I/O. 1.7ms is reasonable.

A possible issue: do you have the "Query cache" turned on? If so, then (in certain situations) a repeated execution of the same SELECT will find the result set in the QC and return very fast, possibly in under 1ms. One way to be sure to avoid the QC (for realistic timing) is to run SELECT SQL_NO_CACHE ... . (The Query cache was removed in MySQL 8.0, so this applies only to older versions.)

The OR query you present is optimized into the IN that you present. That is, they end up being identical. (Using OR with different columns is a performance killer; that is not the situation here.)
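A minimal sketch of that equivalence, using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example is self-contained; the table contents are invented). It checks only that the OR form and the IN form return the same rows. It does not show how MySQL's optimizer rewrites the query internally.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE articles (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT)"
)
# Invented dummy data: 5000 articles spread over 1000 authors.
conn.executemany(
    "INSERT INTO articles (id, author_id, title) VALUES (?, ?, ?)",
    [(i, i % 1000, f"title {i}") for i in range(1, 5001)],
)

in_rows = conn.execute(
    "SELECT id FROM articles WHERE id IN (100, 1100, 2100) ORDER BY id"
).fetchall()
or_rows = conn.execute(
    "SELECT id FROM articles WHERE id = 100 OR id = 1100 OR id = 2100 ORDER BY id"
).fetchall()

# Both forms select exactly the same rows.
print(in_rows == or_rows)
```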

Timing tips:

  • Use SELECT SQL_NO_CACHE ... (to avoid the QC)
  • Run the query twice and use the second timing. (This avoids other caching issues.)
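The "run it twice" tip can be sketched as a small timing helper. This is an illustration under stated assumptions, not a benchmark tool: time_second_run is a hypothetical helper name, and the demo uses Python's built-in sqlite3 with an in-memory table, so it will not reproduce the cold-disk effect described above. Against MySQL you would pass a connection from a MySQL driver instead, use that driver's placeholder style, and on older versions add SQL_NO_CACHE to the SELECT.

```python
import sqlite3
import time


def time_second_run(conn, sql, params=()):
    """Run the same query twice; return (first_seconds, second_seconds).

    The second timing is the one to compare, since the first run may
    include the cost of warming caches.
    """
    timings = []
    for _ in range(2):
        cur = conn.cursor()
        start = time.perf_counter()
        cur.execute(sql, params)
        cur.fetchall()  # fetch everything so the full query cost is measured
        timings.append(time.perf_counter() - start)
        cur.close()
    return timings[0], timings[1]


# Demo data (invented): an in-memory SQLite table standing in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE articles (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT)"
)
conn.executemany(
    "INSERT INTO articles (id, author_id, title) VALUES (?, ?, ?)",
    [(i, i % 1000, f"title {i}") for i in range(1, 5001)],
)

# sqlite3 uses "?" placeholders; MySQL drivers typically use "%s".
first, second = time_second_run(
    conn, "SELECT * FROM articles WHERE author_id = ?", (100,)
)
print(f"first run: {first * 1000:.3f} ms, second run: {second * 1000:.3f} ms")
```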

Now to analyze what happens if you do not have any index on author_id.

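  • Both queries will scan the entire table (unless id is the PRIMARY KEY or otherwise indexed, in which case the IN query does point lookups instead).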
  • Apparently it is a relatively small table since you could read it all in 32.9ms.
  • The difference between 32.9 and 1.7 is just caching. (I have a Rule of Thumb that says the ratio between uncached and cached timings is about 10. Here, 32.9/1.7 is about 19, which is close enough to my RoT to support it.)

If you had INDEX(author_id), both queries would probably run faster, cached or not.
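The effect of such an index on the author_id query can be sketched as follows. This uses Python's built-in sqlite3 and SQLite's EXPLAIN QUERY PLAN as a stand-in for MySQL's EXPLAIN (an assumption made for runnability; plan wording differs between the two systems). The index name idx_author_id and the dummy data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE articles (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT)"
)
# Invented dummy data: 5000 articles spread over 1000 authors.
conn.executemany(
    "INSERT INTO articles (id, author_id, title) VALUES (?, ?, ?)",
    [(i, i % 1000, f"title {i}") for i in range(1, 5001)],
)


def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row describes one plan step.
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


query = "SELECT * FROM articles WHERE author_id = 100"

# Without an index on author_id, every row has to be examined.
plan_without_index = plan(query)

# CREATE INDEX with this syntax is also accepted by MySQL.
conn.execute("CREATE INDEX idx_author_id ON articles (author_id)")

# With the index, only the matching rows are looked up.
plan_with_index = plan(query)

print(plan_without_index)
print(plan_with_index)
```

In this sketch the first plan reports a table scan and the second reports a search using idx_author_id. In MySQL, running EXPLAIN on the same query before and after adding the index is the analogous check.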

This may be because there can be thousands of author_ids, and for:

 SELECT * FROM articles WHERE author_id = 100;

every row needs to be examined, because the condition has to be checked against the author_id of every row in the table.

And for:

 SELECT * FROM articles WHERE id IN (100, 1100, 2100, 3100, 4100, 5100, 6100, 7100, 8100, 9100, 10100, 11100, 12100, 13100, 14100, 15100, 16100, 17100, 18100, 19100, 20100, 21100, 22100, 23100, 24100, 25100, 26100, 27100, 28100, 29100, 30100, 31100, 32100, 33100, 34100);

only a limited number of records are involved, and looking each one up directly by id is much faster than traversing the whole table.
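The contrast between the two queries can be sketched like this, again using Python's built-in sqlite3 as a stand-in for MySQL. It assumes that id is the table's primary key and that author_id has no index. Neither is stated in the question, and the dummy data is invented so that author 100 owns ids 100, 1100, 2100 and so on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: id is the primary key and author_id has no index.
conn.execute(
    "CREATE TABLE articles (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT)"
)
# Invented data: article i belongs to author i % 1000.
conn.executemany(
    "INSERT INTO articles (id, author_id, title) VALUES (?, ?, ?)",
    [(i, i % 1000, f"title {i}") for i in range(1, 5001)],
)


def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row describes one plan step.
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


by_author_sql = "SELECT * FROM articles WHERE author_id = 100"
by_ids_sql = "SELECT * FROM articles WHERE id IN (100, 1100, 2100, 3100, 4100)"

author_plan = plan(by_author_sql)  # full scan: no index on author_id
ids_plan = plan(by_ids_sql)        # direct lookups through the primary key

# Both queries return the same five articles; only the access path differs.
by_author = sorted(conn.execute(by_author_sql).fetchall())
by_ids = sorted(conn.execute(by_ids_sql).fetchall())

print(author_plan)
print(ids_plan)
print(by_author == by_ids)
```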

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 