
MySQL LEFT JOIN or WHERE IN SUBQUERY

I need some advice. I'm building an app that has to run queries on rather large tables, possibly at a very high rate, so I'm looking for the best approach performance-wise.

I have the following 2 tables:

Albums:

+---------------+--------------+------+-----+---------+----------------+
|     Field     |     Type     | Null | Key | Default |     Extra      |
+---------------+--------------+------+-----+---------+----------------+
| id            | int(11)      | NO   | PRI | NULL    | auto_increment |
| eventid       | int(11)      | NO   | MUL | NULL    |                |
| album         | varchar(200) | NO   |     | NULL    |                |
| filename      | varchar(200) | NO   |     | NULL    |                |
| obstacle_time | time         | NO   |     | NULL    |                |
+---------------+--------------+------+-----+---------+----------------+

and keywords:

+-------------+--------------+------+-----+---------+----------------+
|    Field    |     Type     | Null | Key | Default |     Extra      |
+-------------+--------------+------+-----+---------+----------------+
| id          | int(11)      | NO   | PRI | NULL    | auto_increment |
| eventid     | int(11)      | NO   | MUL | NULL    |                |
| filename    | varchar(200) | NO   |     | NULL    |                |
| bibnumbers  | varchar(200) | NO   |     | NULL    |                |
| gender      | varchar(20)  | YES  |     | NULL    |                |
| top_style   | varchar(20)  | YES  |     | NULL    |                |
| pants_style | varchar(20)  | YES  |     | NULL    |                |
| other       | varchar(20)  | YES  |     | NULL    |                |
| cap         | varchar(200) | NO   |     | NULL    |                |
| tshirt      | varchar(200) | NO   |     | NULL    |                |
| pants       | varchar(200) | NO   |     | NULL    |                |
+-------------+--------------+------+-----+---------+----------------+

Both tables have a unique_index declared, which is a unique constraint on the eventid + filename columns.
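For reference, a constraint like this would typically be declared as in the sketch below. Only the index name `unique_index` (it shows up in the EXPLAIN output further down) and the column pair come from the question; the exact statements used on the real tables are an assumption.

```sql
-- Assumed DDL: only the index name and the (eventid, filename) pair are from the question.
ALTER TABLE `albums`
    ADD UNIQUE KEY `unique_index` (`eventid`, `filename`);

ALTER TABLE `keywords`
    ADD UNIQUE KEY `unique_index` (`eventid`, `filename`);
```

With this in place, each albums row can match at most one keywords row for the same event and filename.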

Both tables contain information about images, but the albums table is available instantly (as soon as I have the images), while the keywords table usually becomes available several days later, once manual tagging of the images is complete.

Once tagging is enabled, people will be searching for all kinds of things. Since the results can be HUGE (10,000 rows or more), I only show them in small chunks so the browser doesn't choke trying to load a huge number of images. Because of this, my server will be hit with lots of queries: every time the visitor scrolls to the bottom of the page, an ajax request returns the next chunk of images.
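A side note on the chunked loading: with `LIMIT X, 10`, MySQL still has to read and throw away the first X rows, so later chunks tend to get slower. A possible alternative is for the ajax request to send back the last `obstacle_time` and `id` it received and seek from there. A minimal sketch against the albums table above; `@last_time` and `@last_id` are hypothetical values standing in for whatever the client sends:

```sql
-- @last_time / @last_id: hypothetical values taken from the last row of the previous chunk.
SET @last_time = '09:15:00';
SET @last_id   = 1234;

SELECT `filename`, `id`, `obstacle_time`
FROM `albums`
WHERE `eventid` = 1
  AND (`obstacle_time` > @last_time
       OR (`obstacle_time` = @last_time AND `id` > @last_id))
ORDER BY `obstacle_time`, `id`   -- id breaks ties so no row is skipped or repeated
LIMIT 10;
```

This is only likely to pay off if the filter and the sort can be served by an index, for example one starting with (eventid, obstacle_time).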

Now my question is: which of the following queries is better performance-wise?

SELECT `albums`.`filename`,`basket`.`id`,`albums`.`id`,`obstacle_time`
FROM `albums`
LEFT JOIN `basket`
    ON `basket`.`eventid` = `albums`.`eventid`
        AND `basket`.`fileid` = `albums`.`id`
        AND `basket`.`visitor_id` = 1
LEFT JOIN `keywords`
    ON `keywords`.`eventid` = `albums`.`eventid`
        AND `albums`.`filename` = `keywords`.`filename`
WHERE
    `albums`.`eventid` = 1
    AND `album` LIKE '%string%'
    AND `obstacle_time` >= '08:00:00'
    AND `obstacle_time` <= '14:11:10'
    AND `gender` = 1
    AND `top_style` REGEXP '[[:<:]]0[[:>:]]|[[:<:]]1[[:>:]]'
    AND `cap` = '2'
    AND `tshirt` = '1'
    AND `pants` = '3'
ORDER BY `obstacle_time`
LIMIT X, 10

Or using an IN clause inside the WHERE, like:

SELECT `albums`.`filename`,`basket`.`id`,`albums`.`id`,`obstacle_time` 
FROM `albums` 
LEFT JOIN `basket` 
    ON `basket`.`eventid` = `albums`.`eventid` 
        AND `basket`.`fileid` = `albums`.`id` 
        AND `basket`.`visitor_id` = 1 
WHERE 
    `albums`.`eventid` = 1
    AND `album` LIKE '%string%' 
    AND `obstacle_time` >= '08:00:00' 
    AND `obstacle_time` <= '14:11:10' 
    AND `filename` IN (
        SELECT `filename` 
        FROM `keywords`
        WHERE
            `eventid` = 1 
            AND `gender` = 1 
            AND `top_style` REGEXP '[[:<:]]0[[:>:]]|[[:<:]]1[[:>:]]' 
            AND `cap` = '2' 
            AND `tshirt` = '1' 
            AND `pants` = '3'
    )
ORDER BY `obstacle_time`
LIMIT X, 10

I have looked at similar questions but wasn't able to figure out which is the best course of action.

My understanding so far is that:

  • Using LEFT JOIN takes advantage of indexing, BUT if I use it I get a full join of the tables even though I only need a significantly smaller result set, so it seems a waste to join thousands of rows just to filter most of them out.

  • Using IN with a subquery isn't indexed? I'm not 100% sure about this. I'm using MySQL 5.6, and to the best of my understanding, since 5.6 even subqueries are automatically indexed by MySQL. I think this method helps when the result is significantly filtered; I'm not sure there is any benefit if the subquery returns all the possible filenames.
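One thing worth noting about the first query: its WHERE clause filters on keywords columns (`gender`, `cap`, and so on). Any albums row without a matching keywords row has NULL in those columns and is thrown out, so the LEFT JOIN on keywords behaves like an INNER JOIN here. And because of the unique constraint on eventid + filename, the join cannot duplicate albums rows, so it should return the same rows as the IN version. A shortened sketch of that equivalent form (basket join and most filters left out for brevity):

```sql
-- Equivalent to the LEFT JOIN on keywords for this particular WHERE clause:
-- the keywords filters reject NULL, so unmatched albums rows never survive.
SELECT `albums`.`filename`, `albums`.`id`, `albums`.`obstacle_time`
FROM `albums`
INNER JOIN `keywords`
    ON `keywords`.`eventid` = `albums`.`eventid`
   AND `keywords`.`filename` = `albums`.`filename`
WHERE `albums`.`eventid` = 1
  AND `keywords`.`gender` = 1
  AND `keywords`.`cap` = '2'
ORDER BY `albums`.`obstacle_time`
LIMIT 0, 10;
```

Running EXPLAIN on each variant is the practical way to see whether the optimizer ends up with different plans.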

As footnote questions:

  • Should I consider returning the whole result to the client on the first query and use client side (HTML) techniques to load the images gradually rather than re-querying the server each time?

  • Should I consider merging the 2 tables into 1, and how much of a performance impact would that have? (It can be tricky for various reasons that have no place in this question.)

Thanks.

EDIT 1

Explain for JOIN query:

+----+-------------+---------------+--------+---------------+--------------+---------+----------------------------------------+------+----------------------------------------------------+
| id | select_type |     table     |  type  | possible_keys |     key      | key_len |                  ref                   | rows |                       Extra                        |
+----+-------------+---------------+--------+---------------+--------------+---------+----------------------------------------+------+----------------------------------------------------+
|  1 | SIMPLE      | albums_2015   | ref    | unique_index  | unique_index | 4       | const                                  | 6475 | Using where; Using temporary; Using filesort       |
|  1 | SIMPLE      | basket        | ALL    | NULL          | NULL         | NULL    | NULL                                   |    2 | Using where; Using join buffer (Block Nested Loop) |
|  1 | SIMPLE      | keywords_2015 | eq_ref | unique_index  | unique_index | 206     | const,mybibnumber.albums_2015.filename |    1 | Using index                                        |
+----+-------------+---------------+--------+---------------+--------------+---------+----------------------------------------+------+----------------------------------------------------+

Using WHERE IN:

+----+-------------+---------------+--------+---------------+--------------+---------+----------------------------------------+------+----------------------------------------------------+
| id | select_type |     table     |  type  | possible_keys |     key      | key_len |                  ref                   | rows |                       Extra                        |
+----+-------------+---------------+--------+---------------+--------------+---------+----------------------------------------+------+----------------------------------------------------+
|  1 | SIMPLE      | albums_2015   | ref    | unique_index  | unique_index | 4       | const                                  | 6475 | Using where; Using temporary; Using filesort       |
|  1 | SIMPLE      | keywords_2015 | eq_ref | unique_index  | unique_index | 206     | const,mybibnumber.albums_2015.filename |    1 | Using where                                        |
|  1 | SIMPLE      | basket        | ALL    | NULL          | NULL         | NULL    | NULL                                   |    2 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+---------------+--------+---------------+--------------+---------+----------------------------------------+------+----------------------------------------------------+

EDIT 2

I wasn't able to set up an SQL Fiddle (I keep getting a "something went wrong" error), so I have created a test database on one of my servers.

Address: http://188.165.217.185/phpmyadmin/ , user: temp_test , pass: test_temp

I'm still building the whole thing and don't have all the values filled in yet (top_style, pants_style, etc.), so a more appropriate query for the test scenario is:

WHERE IN:

SELECT `albums_2015`.`filename`, 
       `albums_2015`.`id`, 
       `obstacle_time` 
FROM   `albums_2015` 
WHERE  `albums_2015`.`eventid` = 1 
       AND `album` LIKE '%' 
       AND `obstacle_time` >= '08:00:00' 
       AND `obstacle_time` <= '14:11:10' 
       AND `filename` IN (SELECT `filename`
                          FROM   `keywords_2015`
                          WHERE  `eventid` = 1
                                 AND `bibnumbers` REGEXP '[[:<:]]113[[:>:]]|[[:<:]]106[[:>:]]')
ORDER  BY `obstacle_time` 
LIMIT  0, 10 

LEFT JOIN:

SELECT `albums_2015`.`filename`,`albums_2015`.`id`,`obstacle_time`
    FROM `albums_2015`
        LEFT JOIN `keywords_2015`
        ON `keywords_2015`.`eventid` = `albums_2015`.`eventid`
            AND `albums_2015`.`filename` = `keywords_2015`.`filename`
    WHERE
        `albums_2015`.`eventid` = 1
        AND `album` LIKE '%'
        AND `obstacle_time` >= '08:00:00'
        AND `obstacle_time` <= '14:11:10'
        AND `bibnumbers` REGEXP '[[:<:]]113[[:>:]]|[[:<:]]106[[:>:]]'
    ORDER BY `obstacle_time`
    LIMIT 0, 10

More of a bunch of tips:

  • Joins that use an index are the best option when you have to deal with a multi-table query.

Don't hesitate to add indexes to speed up your query (an index takes space, but on an INT field that is next to nothing, and you gain far more than you lose).
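As a sketch of what that could mean for the queries in the question: the EXPLAIN output shows `basket` being read with type ALL and no usable key, and `albums_2015` needing a temporary table and filesort. The index names below are made up, and whether the optimizer actually uses these indexes would need to be checked with EXPLAIN.

```sql
-- Hypothetical index names; the columns are the ones used in the basket join.
ALTER TABLE `basket`
    ADD INDEX `idx_visitor_event_file` (`visitor_id`, `eventid`, `fileid`);

-- Lets the eventid filter and the obstacle_time range/sort share one index.
ALTER TABLE `albums_2015`
    ADD INDEX `idx_event_time` (`eventid`, `obstacle_time`);

-- Re-run EXPLAIN afterwards to see whether the plans change.
```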


  • With big tables, caching the data in the distant table is usually a good idea.

An insert trigger on TAG_table that caches the displayed part in the distant table (like the tag name for the album overview) can help keep your join queries at a decent frequency.
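A minimal sketch of such a trigger, assuming the keywords table from the question plays the role of TAG_table and albums is the table being read most often. The `bibnumbers_cache` column and the trigger name are hypothetical.

```sql
-- Hypothetical cache column on the albums side.
ALTER TABLE `albums` ADD COLUMN `bibnumbers_cache` VARCHAR(200) NULL;

-- DELIMITER is a mysql client command; it lets the trigger body contain ';'.
DELIMITER //
CREATE TRIGGER `keywords_after_insert`
AFTER INSERT ON `keywords`
FOR EACH ROW
BEGIN
    -- Copy the freshly tagged value onto the matching albums row.
    UPDATE `albums`
    SET `bibnumbers_cache` = NEW.`bibnumbers`
    WHERE `albums`.`eventid` = NEW.`eventid`
      AND `albums`.`filename` = NEW.`filename`;
END//
DELIMITER ;
```

If tags can be edited after the first insert, a matching AFTER UPDATE trigger would be needed to keep the cached value in sync.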


  • Be careful with REGEXP, it hurts performance badly. Adding a new table to split the data is a better idea (and lets you use indexing, which is the native optimisation).

  • Every field in the WHERE clause of a big, frequent query should have an index on it. If you can't put one there, then your DB model is f**cked-up and needs to be changed.
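For the REGEXP tip, splitting the data could look like the sketch below: one row per bib number instead of several numbers packed into `bibnumbers`. The `keyword_bibs` table and its columns are hypothetical, and filling it from the existing `bibnumbers` values (however they are delimited) is left out.

```sql
-- Hypothetical child table: one row per (tagged image, bib number).
CREATE TABLE `keyword_bibs` (
    `keyword_id` INT NOT NULL,          -- refers to keywords_2015.id
    `bibnumber`  INT NOT NULL,
    PRIMARY KEY (`keyword_id`, `bibnumber`),
    KEY `idx_bibnumber` (`bibnumber`)
);

-- Same search as the REGEXP version in EDIT 2, using plain equality lookups.
SELECT a.`filename`, a.`id`, a.`obstacle_time`
FROM `albums_2015` AS a
WHERE a.`eventid` = 1
  AND a.`obstacle_time` BETWEEN '08:00:00' AND '14:11:10'
  AND EXISTS (
      SELECT 1
      FROM `keywords_2015` AS k
      JOIN `keyword_bibs` AS b ON b.`keyword_id` = k.`id`
      WHERE k.`eventid` = a.`eventid`
        AND k.`filename` = a.`filename`
        AND b.`bibnumber` IN (113, 106)
  )
ORDER BY a.`obstacle_time`
LIMIT 0, 10;
```

The EXISTS form keeps an image from appearing twice when it carries both bib numbers.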
