MySQL search query with multiple joins and subqueries running slow

I have the following query, which actually lives inside a stored procedure; I extracted it because there is too much else going on in the procedure. This is the end result, which takes more than a minute to run. I know the reason why, as you will also see from the EXPLAIN output, but I just cannot get it sorted.

To quickly explain what the query does: it fetches all products from companies that are "connected" to the company where li.nToObjectID = 37. The result also returns some information about those other companies, such as name and company id.

SELECT DISTINCT
    SQL_CALC_FOUND_ROWS
    p.id,
    p.sTitle,
    p.sTeaser,
    p.TimeStamp,
    p.ExpiryDate,
    p.InStoreDate,
    p.sCreator,
    p.sProductCode,
    p.nRetailPrice,
    p.nCostPrice,
    p.bPublic,
    c.id as nCompanyID,
    c.sName as sCompany,
    m.id as nMID,
    m.sFileName as sHighResFileName,
    m.nSize,
    (
        Select sName
        FROM tblBrand
        WHERE id = p.nBrandID
    ) as sBrand,
    (
        Select t.sFileName
        FROM tblThumbnail t
        where t.nMediaID = m.id AND
            t.sType = "thumbnail"
    ) as sFileName,
    (
        Select t.nWidth
        FROM tblThumbnail t
        where t.nMediaID = m.id AND
            t.sType = "thumbnail"
    ) as nWidth,
    (
        Select t.nHeight
        FROM tblThumbnail t
        where t.nMediaID = m.id AND
          t.sType = "thumbnail"
    ) as nHeight,
    IF (
      (
          SELECT COUNT(id) FROM tblLink
          WHERE
              sType = "company"
              AND sStatus = "active"
              AND nToObjectID = 37
              AND nFromObjectID = u.nCompanyID
      ),
      1,
      0
    ) AS bLinked
FROM tblProduct p
INNER JOIN tblMedia m
    ON (
        m.nTypeID = p.id AND
        m.sType = "product"
    )
INNER JOIN tblUser u
    ON u.id = p.nUserID
INNER JOIN tblCompany c
    ON u.nCompanyID = c.id
LEFT JOIN tblLink li
    ON (
        li.sType = "company"
        AND li.sStatus = "active"
        AND li.nToObjectID = 37
        AND li.nFromObjectID = u.nCompanyID
    )
WHERE c.bActive = 1 
    AND p.bArchive = 0 
    AND p.bActive = 1 
AND NOW() <= p.ExpiryDate 
AND (
    li.id IS NOT NULL 
    OR (
        li.id IS NULL 
        AND p.bPublic = 1
    )
) 
ORDER BY p.TimeStamp DESC 
LIMIT 0, 52

The EXPLAIN output is in the image linked below. Sorry, I just couldn't get the formatting right as text.

http://i60.tinypic.com/2hdqjgj.png

And lastly the number of rows for all the tables in this query:

tblProduct Count: 5392

tblBrand Count: 194

tblCompany Count: 368

tblUser Count: 416

tblMedia Count: 5724

tblLink Count: 24800

tblThumbnail Count: 22207

So I have two questions:

1. Is there another way of writing this query which might speed it up?
2. What index combination do I need on tblProduct so that not all of its rows are scanned?

UPDATE 1

This is the new query after removing the subqueries and making use of left joins instead:

SELECT DISTINCT
    SQL_CALC_FOUND_ROWS
    p.id,
    p.sTitle,
    p.sTeaser,
    p.TimeStamp,
    p.ExpiryDate,
    p.InStoreDate,
    p.sCreator,
    p.sProductCode,
    p.nRetailPrice,
    p.nCostPrice,
    p.bPublic,
    c.id as nCompanyID,
    c.sName as sCompany,
    m.id as nMID,
    m.sFileName as sHighResFileName,
    m.nSize,
    brand.sName as sBrand,
    thumb.sFilename,
    thumb.nWidth,
    thumb.nHeight,
    IF (
      (
          SELECT COUNT(id) FROM tblLink
          WHERE
              sType = "company"
              AND sStatus = "active"
              AND nToObjectID = 37
              AND nFromObjectID = u.nCompanyID
      ),
      1,
      0
    ) AS bLinked
FROM tblProduct p
INNER JOIN tblMedia m
    ON (
        m.nTypeID = p.id AND
        m.sType = "product"
    )
INNER JOIN tblUser u
    ON u.id = p.nUserID
INNER JOIN tblCompany c
    ON u.nCompanyID = c.id
LEFT JOIN tblLink li
    ON (
        li.sType = "company"
        AND li.sStatus = "active"
        AND li.nToObjectID = 37
        AND li.nFromObjectID = u.nCompanyID
    )
LEFT JOIN tblBrand AS brand
    ON brand.id = p.nBrandID
LEFT JOIN tblThumbnail AS thumb 
    ON (
        thumb.nMediaID = m.id 
        AND thumb.sType = 'thumbnail'
    )
WHERE c.bActive = 1 
    AND p.bArchive = 0 
    AND p.bActive = 1 
AND NOW() <= p.ExpiryDate 
AND (
    li.id IS NOT NULL 
    OR (
        li.id IS NULL 
        AND p.bPublic = 1
    )
) 
ORDER BY p.TimeStamp DESC 
LIMIT 0, 52;

UPDATE 2

ALTER TABLE tblThumbnail ADD INDEX (nMediaID,sType) USING BTREE;
ALTER TABLE tblMedia ADD INDEX (nTypeID,sType) USING BTREE;
ALTER TABLE tblProduct ADD INDEX (bArchive,bActive,ExpiryDate,bPublic,TimeStamp) USING BTREE;

After doing the above changes the explain showed that it is now only searching through 1464 rows on tblProduct instead of 5392.

That's a big query with a lot going on. It's going to take a few steps of work to optimize it. I will take the liberty of just presenting a couple of steps.

First step. Can you get rid of SQL_CALC_FOUND_ROWS and still have your program work correctly? If so, do that. When you specify SQL_CALC_FOUND_ROWS, the server has to work out how many rows the result would contain without the LIMIT clause, so it cannot stop once it has found the 52 rows you asked for.
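If the total count is still needed for paging, one possible approach is to run the page query without SQL_CALC_FOUND_ROWS and fetch the count with a separate statement. The sketch below is only an illustration under some assumptions: that p.id and m.id are non-NULL primary keys, and that one output row corresponds to one (product, media) pair, which is what the DISTINCT in the original query appears to produce. The alias nTotal is made up for this example. Note also that SQL_CALC_FOUND_ROWS and FOUND_ROWS() are deprecated as of MySQL 8.0.17.

```sql
-- Separate count query, run in addition to the LIMITed page query.
-- Assumption: each result row is one distinct (product, media) pair.
SELECT COUNT(DISTINCT p.id, m.id) AS nTotal   -- nTotal is a hypothetical alias
FROM tblProduct p
INNER JOIN tblMedia m
    ON m.nTypeID = p.id AND m.sType = 'product'
INNER JOIN tblUser u
    ON u.id = p.nUserID
INNER JOIN tblCompany c
    ON c.id = u.nCompanyID
LEFT JOIN tblLink li
    ON li.sType = 'company'
   AND li.sStatus = 'active'
   AND li.nToObjectID = 37
   AND li.nFromObjectID = u.nCompanyID
WHERE c.bActive = 1
  AND p.bArchive = 0
  AND p.bActive = 1
  AND NOW() <= p.ExpiryDate
  -- Logically the same as: li.id IS NOT NULL OR (li.id IS NULL AND p.bPublic = 1)
  AND (li.id IS NOT NULL OR p.bPublic = 1);
```

Whether this is faster overall depends on your data and indexes, so it is worth timing both variants.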

Second step. Refactor the dependent subqueries to be JOINs instead.

Here's how you might approach that. Part of your query looks like this...

SELECT DISTINCT SQL_CALC_FOUND_ROWS
    p.id,
    ...
    c.id as nCompanyID,
    ...
    m.id as nMID,
    ...
    (   /* dependent subquery to be removed */
        Select sName
        FROM tblBrand
        WHERE id = p.nBrandID
    ) as sBrand,
    (   /* dependent subquery to be removed */
        Select t.sFileName
        FROM tblThumbnail t
        where t.nMediaID = m.id AND
            t.sType = "thumbnail"
    ) as sFileName,
    (   /* dependent subquery to be removed */
        Select t.nWidth
        FROM tblThumbnail t
        where t.nMediaID = m.id AND
            t.sType = "thumbnail"
    ) as nWidth,
    (   /* dependent subquery to be removed */
        Select t.nHeight
        FROM tblThumbnail t
        where t.nMediaID = m.id AND
          t.sType = "thumbnail"
    ) as nHeight,
    ...

Try this instead. Notice how the brand and thumbnail dependent subqueries disappear. You had three dependent subqueries for the thumbnail; they can disappear into a single JOIN.

SELECT DISTINCT SQL_CALC_FOUND_ROWS
      p.id,
      ...
      brand.sName AS sBrand,
      thumb.sFileName,
      thumb.nWidth,
      thumb.nHeight,
      ...
 FROM tblProduct p
INNER JOIN tblMedia AS m         ON (m.nTypeID = p.id AND m.sType = 'product')
     ... (other table joins) ...
 LEFT JOIN tblBrand AS brand     ON brand.id = p.nBrandID
 LEFT JOIN tblThumbnail AS thumb ON (thumb.nMediaID = m.id AND thumb.sType = 'thumbnail')

I used LEFT JOIN rather than INNER JOIN so MySQL will present NULL values if the joined rows are missing.
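The same idea may also apply to the remaining dependent subquery, the one that computes bLinked. Your query already LEFT JOINs tblLink as li with exactly the conditions that subquery uses, so bLinked can probably be read straight from that join. The sketch below is trimmed to a few columns and leaves out the tblMedia, tblBrand and tblThumbnail joins for brevity; it assumes li.id is a non-NULL primary key, so that a NULL li.id can only mean "no matching link row".

```sql
SELECT DISTINCT
    p.id,
    p.sTitle,
    p.TimeStamp,
    c.id AS nCompanyID,
    -- 1 when an active link to company 37 was joined, otherwise 0.
    -- Stands in for IF((SELECT COUNT(id) FROM tblLink ...), 1, 0).
    (li.id IS NOT NULL) AS bLinked
FROM tblProduct p
INNER JOIN tblUser u
    ON u.id = p.nUserID
INNER JOIN tblCompany c
    ON c.id = u.nCompanyID
LEFT JOIN tblLink li
    ON li.sType = 'company'
   AND li.sStatus = 'active'
   AND li.nToObjectID = 37
   AND li.nFromObjectID = u.nCompanyID
WHERE c.bActive = 1
  AND p.bArchive = 0
  AND p.bActive = 1
  AND NOW() <= p.ExpiryDate
  -- Logically the same as: li.id IS NOT NULL OR (li.id IS NULL AND p.bPublic = 1)
  AND (li.id IS NOT NULL OR p.bPublic = 1)
ORDER BY p.TimeStamp DESC
LIMIT 0, 52;
```

The WHERE condition is only a simplification of yours: when li.id IS NOT NULL is false, li.id IS NULL is necessarily true, so the inner test adds nothing.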

Edit

You're using a join pattern that looks like this:

 JOIN sometable AS s ON (s.someID = m.id AND s.sType = 'string')

You seem to do this for a few tables. You probably can speed up the JOIN operations by creating compound indexes in those tables. For example, try adding the following index to tblThumbnail: (sType, nMediaID). You can do that with this DDL statement.

ALTER TABLE tblThumbnail ADD INDEX (sType, nMediaID) USING BTREE;

You can do similar things to other tables with the same join pattern.
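As a rough illustration of what "similar things" could look like, the statements below add compound indexes matching the join conditions on tblMedia and tblLink in your query, then use EXPLAIN to check whether the thumbnail join picks up its index. The index names (idx_media_type_typeid, idx_link_lookup) are invented for this example, and the column order is an assumption; it is worth comparing against your real EXPLAIN output before keeping any of them.

```sql
-- tblMedia is joined on sType = 'product' AND nTypeID = p.id
ALTER TABLE tblMedia
    ADD INDEX idx_media_type_typeid (sType, nTypeID) USING BTREE;

-- tblLink is joined on four equality conditions
ALTER TABLE tblLink
    ADD INDEX idx_link_lookup (sType, sStatus, nToObjectID, nFromObjectID) USING BTREE;

-- Check which index, if any, the optimizer chooses for the thumbnail join
EXPLAIN
SELECT t.sFileName, t.nWidth, t.nHeight
FROM tblMedia m
LEFT JOIN tblThumbnail t
    ON t.nMediaID = m.id AND t.sType = 'thumbnail'
WHERE m.sType = 'product';
```

In the EXPLAIN output, the key column shows the index actually used for each table and the rows column gives the optimizer's estimate of rows examined.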
