MySQL Slow query with multiple joins and subqueries

I have 3 tables:

  • Pi - images
  • Pidl - image download log => Pi
  • Pirl - image resize log => Pidl

Basically, an image is downloaded and a log record is created in Pidl. After that, it is resized and a record is created in Pirl, which is linked to the corresponding Pidl record.

I am writing a query to find which images need to be resized; it mainly queries Pidl. The algorithm I've devised is simple:

for each Image in Pi {
    pidlA=newest_pidl(Image);
    if(pidlA.status == success) {
        pirlA=newest_pirl(Image);
        if(pirlA.pidl.hash != pidlA.hash)
        {
            go;
        }
        else if(pirlA.status != success){
            failed_attempts = failed_pirl_count(pirlA,newest_successful_pirl(Image))
            decide based on pirlA.time and failed_attempts if go or not
        }
        else
        {
            dont go;
        }
    }
    else
    {
        dont go;
    }
}

And now my query (it is not finished yet: the failed-attempts part is missing, but it is already too slow, so I'd like to fix that first).

SELECT 
pidl1A.pidl_id

FROM Pidl as pidl1A

LEFT JOIN Pidl as pidl2A
ON (
    pidl1A.pidl_pi_id = pidl2A.pidl_pi_id AND 
    pidl2A.pidl_status = 1 AND
    (pidl2A.pidl_time > pidl1A.pidl_time OR 
        (pidl2A.pidl_id > pidl1A.pidl_id and pidl1A.pidl_time=pidl2A.pidl_time)
    )
) 

LEFT JOIN (
    #newest pirl subquery#
    SELECT 
    pidl1B.pidl_pi_id as sub_pi_id, 
    pidl1B.pidl_hash as sub_pidl_hash,
    pirl1B.pirl_id as sub_pirl_id,
    pirl1B.pirl_status as sub_pirl_status
    FROM Pirl as pirl1B 

    INNER JOIN Pidl as pidl1B ON (pirl1B.pirl_pidl_id = pidl1B.pidl_id)

    LEFT JOIN (
        SELECT
        pidl2B.pidl_pi_id as sub_pi_id,
        pirl2B.pirl_id as sub_pirl_id,
        pirl2B.pirl_time as sub_pirl_time
        FROM Pirl as pirl2B 
        INNER JOIN Pidl as pidl2B ON (pirl2B.pirl_pidl_id = pidl2B.pidl_id)
        WHERE 1
    ) as pirl3B 
    ON (
        pirl3B.sub_pi_id = pidl1B.pidl_pi_id and 
        (pirl3B.sub_pirl_time > pirl1B.pirl_time or
            (pirl3B.sub_pirl_time = pirl1B.pirl_time and
            pirl3B.sub_pirl_id > pirl1B.pirl_id)
        )
    )

    WHERE 
    pirl3B.sub_pirl_id is null
) as pirl1A
ON (pirl1A.sub_pi_id = pidl1A.pidl_pi_id)

WHERE 
pidl1A.pidl_status = 1 AND pidl2A.pidl_id IS NULL
AND (
    pirl1A.sub_pirl_id IS NULL
    OR (
        pidl1A.pidl_hash !=  pirl1A.sub_pidl_hash
    )
    OR (
        pirl1A.sub_pirl_status != 1
    )
)

And this is my db schema:

CREATE TABLE Pi (
  `pi_id` int,
   PRIMARY KEY (`pi_id`)
  )
;

CREATE TABLE Pidl
    (
      `pidl_id` int,
      `pidl_pi_id` int,
      `pidl_status` int,
      `pidl_time` int,
     `pidl_hash` varchar(16),
   PRIMARY KEY (`pidl_id`)
    )
;

alter table Pidl
  add constraint fk1_branchNo foreign key (pidl_pi_id) references Pi (pi_id);

CREATE TABLE Pirl
    (
      `pirl_id` int,
      `pirl_pidl_id` int,
      `pirl_status` int,
      `pirl_time` int,
   PRIMARY KEY (`pirl_id`)
    )
;

alter table Pirl
  add constraint fk2_branchNo foreign key (pirl_pidl_id) references Pidl (pidl_id);

INSERT INTO Pi
  (`pi_id`)
  VALUES
  (3),
  (4),
  (5);

INSERT INTO Pidl
    (`pidl_id`, `pidl_pi_id`,`pidl_status`,`pidl_time`, `pidl_hash`)
VALUES
    (1, 3, 1,100, 'hashA'),
    (2, 3, 1,150,'hashB'),
    (3, 4, 2, 200,'hashC'),
    (4, 3, 1, 200,'hashA')
;

INSERT INTO Pirl
    (`pirl_id`, `pirl_pidl_id`,`pirl_status`,`pirl_time`)
VALUES
    (1, 2, 0,100),
    (2, 3, 1,150),
    (3, 1, 2, 200)
;

Of course, with 3 records it's fast. But with around 10-30k rows it takes more than 5 seconds. What I've found is that the part that makes it slow is the last condition of the WHERE clause:

AND (
    pirl1A.sub_pirl_id IS NULL
    OR (
        pidl1A.pidl_hash !=  pirl1A.sub_pidl_hash
    )
    OR (
        pirl1A.sub_pirl_status != 1
    )
)

The other strange thing that I've found is that by using DISTINCT, the query got a bit faster but not fast enough.

When I read your requirements, I come up with a query like this:

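-- Newest download row per image (by pidl_time) whose hash has no resize row yet.
-- Column names are mapped to the schema in the question; status is not considered.
select pidl.*
from Pidl pidl
join (select pidl_pi_id, max(pidl_time) as pidl_time
      from Pidl
      group by pidl_pi_id
     ) maxpidl
     on pidl.pidl_pi_id = maxpidl.pidl_pi_id and pidl.pidl_time = maxpidl.pidl_time
-- Pirl has no hash column, so the hash is reached through the Pidl row it points to
left join (Pirl pirl join Pidl rd on rd.pidl_id = pirl.pirl_pidl_id)
     on rd.pidl_pi_id = pidl.pidl_pi_id and rd.pidl_hash = pidl.pidl_hash
where pirl.pirl_id is null;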

I think you have some other conditions that are not fully explained (such as the role of status). You should be able to incorporate that.

In MySQL, you should avoid subqueries in the FROM clause where you can. They are typically materialized into temporary tables, which adds overhead, and the indexes on the underlying tables cannot then be used on the materialized result.
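One way to apply this to the query in the question is to express the "newest row" checks with NOT EXISTS and a nested join instead of derived tables. The following is a minimal sketch, assuming the schema from the question and the same rules as the question's query (status 1 means success). The aliases d, r, rd, newer, r2 and d2 are illustrative only, and whether it is actually faster depends on the MySQL version and the available indexes, so it is worth checking with EXPLAIN.

```sql
SELECT d.pidl_id
FROM Pidl AS d
-- every resize row of the same image, paired with the download row it belongs to
LEFT JOIN (Pirl AS r
           INNER JOIN Pidl AS rd ON rd.pidl_id = r.pirl_pidl_id)
       ON rd.pidl_pi_id = d.pidl_pi_id
WHERE d.pidl_status = 1
  -- d must be the newest successful download of its image
  AND NOT EXISTS (
        SELECT 1
        FROM Pidl AS newer
        WHERE newer.pidl_pi_id = d.pidl_pi_id
          AND newer.pidl_status = 1
          AND (newer.pidl_time > d.pidl_time
               OR (newer.pidl_time = d.pidl_time AND newer.pidl_id > d.pidl_id))
      )
  AND (
        -- the image was never resized
        r.pirl_id IS NULL
        OR (
            -- r must be the newest resize row of the image
            NOT EXISTS (
                SELECT 1
                FROM Pirl AS r2
                INNER JOIN Pidl AS d2 ON d2.pidl_id = r2.pirl_pidl_id
                WHERE d2.pidl_pi_id = d.pidl_pi_id
                  AND (r2.pirl_time > r.pirl_time
                       OR (r2.pirl_time = r.pirl_time AND r2.pirl_id > r.pirl_id))
            )
            -- and it either resized a different hash or did not succeed
            AND (rd.pidl_hash <> d.pidl_hash OR r.pirl_status <> 1)
        )
      );
```

On the sample data from the question this should return the single row pidl_id = 4, the same as the original query. The failed-attempts rule from the pseudocode is not included here.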

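Your queries aren't using your indexes, because they join against derived tables (subqueries in the FROM clause) instead of the base tables. This can be very slow. I would suggest adding indexes that match your join conditions, or building new indexed tables that hold the information you need. MySQL has no built-in materialized views, so such a table would be a summary table that you maintain yourself.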
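As a rough sketch of the indexing suggestion, assuming the schema from the question: the index names below are hypothetical, and the column order simply follows the equality and range conditions used in the question's joins. Whether the optimizer picks these indexes depends on the MySQL version and the data, which EXPLAIN can show.

```sql
-- Hypothetical index names; columns follow the "same image, successful, newer" lookup on Pidl
CREATE INDEX idx_pidl_image_status_time
    ON Pidl (pidl_pi_id, pidl_status, pidl_time);

-- and the "resize rows of this download, by time" lookup on Pirl
CREATE INDEX idx_pirl_pidl_time
    ON Pirl (pirl_pidl_id, pirl_time);

-- Check the plan of a representative lookup: the "key" column of the output
-- shows which index, if any, was chosen
EXPLAIN
SELECT newer.pidl_id
FROM Pidl AS newer
WHERE newer.pidl_pi_id = 3
  AND newer.pidl_status = 1
  AND newer.pidl_time > 150;
```

With InnoDB, the foreign key constraints in the question already create single-column indexes on pidl_pi_id and pirl_pidl_id if none exist, so the composite indexes above mainly help the time comparisons.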
