I have 3 tables: Pi (the images), Pidl (one log record per download) and Pirl (one record per resize).
Basically, an image is downloaded and a log record is created in Pidl. After that, it is resized and a record is created in Pirl, linked to the corresponding Pidl record.
I am writing a query to find which images need to be resized; it basically queries Pidl. The algorithm I've devised is simple:
for each Image in Pi {
    pidlA = newest_pidl(Image);
    if (pidlA.status == success) {
        pirlA = newest_pirl(Image);
        // no resize yet, or the newest resize was made from a different download
        if (pirlA == null || pirlA.pidl.hash != pidlA.hash) {
            go;
        }
        else if (pirlA.status != success) {
            failed_attempts = failed_pirl_count(pirlA, newest_successful_pirl(Image));
            decide based on pirlA.time and failed_attempts whether to go or not;
        }
        else {
            dont go;
        }
    }
    else {
        dont go;
    }
}
And now my query (although it is not finished yet: the failed-attempts part is missing, but it's already too slow, so I'd like to fix that first):
SELECT
    pidl1A.pidl_id
FROM Pidl AS pidl1A
LEFT JOIN Pidl AS pidl2A
    ON (
        pidl1A.pidl_pi_id = pidl2A.pidl_pi_id AND
        pidl2A.pidl_status = 1 AND
        (pidl2A.pidl_time > pidl1A.pidl_time OR
            (pidl2A.pidl_id > pidl1A.pidl_id AND pidl1A.pidl_time = pidl2A.pidl_time)
        )
    )
LEFT JOIN (
    # newest pirl subquery
    SELECT
        pidl1B.pidl_pi_id AS sub_pi_id,
        pidl1B.pidl_hash AS sub_pidl_hash,
        pirl1B.pirl_id AS sub_pirl_id,
        pirl1B.pirl_status AS sub_pirl_status
    FROM Pirl AS pirl1B
    INNER JOIN Pidl AS pidl1B ON (pirl1B.pirl_pidl_id = pidl1B.pidl_id)
    LEFT JOIN (
        SELECT
            pidl2B.pidl_pi_id AS sub_pi_id,
            pirl2B.pirl_id AS sub_pirl_id,
            pirl2B.pirl_time AS sub_pirl_time
        FROM Pirl AS pirl2B
        INNER JOIN Pidl AS pidl2B ON (pirl2B.pirl_pidl_id = pidl2B.pidl_id)
        WHERE 1
    ) AS pirl3B
        ON (
            pirl3B.sub_pi_id = pidl1B.pidl_pi_id AND
            (pirl3B.sub_pirl_time > pirl1B.pirl_time OR
                (pirl3B.sub_pirl_time = pirl1B.pirl_time AND
                 pirl3B.sub_pirl_id > pirl1B.pirl_id)
            )
        )
    WHERE
        pirl3B.sub_pirl_id IS NULL
) AS pirl1A
    ON (pirl1A.sub_pi_id = pidl1A.pidl_pi_id)
WHERE
    pidl1A.pidl_status = 1 AND pidl2A.pidl_id IS NULL
    AND (
        pirl1A.sub_pirl_id IS NULL
        OR (
            pidl1A.pidl_hash != pirl1A.sub_pidl_hash
        )
        OR (
            pirl1A.sub_pirl_status != 1
        )
    )
And this is my db schema:
CREATE TABLE Pi (
`pi_id` int,
PRIMARY KEY (`pi_id`)
)
;
CREATE TABLE Pidl
(
`pidl_id` int,
`pidl_pi_id` int,
`pidl_status` int,
`pidl_time` int,
`pidl_hash` varchar(16),
PRIMARY KEY (`pidl_id`)
)
;
alter table Pidl
add constraint fk1_branchNo foreign key (pidl_pi_id) references Pi (pi_id);
CREATE TABLE Pirl
(
`pirl_id` int,
`pirl_pidl_id` int,
`pirl_status` int,
`pirl_time` int,
PRIMARY KEY (`pirl_id`)
)
;
alter table Pirl
add constraint fk2_branchNo foreign key (pirl_pidl_id) references Pidl (pidl_id);
INSERT INTO Pi
(`pi_id`)
VALUES
(3),
(4),
(5);
INSERT INTO Pidl
(`pidl_id`, `pidl_pi_id`,`pidl_status`,`pidl_time`, `pidl_hash`)
VALUES
(1, 3, 1,100, 'hashA'),
(2, 3, 1,150,'hashB'),
(3, 4, 2, 200,'hashC'),
(4, 3, 1, 200,'hashA')
;
INSERT INTO Pirl
(`pirl_id`, `pirl_pidl_id`,`pirl_status`,`pirl_time`)
VALUES
(1, 2, 0,100),
(2, 3, 1,150),
(3, 1, 2, 200)
;
Of course, with 3 records it's fast, but with around 10-30k rows it takes more than 5 seconds. What I've found is that the part that makes it slow is the last condition of the WHERE clause:
    AND (
        pirl1A.sub_pirl_id IS NULL
        OR (
            pidl1A.pidl_hash != pirl1A.sub_pidl_hash
        )
        OR (
            pirl1A.sub_pirl_status != 1
        )
    )
The other strange thing I've found is that adding DISTINCT made the query a bit faster, but still not fast enough.
When I read your requirements, I come up with a query like this:
select pidl.*
from pidl join
     (select image, max(pidl_time) as pidl_time
      from pidl
      group by image
     ) maxpidl
     on pidl.image = maxpidl.image and pidl.pidl_time = maxpidl.pidl_time left join
     pirl
     on pidl.hash = pirl.hash
where pirl.hash is null;
I think you have some other conditions that are not fully explained (such as the role of status). You should be able to incorporate that.
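As a rough sketch of how that idea could map onto the schema from the question: `image` becomes `pidl_pi_id`, and because Pirl has no hash column of its own, the hash comparison goes through the Pidl row each Pirl row points to. This assumes that status 1 means success (as the question's query suggests) and that "already resized" means a successful Pirl row exists for a download of the same image with the same hash. The aliases `d`, `newest`, `r` and `rd` are only illustrative, and the failed-attempts rule is left out.

```sql
-- Newest download per image, kept only if it succeeded and no successful
-- resize exists for a download of the same image with the same hash.
SELECT d.pidl_id
FROM Pidl AS d
JOIN (
    -- newest download time per image
    SELECT pidl_pi_id, MAX(pidl_time) AS max_time
    FROM Pidl
    GROUP BY pidl_pi_id
) AS newest
  ON newest.pidl_pi_id = d.pidl_pi_id
 AND newest.max_time   = d.pidl_time
LEFT JOIN (Pirl AS r
           JOIN Pidl AS rd ON rd.pidl_id = r.pirl_pidl_id)
  ON rd.pidl_pi_id  = d.pidl_pi_id
 AND rd.pidl_hash   = d.pidl_hash
 AND r.pirl_status  = 1          -- assumption: 1 = success
WHERE d.pidl_status = 1          -- assumption: 1 = success
  AND r.pirl_id IS NULL;
```

Note that two downloads of the same image with the same `pidl_time` would both be returned here; the question's query breaks such ties on `pidl_id`.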
In MySQL, you should avoid subqueries in the FROM clause where you can. Such derived tables are typically materialized into a temporary table, which adds overhead, and the engine then cannot use the base tables' indexes on the materialized result.
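One way to follow that advice is to express both "newest row" and "not yet resized" as correlated NOT EXISTS conditions, so every lookup can go straight to an index on the base tables. The sketch below assumes status 1 means success, follows the question's pseudocode (newest download first, then the status check), and omits the failed-attempts rule. The index names are hypothetical, and whether these indexes actually help should be checked with EXPLAIN on the real data.

```sql
-- Hypothetical composite indexes supporting the correlated lookups below.
CREATE INDEX idx_pidl_pi_time    ON Pidl (pidl_pi_id, pidl_time, pidl_id);
CREATE INDEX idx_pirl_pidl_status ON Pirl (pirl_pidl_id, pirl_status);

SELECT d.pidl_id
FROM Pidl AS d
WHERE d.pidl_status = 1                      -- assumption: 1 = success
  -- d is the newest download of its image (ties broken by pidl_id)
  AND NOT EXISTS (
        SELECT 1
        FROM Pidl AS newer
        WHERE newer.pidl_pi_id = d.pidl_pi_id
          AND (newer.pidl_time > d.pidl_time
               OR (newer.pidl_time = d.pidl_time
                   AND newer.pidl_id > d.pidl_id))
      )
  -- no successful resize of a download of this image with the same hash
  AND NOT EXISTS (
        SELECT 1
        FROM Pirl AS r
        JOIN Pidl AS rd ON rd.pidl_id = r.pirl_pidl_id
        WHERE rd.pidl_pi_id = d.pidl_pi_id
          AND rd.pidl_hash  = d.pidl_hash
          AND r.pirl_status = 1
      );
```

Because there is no derived table, nothing has to be materialized before the joins start, which is the overhead described above.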
Your queries aren't using your indexes; they join against derived tables (subqueries in the FROM clause) instead, which can be very slow. I would suggest building new, indexed tables that hold the information you need, which in effect is a hand-maintained materialized view, since MySQL has no built-in one.
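A minimal sketch of such a table, assuming it is acceptable to refresh it periodically (for example from a scheduled job) rather than on every insert. The table name `Pi_newest_pidl` and the index name are hypothetical; the table stores, for each image, the id of its newest Pidl row.

```sql
-- Hypothetical summary table: one row per image, pointing at its newest download.
CREATE TABLE Pi_newest_pidl (
    pi_id   INT NOT NULL,
    pidl_id INT NOT NULL,
    PRIMARY KEY (pi_id),
    KEY idx_newest_pidl (pidl_id)
);

-- Refresh: REPLACE overwrites the existing row for an image, if any.
REPLACE INTO Pi_newest_pidl (pi_id, pidl_id)
SELECT d.pidl_pi_id, d.pidl_id
FROM Pidl AS d
WHERE NOT EXISTS (
        SELECT 1
        FROM Pidl AS newer
        WHERE newer.pidl_pi_id = d.pidl_pi_id
          AND (newer.pidl_time > d.pidl_time
               OR (newer.pidl_time = d.pidl_time
                   AND newer.pidl_id > d.pidl_id))
      );

-- The "needs resizing" query can then start from the small, indexed table.
SELECT d.pidl_id
FROM Pi_newest_pidl AS n
JOIN Pidl AS d ON d.pidl_id = n.pidl_id
WHERE d.pidl_status = 1;         -- assumption: 1 = success
```

The trade-off is staleness: results are only as fresh as the last refresh, so the refresh interval has to fit how quickly new downloads need to be picked up.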