简体   繁体   中英

Optimizing the SQL Query to get data from large amount MySQL database

I am having a problem getting data from a large amount MySQL database.

With the below code it is ok to get the list of 10K patients and 5K appointments which is our test server.

However, on our live server, the number of patients is over 100K and the number of appointments is over 300K and when I run the code after a while it gives 500 error.

I need the list of the patients whose patient_treatment_status is 1 or 3 and has no appointment after one month from their last appointment. (The below code is working for small amount of patients and appointments)

How can I optimise the first database query so there will be no need the second database query in the foreach loop?

<?php
ini_set('memory_limit', '-1');
ini_set('max_execution_time', 0);

require_once('Db.class.php');

$patients = $db->query("
SELECT
    p.id, p.first_name, p.last_name, p.phone, p.mobile,
    LatestApp.lastAppDate
FROM
    patients p
LEFT JOIN (SELECT patient_id, MAX(start_date) AS lastAppDate FROM appointments WHERE appointment_status = 4) LatestApp ON p.id = LatestApp.patient_id
WHERE
    p.patient_treatment_status = 1 OR p.patient_treatment_status = 3
ORDER BY
    p.id
");

foreach ($patients as $row) {
    $one_month_after_the_last_appointment = date('Y-m-d', strtotime($row['lastAppDate'] . " +1 month"));
    $appointment_check = $db->single("SELECT COUNT(id) FROM appointments WHERE patient_id = :pid AND appointment_status = :a0 AND (start_date >= :a1 AND start_date <= :a2)", array("pid"=>"{$row['id']}","a0"=>"1","a1"=>"{$row['lastAppDate']}","a2"=>"$one_month_after_the_last_appointment"));

    if($appointment_check == 0){
        echo $patient_id = $row['id'].' - '.$row['lastAppDate'].' - '.$one_month_after_the_last_appointment. '<br>';
    }
}
?>

First off, this subquery likely does not do what you think it does.

SELECT patient_id, MAX(start_date) AS lastAppDate 
FROM appointments WHERE appointment_status = 4

Without a GROUP BY clause, that subquery will simply take the maximum start_date of all appointments with appointment_status=4 , and then arbitrarily pick one patient_id . To get the results you want you'll need to GROUP BY patient_id .

For your overall question, try the following query:

SELECT
    p.id, p.first_name, p.last_name, p.phone, p.mobile,
    LatestApp.lastAppDate
FROM
    patients p
INNER JOIN (
    SELECT patient_id,
    MAX(start_date) AS lastAppDate
    FROM appointments
    WHERE appointment_status = 4
    GROUP BY patient_id
) LatestApp ON p.id = LatestApp.patient_id
WHERE
    (p.patient_treatment_status = 1
    OR p.patient_treatment_status = 3)
    AND NOT EXISTS (
        SELECT 1
        FROM appointments a
        WHERE a.patient_id = p.patient_id
        AND a.appointment_status = 1
        AND a.start_date >= LatestApp.lastAppDate
        AND a.start_date < DATE_ADD(LatestApp.lastAppDate,INTERVAL 1 MONTH)
)
ORDER BY
    p.id

Add the following index, if it doesn't already exist:

ALTER TABLE appointments
ADD INDEX (`patient_id`, `appointment_status`, `start_date`)

Report how this performs and if the data appears correct. Provide SHOW CREATE TABLE patient and SHOW CREATE TABLE appointments for further assistance related to performance.

Also, try the query above without the AND NOT EXISTS clause, together with the second query you use. It is possible that running 2 queries may be faster than trying to run them together, in this situation.

Note that I used an INNER JOIN to find the latest appointment. This will result in all patients that have never had an appointment to not be included in the query. If you need those added, just UNION the results those found by selecting from patients that have never had an appointment.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM