How to optimize a query with inner join

Question

My MySQL query is too slow and I don't know how to optimize it. My web app can't load the results because the query takes too long to run and the web server enforces a time limit on the response.

    SELECT rc.trial_id,
    rc.created,
    rc.date_registration,
    rc.agemin_value,
    rc.agemin_unit,
    rc.agemax_value,
    rc.agemax_unit,
    rc.exclusion_criteria,
    rc.study_design,
    rc.expanded_access_program,
    rc.number_of_arms,
    rc.enrollment_start_actual,
    rc.target_sample_size,
    (select name from repository_institution where id = rc.primary_sponsor_id) as 
    primary_sponsor,
    (select label from vocabulary_studytype where id = rc.study_type_id) as study_type,
    (select label from vocabulary_interventionassigment where id = 
    rc.intervention_assignment_id) as intervention_assignment,
    (select label from vocabulary_studypurpose where id = rc.purpose_id) as study_purpose,  
    (select label from vocabulary_studymasking where id = rc.masking_id) as study_mask,
    (select label from vocabulary_studyallocation where id = rc.allocation_id) as 
    study_allocation,        
    (select label from vocabulary_studyphase where id = rc.phase_id) as phase,
    (select label from vocabulary_recruitmentstatus where id = rc.recruitment_status_id) as 
    recruitment_status,
    GROUP_CONCAT(vi.label) 
    FROM
    repository_clinicaltrial rc 
    inner JOIN repository_clinicaltrial_i_code rcic ON rcic.clinicaltrial_id = rc.id JOIN 
    vocabulary_interventioncode vi ON vi.id = rcic.interventioncode_id 
    GROUP BY rc.id;

Could using INNER JOIN instead of JOIN be a solution?

Answer 1

Changing the correlated subqueries, which run once per row, to JOINs should improve performance. Also, since you are using MySQL, the keyword "STRAIGHT_JOIN" tells MySQL to join the tables in the order they are listed in the query. Since your "rc" table is the primary table and all the others are lookups, this makes MySQL drive the query from "rc" rather than choosing one of the lookup tables as the basis for the rest of the joins.

    SELECT STRAIGHT_JOIN
        rc.trial_id,
        rc.created,
        rc.date_registration,
        rc.agemin_value,
        rc.agemin_unit,
        rc.agemax_value,
        rc.agemax_unit,
        rc.exclusion_criteria,
        rc.study_design,
        rc.expanded_access_program,
        rc.number_of_arms,
        rc.enrollment_start_actual,
        rc.target_sample_size,
        ri.name primary_sponsor,
        st.label study_type,
        via.label intervention_assignment,
        vsp.label study_purpose,
        vsm.label study_mask,
        vsa.label study_allocation,
        vsph.label phase,
        vrs.label recruitment_status,
        GROUP_CONCAT(vi.label) 
    FROM
        repository_clinicaltrial rc 
            JOIN repository_clinicaltrial_i_code rcic 
                ON rc.id = rcic.clinicaltrial_id
                JOIN vocabulary_interventioncode vi 
                    ON rcic.interventioncode_id = vi.id
            JOIN repository_institution ri
                on rc.primary_sponsor_id = ri.id
            JOIN vocabulary_studytype st
                on rc.study_type_id = st.id
            JOIN vocabulary_interventionassigment via 
                on rc.intervention_assignment_id = via.id
            JOIN vocabulary_studypurpose vsp 
                ON rc.purpose_id = vsp.id
            JOIN vocabulary_studymasking vsm 
                ON rc.masking_id = vsm.id
            JOIN vocabulary_studyallocation vsa 
                ON rc.allocation_id = vsa.id
            JOIN vocabulary_studyphase vsph
                ON rc.phase_id = vsph.id
            JOIN vocabulary_recruitmentstatus vrs 
                ON rc.recruitment_status_id = vrs.id 
    GROUP BY 
        rc.id;
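
One caveat worth checking before swapping this in: a scalar subquery returns NULL when the lookup row is missing, whereas an inner JOIN drops the whole trial from the result. If any of the foreign key columns on "rc" can be NULL (an assumption, since the question does not show the schema), a LEFT JOIN on that lookup keeps the original behaviour. A minimal sketch for a single lookup, with the select list trimmed for illustration and an illustrative alias for the concatenated column:

```sql
-- Sketch only: assumes rc.id is the primary key and that
-- rc.primary_sponsor_id may be NULL for some trials.
-- LEFT JOIN keeps those trials and yields NULL for primary_sponsor,
-- which is what the original scalar subquery returned.
SELECT
    rc.trial_id,
    ri.name AS primary_sponsor,
    GROUP_CONCAT(vi.label) AS intervention_codes  -- alias is illustrative
FROM repository_clinicaltrial rc
JOIN repository_clinicaltrial_i_code rcic
    ON rcic.clinicaltrial_id = rc.id
JOIN vocabulary_interventioncode vi
    ON vi.id = rcic.interventioncode_id
LEFT JOIN repository_institution ri
    ON ri.id = rc.primary_sponsor_id
GROUP BY rc.id;
```

The same substitution would apply to any other lookup whose foreign key is nullable.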

One final note. You are using a GROUP BY together with GROUP_CONCAT(), which is OK. However, a proper GROUP BY lists all non-aggregate columns, which in this case is every other column in the select list. You may already know this, and that the lookup values are determined by the associated "rc" columns, but grouping by rc.id alone is not good practice.
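
As a rough sketch of the stricter form, here is the query with the select list trimmed to three columns for brevity. In the full query, every non-aggregate column would be repeated in the GROUP BY in the same way. The alias intervention_codes is illustrative and not part of the original query.

```sql
-- Trimmed illustration: each non-aggregate column in the select list
-- also appears in the GROUP BY. rc.id stays in the list so that rows
-- are still grouped per trial.
SELECT
    rc.trial_id,
    rc.created,
    st.label AS study_type,
    GROUP_CONCAT(vi.label) AS intervention_codes
FROM repository_clinicaltrial rc
JOIN repository_clinicaltrial_i_code rcic
    ON rc.id = rcic.clinicaltrial_id
JOIN vocabulary_interventioncode vi
    ON rcic.interventioncode_id = vi.id
JOIN vocabulary_studytype st
    ON rc.study_type_id = st.id
GROUP BY
    rc.id,
    rc.trial_id,
    rc.created,
    st.label;
```

For what it's worth, MySQL 5.7.5 and later accept GROUP BY rc.id on its own under ONLY_FULL_GROUP_BY when rc.id is the primary key, because the other "rc" columns are functionally dependent on it. The explicit list is the portable form.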

Answer 2

Your joins and subqueries are probably not the problem. Assuming you have the correct indexes on the tables, they are fast. "Correct indexes" means that the id column of each lookup table is its primary key, which is a very reasonable assumption.
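
One way to check that assumption rather than guess is to list the indexes and look at the execution plan. The statements below are a sketch using the table names from the question. The EXPLAIN query is cut down to the part that feeds GROUP_CONCAT.

```sql
-- List the indexes on the junction table; a PRIMARY key and any index
-- that starts with clinicaltrial_id should show up here.
SHOW INDEX FROM repository_clinicaltrial_i_code;

-- Ask the optimizer how it would run the join and the grouping.
EXPLAIN
SELECT rc.id, GROUP_CONCAT(vi.label)
FROM repository_clinicaltrial rc
JOIN repository_clinicaltrial_i_code rcic
    ON rcic.clinicaltrial_id = rc.id
JOIN vocabulary_interventioncode vi
    ON vi.id = rcic.interventioncode_id
GROUP BY rc.id;
```

In the EXPLAIN output, a type of ALL on a large table, or "Using temporary; Using filesort" in the Extra column, would be worth looking at first.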

My guess is that the GROUP BY is the performance issue. So, I would suggest structuring the query with no GROUP BY:

    select . . .
           (select group_concat(vi.label)
            from repository_clinicaltrial_i_code rcic join
                 vocabulary_interventioncode vi 
                 on vi.id = rcic.interventioncode_id 
            where rcic.clinicaltrial_id = rc.id
           )
    from repository_clinicaltrial rc ;

For this, you want indexes on:

repository_clinicaltrial_i_code(clinicaltrial_id, interventioncode_id)
vocabulary_interventioncode(id, label)
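
Spelled out as DDL, those two indexes might look like the sketch below. The index names are hypothetical, and the second statement assumes label is a VARCHAR short enough to be indexed in full. If SHOW INDEX already lists an equivalent index on the junction table, the first statement can be skipped.

```sql
-- Index names are hypothetical; pick names that fit your schema.

-- Lets the correlated subquery find all codes for a trial from the
-- index alone, without reading the junction table rows.
CREATE INDEX idx_rcic_trial_code
    ON repository_clinicaltrial_i_code (clinicaltrial_id, interventioncode_id);

-- Covering index for the id -> label lookup.
-- Assumes label is a VARCHAR that fits within the index key length limit.
CREATE INDEX idx_vi_id_label
    ON vocabulary_interventioncode (id, label);
```

On InnoDB, if id is already the primary key of vocabulary_interventioncode, the clustered index stores label alongside it, so the second index may add little. Comparing EXPLAIN output before and after is the safest way to tell.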
