简体   繁体   中英

Performance of queries on a data subset in MySQL

I have built a small system that gathers and analyses surveys and I need to find way to optimise performance. I noticed many of the queries have parts in common and they are performed many many times. In the example below the survey_q_id` = '180002' part is the one that changes.

SELECT COUNT(`result_q`.`id`) as result_q_c 
FROM result_q
LEFT JOIN `survey_save` ON `survey_save`.`id` = `result_q`.`ss_id` 
WHERE `result_q`.`survey_q_id` = '180002' 
AND `survey_save`.`survey_order_id` IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10) 
AND `survey_save`.`id` IN (7,9,24,26,29,30,31,33,34,38,39,41,45,46,47,50,51,52,53,64,65,68,74,76,79,81,82,87,92,93,96,115,116,117,131,148,149,150,151,153,155,156,159,160,162,165,166,168,176,179,182,186,188,190,194,201,204,208,210,211,216,221,226,235,236,232,250,266,279,280,283,287,290,298,299,304,307,308,315,317,318,330,342,356,357,358,360,366,367,370,373,374,379,380,383,390,391,394,397,402,404,405,406,408,413,415,436,446,450,458,465,467,468,469,471,473,477,479,480,481,485,489,493,494,500,504,518,521,532,536,539,542,544,568,570,574,576,579,582,584,585,586,594,595,598,621,622,629,651,652,653,657,658,665,669,674,675,684,690,696,700,708,712,715,721,722,723,724,739,740,749,756,757,761,764,765,767,770,775,776,780,787,788,792,800,805,809,813,815,818,821,824,829,830,831,832,841,843,865,866,867,868,869,870,871,872,873,874,875,876,877,878,879,880,881,882,883,884,885,886,887,888,889,890,891,892,893,894,895,896,982,983,986,988,991,993,996,998,1001,1003,1006,1009,1011,1013,1019,1021,1024,1027,1032,1061) 
AND `result_q_radio`.`date_save` > '2019-01-01 00:00:00'
AND `result_q_radio`.`date_save` < '2022-05-16 12:19:44'

So the set of data is limited and then I need to count how many of the partial data set has one of the columns set with an id of x. The survey_order_id and all ids are indexed. The data set is not static - result_q table grows a lot daily but when analysis is performed the range of result_q entries is read-only so no new elements are added to the analysed data set when analysis of it is allowed. So the survey X that consisted of Y number of people filling in generated many entries into the result_q table but when survey is closed these entries don't change - new entries are added but the set that consists of that one survey doesn't change.

The amount of data collected grows a lot daily and I need to be able to tell MySQL that we work on these rows only with each request. Data will be analysed in many ways so these will not be static but on a particular page call big part of many queries will work only on a specific portion of data. I assume there is some caching mechanism that may be optimising this in the background but I would appreciate if someone has any idea for improvement here.

Maybe some additional calls or preparing data set in some way before or some sort of manual caching that would temporarily gather data to be worked on so only that bit is analysed?

Thanks!

PS. Currently working on: PHP7.4 + 10.3.27-MariaDB

Thanks everyone! @Chris Haas while all INT columns used in the query were indexed there were other queries as well asking in result_q_radio table that were INT and were not indexed. I actually implemented desperately a caching table that already cut the queries that were already performed before and performance of queries already done was ~5x faster but the problem remained when user clicked to filter results which has to recalculate yet another combination of survey results. I thought, since the data is sort of read-only after the survey-gathering is done for one diagnosis, that I will make a queue system to cache most of the usual combinations and perform CRON to prepare the cache at night hours daily so not to make big load on performance. I also diagnosed my PHP itself but while I can't say it is perfect it didn't seem to be issue that should exponentially slow down when amount of records in db grew.
I also desperately consulted client and hosting company if switching from shared hosting to a dedicated server would allow performance improvement that would at least give us time to develop optimisations.

BUT after lots of stress I visited phpmyadmin and the result table and noticed it has some INT columns that are used elsewhere in queries and with few clicks I indexed them. And speed improvement was astronomical ! So all in all solution was the indexing of ALL columns in ALL tables that have INT type and that are used in where queries . So Chris Haas - thank you - your first reply, while neglected by me at first (hey I have them indexed already!) in the end turned out to be the most accurate - I did not check other tables and other queries only focused on the ones that were in this direct query I suspected of being slowest. The same solution was suggested by O. Jones - thank you!

Thanks Christoffer, The Impaler, KIKO Software, O. Jones and Rick James too!

  1. Prepared statements did not sound like worth the effort from KIKO's description
  2. I did not consider silent converting to inner join to be a viable source of problem since this would most probably be cached internally somehow in SQL but in future I will consider this solution as well as an additional optimisation method.
  3. Rick James spotted the typos - the result_q should be result_q_radio everywhere in the code - I only changed it to just q at the beginning and forgot about the remaining ones. Each question type has separate table - radios, check, matrix radios etc. and I didn't want to bring attention to why it is radio in the question as all the tables work similarly.

Thank you all and have a great day!

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM