
MySQL LEFT JOIN with a VERY large table - super slow

[site_list] ~100,000 rows, about 10 MB in size.

  • site_id
  • site_url
  • site_data_most_recent_record_id

[site_list_data] ~15+ million rows and growing, about 600 MB in size.

  • record_id
  • site_id
  • site_connect_time
  • site_speed
  • date_checked

columns in bold are unique index keys.

I need to return the 50 most recently updated sites AND the most recent data that goes with each one: connect time, speed, date.
This is my query:

SELECT SQL_CALC_FOUND_ROWS
  site_list.site_url,
  site_list_data.site_connect_time,
  site_list_data.site_speed,
  site_list_data.date_checked
FROM site_list
  LEFT JOIN site_list_data
    ON site_list.site_data_most_recent_record_id = site_list_data.record_id
ORDER BY site_list_data.date_checked DESC
LIMIT 50

Without the ORDER BY and SQL_CALC_FOUND_ROWS (which I need for pagination), the query takes about 1.5 seconds; with them it takes 2 seconds or more. That is not good enough: the page that shows this data gets 20K+ pageviews/day, and the query is apparently both too heavy (the server almost dies when I put this live) and too slow.

MySQL experts, how would you do this? What if the table grew to 100 million records? Caching this huge result in a temp table every 30 seconds is the only other solution I have come up with.

You need to add a heuristic to the query, a gate that limits how much data it has to look at, to get reasonable performance. As written, it is effectively sorting your site_list_data table by date descending: the ENTIRE table.

So, if you know that the top 50 will fall within the last day or week, add a condition such as "date_checked > <boundary_date>" to the query. That should reduce the overall result set first, and THEN sort it.
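A minimal sketch of that gated query follows. It assumes date_checked is a DATETIME or TIMESTAMP column and that a one-day window is wide enough to contain the 50 newest rows; both are assumptions to verify against your own data, and the interval is only an illustration.

```sql
-- Assumption: the 50 most recently checked sites all fall within the last day.
-- Widen the interval if updates are less frequent than that.
SELECT
  site_list.site_url,
  site_list_data.site_connect_time,
  site_list_data.site_speed,
  site_list_data.date_checked
FROM site_list
  LEFT JOIN site_list_data
    ON site_list.site_data_most_recent_record_id = site_list_data.record_id
-- Filtering on the joined table in WHERE drops sites with no matching data row,
-- so the LEFT JOIN behaves like an INNER JOIN here.
WHERE site_list_data.date_checked > NOW() - INTERVAL 1 DAY
ORDER BY site_list_data.date_checked DESC
LIMIT 50;
```

If the window turns out to hold fewer than 50 rows, the page comes back short, so the boundary needs to be chosen conservatively.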

SQL_CALC_FOUND_ROWS is slow; use COUNT instead. Take a look here

A couple of observations.

Both ORDER BY and SQL_CALC_FOUND_ROWS add to the cost of your query. ORDER BY clauses can often be improved with appropriate indexing. Do you have an index on your date_checked column? That could help.
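As a sketch of that suggestion, the index could be created and then checked with EXPLAIN as below. The index name idx_date_checked is hypothetical, and whether the optimizer actually uses the index for the sort depends on the join type and table order, so the EXPLAIN output is what settles it.

```sql
-- idx_date_checked is a hypothetical name; pick whatever fits your naming scheme.
CREATE INDEX idx_date_checked ON site_list_data (date_checked);

-- Inspect the plan: look at the "key" column for the index name and at
-- "Extra" for "Using filesort", which means the sort is still done by hand.
EXPLAIN
SELECT
  site_list.site_url,
  site_list_data.site_connect_time,
  site_list_data.site_speed,
  site_list_data.date_checked
FROM site_list
  LEFT JOIN site_list_data
    ON site_list.site_data_most_recent_record_id = site_list_data.record_id
ORDER BY site_list_data.date_checked DESC
LIMIT 50;
```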

What exactly do you need SQL_CALC_FOUND_ROWS for? Consider replacing it with a separate query that uses COUNT instead. This can be vastly faster, assuming your Query Cache is enabled.
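A sketch of the separate count, assuming record_id is unique in site_list_data (the question implies this but the formatting that marked the unique keys is not visible here). Under that assumption the LEFT JOIN yields exactly one row per site, so the pagination total is simply the size of site_list; total_sites is an illustrative alias.

```sql
-- Run as its own statement, alongside the page query without SQL_CALC_FOUND_ROWS.
-- Valid only if each site_list row joins to at most one site_list_data row.
SELECT COUNT(*) AS total_sites
FROM site_list;
```

Note that the query cache was removed in MySQL 8.0, so on newer servers this total would have to be cached in the application if it is requested on every page view.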

And if you can use COUNT, consider replacing your LEFT JOIN with an INNER JOIN, as this will help performance as well.
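One way the INNER JOIN variant might look is sketched below. It is only equivalent to the original if every site has a matching row in site_list_data; sites without one disappear from the result, which is an assumption to check before switching.

```sql
-- INNER JOIN lets the optimizer choose which table to read first;
-- a LEFT JOIN forces it to start from site_list.
SELECT
  site_list.site_url,
  site_list_data.site_connect_time,
  site_list_data.site_speed,
  site_list_data.date_checked
FROM site_list
  INNER JOIN site_list_data
    ON site_list.site_data_most_recent_record_id = site_list_data.record_id
ORDER BY site_list_data.date_checked DESC
LIMIT 50;
```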

Good luck.
