简体   繁体   中英

MySQL Slow on join. Any way to speed up

I have 2 tables. 1 is music and 2 is listenTrack. listenTrack tracks the unique plays of each song. I am trying to get results for popular songs of the month. I'm getting my results but they are just taking too long. Below is my tables and query

430,000 rows

CREATE TABLE `listentrack` (
    `id` int(11) NOT NULL AUTO_INCREMENT,
    `sessionId` varchar(50) NOT NULL,
    `url` varchar(50) NOT NULL,
    `date_created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
    `ip` varchar(150) NOT NULL,
    `user_id` int(11) DEFAULT NULL,
     PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=731306 DEFAULT CHARSET=utf8

12500 rows

CREATE TABLE `music` (
   `music_id` int(11) NOT NULL AUTO_INCREMENT,
   `user_id` int(11) NOT NULL,
   `title` varchar(50) DEFAULT NULL,
   `artist` varchar(50) DEFAULT NULL,
   `description` varchar(255) DEFAULT NULL,
   `genre` int(4) DEFAULT NULL,
   `file` varchar(255) NOT NULL,
   `url` varchar(50) NOT NULL,
   `allow_download` int(2) NOT NULL DEFAULT '1',
   `plays` bigint(20) NOT NULL,
   `downloads` bigint(20) NOT NULL,
   `faved` bigint(20) NOT NULL,
   `dateadded` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
   PRIMARY KEY (`music_id`)
) ENGINE=MyISAM AUTO_INCREMENT=15146 DEFAULT CHARSET=utf8


SELECT COUNT(listenTrack.url) AS total, listenTrack.url 
FROM listenTrack
LEFT JOIN music ON music.url = listenTrack.url
WHERE DATEDIFF(DATE(date_created),'2009-08-15') = 0
GROUP BY listenTrack.url
ORDER BY total DESC
LIMIT 0,10

this query isn't very complex and the rows aren't too large, i don't think.

Is there any way to speed this up? Or can you suggest a better solution? This is going to be a cron job at the beggining of every month but I would also like to do by the day results as well.

Oh btw i am running this locally, over 4 min to run, but on prod it takes about 45 secs

I'm more of a SQL Server guy but these concepts should apply.

I'd add indexes:

  1. On ListenTrack, add an index with url, and date_created
  2. On Music, add an index with url

These indexes should speed the query up tremendously (I originally had the table names mixed up - fixed in the latest edit).

For the most part you should also index any column that is used in a JOIN. In your case, you should index both listentrack.url and music.url

@jeff s - An index music.date_created wouldnt help because you are running that through a function first so MySQL cannot use an index on that column. Often, you can rewrite a query so that the indexed referenced column is used statically like:

DATEDIFF(DATE(date_created),'2009-08-15') = 0

becomes

date_created >= '2009-08-15' and date_created < '2009-08-15'

This will filter down records that are from 2009-08-15 and allow any indexes on that column to be candidates. Note that MySQL might NOT use that index, it depends on other factors.

Your best bet is to make a dual index on listentrack(url, date_created) and then another index on music.url

These 2 indexes will cover this particular query.

Note that if you run EXPLAIN on this query you are still going to get a using filesort because it has to write the records to a temporary table on disk to do the ORDER BY.

In general you should always run your query under EXPLAIN to get an idea on how MySQL will execute the query and then go from there. See the EXPLAIN documentation:

http://dev.mysql.com/doc/refman/5.0/en/using-explain.html

尝试创建一个有助于连接的索引:

CREATE INDEX idx_url ON music (url);

I think I might have missed the obvious before. Why are you joining the music table at all? You do not appear to be using the data in that table at all and you are performing a left join which is not required, right? I think this table being in the query will make it much slower and will not add any value. Take all references to music out, unless the url inclusion is required, in which case you need a right join to force it to not include a row without a matching value.


I would add new indexes, as the others mention. Specifically I would add: music url listentrack date_created,url

This will improve your join a ton.

Then I would look at the query, you are forcing the system to perform work on each row of the table. It would be better to rephrase the date restriction as a range.

Not sure of the syntax off the top of my head: where '2009-08-15 00:00:00' <= date_created < 2009-08-16 00:00:00

That should allow it to rapidly use the index to locate the appropriate records. The combined two key index on music should allow it to find the records based on the date and URL. You should experiment, they might be better off going in the other direction url,date_created on the index.

The explain plan for this query should say "using index" on the right hand column for both. That means that it will not have to hit the data in the table to calculate your sums.

I would also check the memory settings that you have configured for MySQL. It sounds like you do not have enough memory allocated. Be very careful on the differences between server based settings and thread based settings. The server with a 10MB cache is pretty small, a thread with a 10MB cache can use a lot of memory quickly.

Jacob

Pre-grouping and then joining makes things a lot faster with MySQL/MyISAM. (I'm suspicious less of this is needed with other DB's)

This should perform about as fast as the non-joined version:

SELECT
   total, a.url, title
FROM
(
  SELECT COUNT(*) as total, url
  from listenTrack
  WHERE DATEDIFF(DATE(date_created),'2009-08-15') = 0
  GROUP BY url
  ORDER BY total DESC
  LIMIT 0,10
) as a
LEFT JOIN music ON music.url = a.url
;

PS - Mapping between the two tables with an id instead of a url is sound advice.

Why are you repeating the url in both tables?

Have listentrack hold a music_id instead, and join on that. Gets rid of the text search as well as the extra index.

Besides, it's arguably more correct. You're tracking the times that a particular track was listened to, not the url. What if the url changes?

After you add indexes then you may want to explore adding a new column for the date_created to be a unix_timestamp, which will make math operations quicker.

I am not certain why you have the diff function though, as it appears you are looking for all rows that were updated on a particular date.

You may want to look at your query as it seems to have an error.

If you use unit tests then you can compare the results of your query and a query using a unix timestamp instead.

you might want to add an index to the url field of both tables.

having said that, when i converted from mysql to sql server 2008, with the same queries and same database structures, the queries ran 1-3 orders of magnitude faster.

i think some of it had to do with the rdbms (mysql optimizers are not so good...) and some of it might have had to do with how the rdbms reserve system resources. although, the comparisons were made on production systems where only the db would run.

This below would probably work to speed up the query.

CREATE INDEX music_url_index ON music (url) USING BTREE; CREATE INDEX listenTrack_url_index ON listenTrack (url) USING BTREE;

You really need to know the total number of comparisons and row scans that are happening. To get that answer look at the code here of how to do that using explain http://www.siteconsortium.com/h/p1.php?id=mysql002 .

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM