MySQL query optimization for large table joins
I am creating a report for a radio station. The station logs its online listeners, recording IP address, date, time, total number of users listening, and so on.
Listeners Table
client_ip        date        time      date_time            listeners
---------------  ----------  --------  -------------------  ---------
166.147.81.179   2012-04-30  00:19:46  2012-04-30 00:19:46  1
64.12.243.203    2012-04-30  04:38:37  2012-04-30 04:38:37  1
198.228.211.195  2012-04-30  05:36:33  2012-04-30 05:36:33  1
198.228.211.195  2012-04-30  05:36:34  2012-04-30 05:36:34  2
198.228.211.195  2012-04-30  05:36:35  2012-04-30 05:36:35  2
198.228.211.195  2012-04-30  05:36:35  2012-04-30 05:36:35  3
166.147.81.179   2012-04-30  05:47:13  2012-04-30 05:47:13  2
76.170.251.97    2012-04-30  06:01:37  2012-04-30 06:01:37  2
76.170.251.97    2012-04-30  06:01:39  2012-04-30 06:01:39  2
76.170.251.97    2012-04-30  06:01:39  2012-04-30 06:01:39  2
At the same time, it keeps records of song details (title, artist, album, length, date, time, etc.).
Playlists Table
title                       artist                           length_in_secs  played_date  played_time  start_date_time      end_date_time
--------------------------  -------------------------------  --------------  -----------  -----------  -------------------  -------------------
We Found Love               Rihanna                          184             2012-04-30   00:00:21     2012-04-30 00:00:21  2012-04-30 00:03:25
Photograph                  Nickelback                       216             2012-04-30   00:03:31     2012-04-30 00:03:31  2012-04-30 00:07:07
Not Over You                Gavin DeGraw                     214             2012-04-30   00:07:18     2012-04-30 00:07:18  2012-04-30 00:10:52
Stereo Hearts               Gym Class Heroes Ft Adam Levine  210             2012-04-30   00:10:55     2012-04-30 00:10:55  2012-04-30 00:14:25
I Gotta Feeling             Black Eyed Peas                  243             2012-04-30   00:15:03     2012-04-30 00:15:03  2012-04-30 00:19:06
One Thing Leads To Another  Fixx                             182             2012-04-30   00:19:14     2012-04-30 00:19:14  2012-04-30 00:22:16
Raise Your Glass            Pink                             202             2012-04-30   00:22:29     2012-04-30 00:22:29  2012-04-30 00:25:51
Better In Time              Leona Lewis                      216             2012-04-30   00:30:13     2012-04-30 00:30:13  2012-04-30 00:33:49
Tainted Love                Soft Cell                        153             2012-04-30   00:33:56     2012-04-30 00:33:56  2012-04-30 00:36:29
Haven't Met You Yet         Michael Buble'                   242             2012-04-30   00:37:14     2012-04-30 00:37:14  2012-04-30 00:41:16
So the report requirement is "how many users listened to each song on a given date or within a date range", and I wrote the query below. As far as I know it gives the correct output, but the execution time is disproportionate to the data size: anywhere from 5 seconds to 5-10 minutes, depending on the date range.
SELECT DATE_FORMAT(p.played_date, "%m/%d/%Y") `played_date`, p.played_time, p.length_in_secs, p.title, p.artist, RTRIM(p.label) `label`, RTRIM(p.album) `album`, IFNULL((SELECT SUM(l.listeners) FROM listeners `l` WHERE l.date_time >= p.start_date_time AND l.date_time <= p.end_date_time LIMIT 1), 0) `listeners` FROM playlists `p` WHERE p.title <> "" AND (p.played_date >= '2012-04-30' AND p.played_date <= '2012-05-30') HAVING listeners > 0 ORDER BY p.title ASC;
// formatted //
SELECT
DATE_FORMAT(p.played_date, "%m/%d/%Y") `played_date`,
p.played_time,
p.length_in_secs,
p.title,
p.artist,
RTRIM(p.label) `label`,
RTRIM(p.album) `album`,
IFNULL(
(SELECT
SUM(l.listeners)
FROM
listeners `l`
WHERE l.date_time >= p.start_date_time
AND l.date_time <= p.end_date_time
LIMIT 1),
0
) `listeners`
FROM
playlists `p`
WHERE p.title <> ""
AND (
p.played_date >= '2012-04-30'
AND p.played_date <= '2012-05-30'
)
HAVING listeners > 0
ORDER BY p.title ASC
Output:
played_date  played_time  length_in_secs  title                  artist                    label               album               listeners
-----------  -----------  --------------  ---------------------  ------------------------  ------------------  ------------------  ---------
04/30/2012   08:06:26     228             Brighter Than The Sun  Colbie Caillat (Cal-Lay)  Universal Republic  All of You          9
04/30/2012   08:44:16     248             Breakfast At Tiffanys  Deep Blue Something                                               6
04/30/2012   18:06:40     253             Bizarre Love Triangle  New Order                                                         2
04/30/2012   17:05:21     183             Animal                 Neon Trees                Mercury             Habits              5
04/30/2012   08:58:05     253             Always Be My Baby      Mariah Carey                                                      2
04/30/2012   07:25:52     264             Already Gone           Kelly Clarkson            RCA                 All I Ever Wante    3
04/30/2012   16:21:33     236             All The Right Moves    One Republic              Interscope          Waking Up           7
04/30/2012   11:58:26     199             All That She Wants     Ace Of Base                                                       12
04/30/2012   11:14:17     247             All I Wanna Do         Sheryl Crow                                                       2
04/30/2012   16:12:59     235             A Thousand Miles       Vanessa Carlton                                                   5
Is there a way to optimize this query to run faster, or to write a new, faster one? Any suggestions would be appreciated. Thank you!
Using EXPLAIN
EXPLAIN playlists;
Field            Type              Null  Key  Default            Extra
---------------  ----------------  ----  ---  -----------------  ---------------------------
playlist_id      int(10) unsigned  NO    PRI  (NULL)             auto_increment
title            varchar(255)      YES        (NULL)
artist           varchar(255)      YES        (NULL)
label            varchar(255)      YES        (NULL)
album            varchar(255)      YES        (NULL)
length_in_secs   int(11)           NO         (NULL)
played_date      date              NO         (NULL)
played_time      time              NO         (NULL)
start_date_time  datetime          NO         (NULL)
end_date_time    datetime          NO         (NULL)
added_date       datetime          NO         (NULL)
modified_date    timestamp         NO         CURRENT_TIMESTAMP  on update CURRENT_TIMESTAMP
EXPLAIN listeners;
Field          Type                 Null  Key  Default            Extra
-------------  -------------------  ----  ---  -----------------  ---------------------------
listener_id    bigint(20) unsigned  NO    PRI  (NULL)             auto_increment
station_id     int(10) unsigned     NO         (NULL)
client_ip      varchar(50)          NO         (NULL)
time           time                 NO         (NULL)
date           date                 NO         (NULL)
date_time      datetime             YES        (NULL)
timestamp      bigint(20) unsigned  NO         (NULL)
listeners      int(10) unsigned     NO         (NULL)
processes      int(10) unsigned     NO         (NULL)
uid            int(10) unsigned     NO         (NULL)
user_agent     varchar(255)         YES        (NULL)
added_date     datetime             NO         (NULL)
modified_date  timestamp            NO         CURRENT_TIMESTAMP  on update CURRENT_TIMESTAMP
Use an INNER JOIN instead of a correlated subquery, for example:
SELECT DATE_FORMAT(p.played_date, '%m/%d/%Y') played_date,
       p.played_time,
       p.length_in_secs,
       p.title,
       p.artist,
       RTRIM(p.label) label,
       RTRIM(p.album) album,
       SUM(l.listeners) listeners
FROM playlists p
     INNER JOIN listeners l
         ON l.date_time BETWEEN p.start_date_time AND p.end_date_time
WHERE p.title <> '' AND
      p.played_date BETWEEN '2012-04-30' AND '2012-05-30'
-- Group per playlist entry; without GROUP BY, SUM() collapses the result into a single row.
GROUP BY p.playlist_id
ORDER BY p.title ASC;
Adding the following indexes may improve the query's performance. Use EXPLAIN on the query to check whether they are actually used:
ALTER TABLE playlists ADD KEY (played_date, start_date_time, end_date_time, title);
ALTER TABLE listeners ADD KEY (date_time, listeners);
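Before switching queries, it can help to confirm on a small data set that a grouped join returns the same per-song totals as the original correlated subquery. The following is only a sketch: it uses Python's built-in sqlite3 module with an in-memory database as a stand-in for MySQL, a cut-down schema, and partly invented listener rows. The index name idx_listeners_time and the column alias total_listeners are made up for the illustration. SQLite's planner differs from MySQL's, so this checks the results, not the speed.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption: results, not
# performance, are what we want to compare here).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE playlists (
    playlist_id     INTEGER PRIMARY KEY,
    title           TEXT,
    start_date_time TEXT,
    end_date_time   TEXT
);
CREATE TABLE listeners (
    listener_id INTEGER PRIMARY KEY,
    date_time   TEXT,
    listeners   INTEGER
);
-- Hypothetical index name; mirrors the (date_time, listeners) key suggested above.
CREATE INDEX idx_listeners_time ON listeners (date_time, listeners);
""")

# Three playlist rows taken from the question's sample data.
conn.executemany(
    "INSERT INTO playlists (title, start_date_time, end_date_time) VALUES (?, ?, ?)",
    [
        ("We Found Love", "2012-04-30 00:00:21", "2012-04-30 00:03:25"),
        ("One Thing Leads To Another", "2012-04-30 00:19:14", "2012-04-30 00:22:16"),
        ("Raise Your Glass", "2012-04-30 00:22:29", "2012-04-30 00:25:51"),
    ],
)
# The first listener row is from the question; the other two are invented.
conn.executemany(
    "INSERT INTO listeners (date_time, listeners) VALUES (?, ?)",
    [
        ("2012-04-30 00:19:46", 1),
        ("2012-04-30 00:20:10", 2),
        ("2012-04-30 00:23:00", 1),
    ],
)

# Original shape: one correlated subquery per playlist row.
subquery_sql = """
SELECT title, total_listeners FROM (
    SELECT p.title AS title,
           IFNULL((SELECT SUM(l.listeners)
                   FROM listeners l
                   WHERE l.date_time >= p.start_date_time
                     AND l.date_time <= p.end_date_time), 0) AS total_listeners
    FROM playlists p
    WHERE p.title <> ''
)
WHERE total_listeners > 0
ORDER BY title
"""

# Join shape: one range join, grouped per playlist entry.
join_sql = """
SELECT p.title, SUM(l.listeners) AS total_listeners
FROM playlists p
     INNER JOIN listeners l
         ON l.date_time BETWEEN p.start_date_time AND p.end_date_time
WHERE p.title <> ''
GROUP BY p.playlist_id
HAVING SUM(l.listeners) > 0
ORDER BY p.title
"""

subquery_rows = conn.execute(subquery_sql).fetchall()
join_rows = conn.execute(join_sql).fetchall()
print(subquery_rows == join_rows)
```

If the two result sets differ on real data, the usual suspects are a missing GROUP BY in the join version or songs whose time ranges overlap.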
As discussed in the comments, your query doesn't actually do what you want it to do. Given the data you have, I would personally process it outside of SQL and build a table that stores how many people listened to each song, which you can then query in SQL. If you absolutely want an SQL query to do this, however, it will need to be something along the lines of this monstrosity:
SELECT
DATE_FORMAT(p.played_date, "%m/%d/%Y") `played_date`,
p.played_time,
p.length_in_secs,
p.title,
p.artist,
RTRIM(p.label) `label`,
RTRIM(p.album) `album`,
SUM(SMALLEST(prev_listeners, next_listeners, dur_listeners)) AS listeners
FROM (
SELECT
p.start_date_time,
SUBSTRING_INDEX(GROUP_CONCAT(l_before.listeners ORDER BY l_before.date_time DESC),',',1) AS prev_listeners,
SUBSTRING_INDEX(GROUP_CONCAT(l_after.listeners ORDER BY l_after.date_time ASC),',',1) AS next_listeners,
MIN(l_during.listeners) AS dur_listeners
FROM playlists p
JOIN listeners l_before ON l_before.date_time < p.start_date_time
LEFT JOIN listeners l_after ON l_before.client_ip = l_after.client_ip AND l_after.date_time > p.end_date_time
LEFT JOIN listeners l_during ON l_before.client_ip = l_during.client_ip AND l_during.date_time BETWEEN p.start_date_time AND p.end_date_time
WHERE p.title <> ""
AND p.played_date BETWEEN '2012-04-30' AND '2012-05-30'
GROUP BY p.start_date_time, l_before.client_ip
) l
JOIN playlists p USING (start_date_time)
GROUP BY p.start_date_time
ORDER BY p.start_date_time
Here SMALLEST is a function that returns the smallest non-NULL argument. It is not built in: MySQL's LEAST() returns NULL if any argument is NULL, so you would have to write it yourself.
This will take considerably longer than your current query, but it's the most efficient way I can think of to get the actual number of listeners for each song.
Oh, and this assumes that the log records a row with zero listeners when everyone on an IP address stops listening; otherwise there's really no way to do this.
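For the "process it outside of SQL" route, a rough sketch in Python could look like the following. It assumes, as this answer does, that each log row records the current listener count for one IP address and that the count stays in force until that IP's next row. It then takes the peak count per IP during each song and sums over IPs; taking the peak rather than the minimum is a choice made for this illustration, not something the question specifies. The function name listeners_per_song and the sample rows are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import datetime


def listeners_per_song(songs, log_rows):
    """Return {(title, start): listener total} for each song.

    songs:    iterable of (title, start, end) with datetime bounds
    log_rows: iterable of (client_ip, date_time, listeners)
    """
    # Build a time-ordered history per IP so it can be searched with bisect.
    history = defaultdict(list)
    for ip, ts, count in sorted(log_rows, key=lambda row: row[1]):
        history[ip].append((ts, count))
    times = {ip: [ts for ts, _ in rows] for ip, rows in history.items()}

    totals = {}
    for title, start, end in songs:
        total = 0
        for ip, rows in history.items():
            lo = bisect_right(times[ip], start)  # rows[:lo] are at or before start
            hi = bisect_right(times[ip], end)    # rows[:hi] are at or before end
            # Count carried forward from the last row at or before the song start.
            in_force = rows[lo - 1][1] if lo else 0
            # Counts logged while the song was playing.
            during = [count for _, count in rows[lo:hi]]
            # Assumption: use the peak count seen for this IP during the song.
            total += max([in_force] + during)
        totals[(title, start)] = total
    return totals


# Invented sample data: two IPs, three songs.
songs = [
    ("Song A", datetime(2012, 4, 30, 0, 0, 0), datetime(2012, 4, 30, 0, 3, 0)),
    ("Song B", datetime(2012, 4, 30, 0, 3, 30), datetime(2012, 4, 30, 0, 6, 0)),
    ("Song C", datetime(2012, 4, 30, 0, 6, 30), datetime(2012, 4, 30, 0, 9, 0)),
]
log_rows = [
    ("10.0.0.1", datetime(2012, 4, 29, 23, 59, 0), 1),
    ("10.0.0.1", datetime(2012, 4, 30, 0, 2, 0), 2),
    ("10.0.0.1", datetime(2012, 4, 30, 0, 4, 0), 0),  # everyone on this IP left
    ("10.0.0.2", datetime(2012, 4, 30, 0, 5, 0), 1),
]
totals = listeners_per_song(songs, log_rows)
for (title, start), total in totals.items():
    print(title, start, total)
```

The resulting totals could then be written to a summary table keyed by playlist entry, so that the report itself becomes a plain indexed lookup by date range.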