How to Optimize SELECT INTO OUTFILE query with LEFT JOIN in MySQL
I am running the query below on a table with 4.45 million rows, and it takes 15-20 minutes to complete. I have tried changing the engine from InnoDB to MyISAM, but that did not help. I have also tried adding several indexes, both normal and unique, but the query still takes the same time.
Here is my query:
SELECT
a.source, a.destination, a.forward_to, a.start_epoch, a.end_epoch, a.duration, a.billsec, a.outbound_billsec, a.pool_id, a.group_id, a.cost, a.outbound_cost, a.net, a.keep, a.payin, a.payout, a.campaign_id, a.buyer, a.hangup_cause, a.endpoint_disposition, a.uuid, a.agreement, a.agreement_type, a.contract, a.contract_type, a.sip_received_ip,a.termination_ip,
REPLACE(REPLACE(ifnull(b.line_type,''),'\n',' '),'\r',' ') AS line_type,
REPLACE(REPLACE(ifnull(b.ocn,''),'\n',' '),'\r',' ') AS ocn,
REPLACE(REPLACE(ifnull(b.spid_carrier_name,''),'\n',' '),'\r',' ') AS spid_carrier_name
INTO OUTFILE '/tmp/test-husnain01'
FIELDS TERMINATED BY ',' FROM inbound_022018 a
LEFT JOIN wireless_checks b ON (a.uuid = b.uuid)
WHERE date(a.start_epoch)='2018-02-19' AND
a.endpoint_disposition='ANSWER' AND
a.direction='inbound' AND
a.billed=1;
Below is my table structure (inbound_022018):
CREATE TABLE `inbound_022018` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`source` varchar(20) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
`destination` varchar(20) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
`prefix` int(22) NOT NULL,
`forward_to` varchar(20) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
`supplier` varchar(32) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
`agreement` int(11) NOT NULL,
`agreement_type` varchar(20) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
`payout` float(11,4) NOT NULL,
`pool_id` int(11) NOT NULL,
`group_id` int(11) NOT NULL,
`campaign_id` bigint(22) NOT NULL,
`lead` int(1) NOT NULL,
`cpl` float(11,4) NOT NULL,
`buyer` varchar(32) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
`contract` int(11) NOT NULL,
`contract_type` varchar(20) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
`payin` float(11,4) NOT NULL,
`gross` float(11,4) NOT NULL,
`cost` float(11,4) NOT NULL,
`outbound_cost` float(11,4) NOT NULL,
`net` float(11,4) NOT NULL,
`keep` float(11,4) NOT NULL,
`direction` varchar(20) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
`session_id` varchar(20) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
`uuid` varchar(50) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
`sip_from_uri` varchar(50) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
`sip_received_ip` varchar(20) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
`domain_name` varchar(50) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
`sip_req_uri` varchar(50) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
`endpoint_disposition` varchar(20) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
`hangup_cause` varchar(80) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
`hangup_cause_q850` varchar(80) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
`start_epoch` datetime DEFAULT NULL,
`answer_epoch` datetime DEFAULT NULL,
`bridge_epoch` datetime DEFAULT NULL,
`progress_epoch` datetime DEFAULT NULL,
`progress_media_epoch` datetime NOT NULL,
`end_epoch` datetime NOT NULL,
`digits_dialed` varchar(10) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
`last_app` varchar(20) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
`last_arg` varchar(20) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
`duration` int(11) NOT NULL,
`g30` int(1) DEFAULT NULL,
`billsec` int(11) NOT NULL,
`outbound_duration` int(11) NOT NULL,
`outbound_billsec` int(11) NOT NULL,
`progresssec` int(11) NOT NULL,
`answersec` int(11) NOT NULL,
`waitsec` int(11) NOT NULL,
`progress_mediasec` int(11) NOT NULL,
`flow_billsec` int(11) NOT NULL,
`sip_hangup_disposition` int(11) NOT NULL,
`callForwarded` varchar(10) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
`forwardUuid` varchar(40) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
`call_type` enum('s','v') CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL DEFAULT 's',
`billed` int(1) NOT NULL,
`uc` int(1) NOT NULL,
`suc` int(1) NOT NULL,
`callinfo` varchar(250) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
`termination_ip` varchar(20) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
`switchname` varchar(10) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
`org_charges` float(11,4) NOT NULL,
`call_summary` text,
PRIMARY KEY (`id`),
UNIQUE KEY `index_inbound_0717` (`id`) USING BTREE,
UNIQUE KEY `index_uuid` (`uuid`) USING BTREE,
UNIQUE KEY `index_all` (`id`,`campaign_id`,`session_id`,`uuid`) USING BTREE,
KEY `index_source` (`source`) USING BTREE,
KEY `index_destination` (`destination`) USING BTREE,
KEY `index_endpoint` (`endpoint_disposition`) USING BTREE,
KEY `index_build` (`billed`) USING BTREE,
KEY `index_campainid` (`campaign_id`) USING BTREE
) ENGINE=MyISAM AUTO_INCREMENT=4457485 DEFAULT CHARSET=latin1
Here is the second table (wireless_checks):
CREATE TABLE `wireless_checks` (
`id` int(22) NOT NULL AUTO_INCREMENT,
`date` varchar(10) NOT NULL,
`uuid` varchar(100) NOT NULL,
`tn` varchar(11) NOT NULL,
`lrn` varchar(11) NOT NULL,
`ported_status` varchar(2) NOT NULL,
`ported_date` varchar(11) NOT NULL,
`ocn` varchar(10) NOT NULL,
`line_type` int(1) NOT NULL,
`spid` varchar(10) NOT NULL,
`spid_carrier_name` varchar(100) NOT NULL,
`spid_carrier_type` varchar(10) NOT NULL,
`altspid_carrier_name` varchar(10) NOT NULL,
`altspid_carrier_type` varchar(10) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `index_uuid` (`uuid`) USING BTREE
) ENGINE=MyISAM AUTO_INCREMENT=36175 DEFAULT CHARSET=latin1
Please guide me on how I can optimize this query to reduce the execution time. I am also open to a workaround if there is another approach to get this done. Any help will be appreciated.
Thanks
Husnain
One tip that should make a difference: instead of doing
WHERE date(a.start_epoch)='2018-02-19'
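you should consider calculating the boundary values beforehand and comparing the column against them directly. Note that start_epoch is declared as DATETIME in the posted schema, so the values to compare against are datetime literals such as '2018-02-19 00:00:00', not the integer epoch 1518998400.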
The reason this is a red flag is that by wrapping the column in a function on the left side of a comparison, you prevent MySQL from using an index on that column, so it has to run the function on every candidate row (potentially all 4.45m) just to process the WHERE clause. If instead you compare the column itself to the real values, without using the DATE function, MySQL can optimize the query far more effectively and can use an index on a.start_epoch if one is available.
To create that index, run:
CREATE INDEX epoch_idx on inbound_022018(start_epoch)
More broadly, you should create indexes on columns that have a large spread of values (not just 1 or 2 possibilities), and multi-column indexes can help optimize complex queries.
Putting EXPLAIN in front of the query and looking for especially large values in the rows column is a good way of establishing where the cost is in the query. Frequently, effective indexing will resolve the problem.
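As a rough sketch of the EXPLAIN check described above, the statement below explains a trimmed version of the question's query. It assumes the INTO OUTFILE clause is dropped for the check (it does not affect how rows are found), uses a shortened column list for readability, and tests the date with a range on the bare column instead of DATE().

```sql
-- Sketch only: the same tables and filters as the question's query,
-- with a shortened SELECT list and no INTO OUTFILE clause.
EXPLAIN
SELECT a.uuid, a.start_epoch, b.line_type, b.ocn
FROM inbound_022018 a
LEFT JOIN wireless_checks b ON a.uuid = b.uuid
WHERE a.start_epoch >= '2018-02-19'
  AND a.start_epoch <  '2018-02-19' + INTERVAL 1 DAY
  AND a.endpoint_disposition = 'ANSWER'
  AND a.direction = 'inbound'
  AND a.billed = 1;
```

In the output, the type, key, and rows columns are the ones to read first: type ALL with key NULL means the table is being scanned in full, and a rows estimate in the millions for either table shows where the time is going.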
SELECT INTO OUTFILE is not the issue. Numerous other things are slowing the query.
Here are the snippets that I need to discuss:
FROM inbound_022018 a
LEFT JOIN wireless_checks b ON (a.uuid = b.uuid)
WHERE date(a.start_epoch)='2018-02-19'
AND a.endpoint_disposition='ANSWER'
AND a.direction='inbound'
AND a.billed=1;
`uuid` varchar(50) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
`uuid` varchar(100) NOT NULL ... DEFAULT CHARSET=latin1
float(11,4)
`date` varchar(10) NOT NULL, ...
`ported_date` varchar(11) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `index_inbound_0717` (`id`) USING BTREE,
PRIMARY KEY (`id`), ...
UNIQUE KEY `index_all` (`id`,`campaign_id`,`session_id`,`uuid`) USING BTREE,
Many problems:
- When comparing two strings (a.uuid = b.uuid), indexes cannot be used if the charset or collation is different. Fix that.
- Converting the uuid from a string to BINARY(16) would make it even smaller. (Code available elsewhere.)
- Table a needs a composite INDEX(billed, direction, endpoint_disposition, start_epoch) to make the WHERE more efficient. The first 3 columns may be in any order.
- A PRIMARY KEY is a UNIQUE key; remove the latter (the redundant UNIQUE KEY on id).
- FLOAT(m,n) is a useless construct because it involves two roundings. For monetary values, use DECIMAL(m,n); for 'scientific' values, use FLOAT without the (m,n).
- If you don't need b.id for anything anywhere, get rid of it and promote uuid to be the PK. This will speed up the JOIN for InnoDB.
- Don't put dates into a VARCHAR unless you have a good reason.
- When a column is 'hidden' inside a function (e.g., DATE()), indexing the column fails to help. Change to
WHERE a.start_epoch >= '2018-02-19'
AND a.start_epoch < '2018-02-19' + INTERVAL 1 DAY
With that change, the 4th column in my suggested INDEX will be usable.
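The index and charset points above could be applied roughly as follows. This is a sketch under assumptions, not a tested migration: the index name idx_where is made up for illustration, and converting wireless_checks.uuid to utf8 is just one way to make the two uuid columns match (converging both on a smaller charset would be another). Each ALTER rebuilds the table, so it is worth trying on a copy first.

```sql
-- Drop the UNIQUE key that merely duplicates the PRIMARY KEY on id,
-- and add the composite index for the WHERE clause.
-- idx_where is a hypothetical name.
ALTER TABLE inbound_022018
  DROP INDEX index_inbound_0717,
  ADD INDEX idx_where (billed, direction, endpoint_disposition, start_epoch);

-- Give b.uuid the same charset and collation as a.uuid so the JOIN
-- can use the index on wireless_checks.uuid.
ALTER TABLE wireless_checks
  MODIFY uuid VARCHAR(100)
    CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL;
```

After these changes, running EXPLAIN on the query with the range test on start_epoch should show whether the new index is chosen for table a and whether the join on b uses index_uuid.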