I am running the query below on a table with 4.45 million rows, and it takes more than 15-20 minutes to complete. I have tried changing the engine from InnoDB to MyISAM, but that did not help. I have also tried adding several indexes, both normal and unique, but the query still takes the same time.
Here is my Query:
SELECT
a.source, a.destination, a.forward_to, a.start_epoch, a.end_epoch, a.duration, a.billsec, a.outbound_billsec, a.pool_id, a.group_id, a.cost, a.outbound_cost, a.net, a.keep, a.payin, a.payout, a.campaign_id, a.buyer, a.hangup_cause, a.endpoint_disposition, a.uuid, a.agreement, a.agreement_type, a.contract, a.contract_type, a.sip_received_ip,a.termination_ip,
REPLACE(REPLACE(ifnull(b.line_type,''),'\n',' '),'\r',' ') AS line_type,
REPLACE(REPLACE(ifnull(b.ocn,''),'\n',' '),'\r',' ') AS ocn,
REPLACE(REPLACE(ifnull(b.spid_carrier_name,''),'\n',' '),'\r',' ') AS spid_carrier_name
INTO OUTFILE '/tmp/test-husnain01'
FIELDS TERMINATED BY ',' FROM inbound_022018 a
LEFT JOIN wireless_checks b ON (a.uuid = b.uuid)
WHERE date(a.start_epoch)='2018-02-19' AND
a.endpoint_disposition='ANSWER' AND
a.direction='inbound' AND
a.billed=1;
Below is my Table Structure (inbound_022018):
CREATE TABLE `inbound_022018` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`source` varchar(20) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
`destination` varchar(20) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
`prefix` int(22) NOT NULL,
`forward_to` varchar(20) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
`supplier` varchar(32) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
`agreement` int(11) NOT NULL,
`agreement_type` varchar(20) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
`payout` float(11,4) NOT NULL,
`pool_id` int(11) NOT NULL,
`group_id` int(11) NOT NULL,
`campaign_id` bigint(22) NOT NULL,
`lead` int(1) NOT NULL,
`cpl` float(11,4) NOT NULL,
`buyer` varchar(32) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
`contract` int(11) NOT NULL,
`contract_type` varchar(20) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
`payin` float(11,4) NOT NULL,
`gross` float(11,4) NOT NULL,
`cost` float(11,4) NOT NULL,
`outbound_cost` float(11,4) NOT NULL,
`net` float(11,4) NOT NULL,
`keep` float(11,4) NOT NULL,
`direction` varchar(20) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
`session_id` varchar(20) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
`uuid` varchar(50) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
`sip_from_uri` varchar(50) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
`sip_received_ip` varchar(20) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
`domain_name` varchar(50) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
`sip_req_uri` varchar(50) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
`endpoint_disposition` varchar(20) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
`hangup_cause` varchar(80) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
`hangup_cause_q850` varchar(80) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
`start_epoch` datetime DEFAULT NULL,
`answer_epoch` datetime DEFAULT NULL,
`bridge_epoch` datetime DEFAULT NULL,
`progress_epoch` datetime DEFAULT NULL,
`progress_media_epoch` datetime NOT NULL,
`end_epoch` datetime NOT NULL,
`digits_dialed` varchar(10) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
`last_app` varchar(20) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
`last_arg` varchar(20) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
`duration` int(11) NOT NULL,
`g30` int(1) DEFAULT NULL,
`billsec` int(11) NOT NULL,
`outbound_duration` int(11) NOT NULL,
`outbound_billsec` int(11) NOT NULL,
`progresssec` int(11) NOT NULL,
`answersec` int(11) NOT NULL,
`waitsec` int(11) NOT NULL,
`progress_mediasec` int(11) NOT NULL,
`flow_billsec` int(11) NOT NULL,
`sip_hangup_disposition` int(11) NOT NULL,
`callForwarded` varchar(10) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
`forwardUuid` varchar(40) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
`call_type` enum('s','v') CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL DEFAULT 's',
`billed` int(1) NOT NULL,
`uc` int(1) NOT NULL,
`suc` int(1) NOT NULL,
`callinfo` varchar(250) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
`termination_ip` varchar(20) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
`switchname` varchar(10) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
`org_charges` float(11,4) NOT NULL,
`call_summary` text,
PRIMARY KEY (`id`),
UNIQUE KEY `index_inbound_0717` (`id`) USING BTREE,
UNIQUE KEY `index_uuid` (`uuid`) USING BTREE,
UNIQUE KEY `index_all` (`id`,`campaign_id`,`session_id`,`uuid`) USING BTREE,
KEY `index_source` (`source`) USING BTREE,
KEY `index_destination` (`destination`) USING BTREE,
KEY `index_endpoint` (`endpoint_disposition`) USING BTREE,
KEY `index_build` (`billed`) USING BTREE,
KEY `index_campainid` (`campaign_id`) USING BTREE
) ENGINE=MyISAM AUTO_INCREMENT=4457485 DEFAULT CHARSET=latin1
Here is the second table (wireless_checks):
CREATE TABLE `wireless_checks` (
`id` int(22) NOT NULL AUTO_INCREMENT,
`date` varchar(10) NOT NULL,
`uuid` varchar(100) NOT NULL,
`tn` varchar(11) NOT NULL,
`lrn` varchar(11) NOT NULL,
`ported_status` varchar(2) NOT NULL,
`ported_date` varchar(11) NOT NULL,
`ocn` varchar(10) NOT NULL,
`line_type` int(1) NOT NULL,
`spid` varchar(10) NOT NULL,
`spid_carrier_name` varchar(100) NOT NULL,
`spid_carrier_type` varchar(10) NOT NULL,
`altspid_carrier_name` varchar(10) NOT NULL,
`altspid_carrier_type` varchar(10) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `index_uuid` (`uuid`) USING BTREE
) ENGINE=MyISAM AUTO_INCREMENT=36175 DEFAULT CHARSET=latin1
Please guide me on how to optimize this query to reduce the execution time. I am also open to a workaround if there is another approach to get this done. Any help will be appreciated.
Thanks
Husnain
One tip that should make a difference is that instead of doing
WHERE date(a.start_epoch)='2018-02-19'
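you should consider calculating the bounds beforehand and comparing the column against them directly. Note that start_epoch is declared as DATETIME despite its name, so the bounds should be datetime literals (a.start_epoch >= '2018-02-19 00:00:00' AND a.start_epoch < '2018-02-20 00:00:00'), not a Unix timestamp such as 1518998400.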
The reason this is a red flag is that by wrapping the column in a function, you're forcing the database to do a full table scan, running that function on all 4.45m rows, just to process the WHERE clause. If instead you compare the column itself to constant values, without using the DATE function, then MySQL can optimize the query far more effectively, and will use an index on a.start_epoch if one is available.
To create that index just do
CREATE INDEX epoch_idx on inbound_022018(start_epoch)
More broadly, you should create indexes against columns which have a large spread of values (not just 1 or 2 possibilities), and multi-column indexes can help with optimizing complex queries.
Putting EXPLAIN in front of the query, and having a look at the results for especially large row numbers, is a good way of establishing where the cost is in the query. Frequently, effective indexing will resolve the problem.
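As a rough sketch of that check, the two statements below run EXPLAIN on a cut-down version of the question's query, once with DATE() around the column and once with a plain range. It assumes the epoch_idx index from above has been created. The exact plan depends on the MySQL version and the data, so no output is shown.

```sql
-- Function applied to the column: an index on start_epoch
-- cannot be used for this predicate.
EXPLAIN
SELECT a.uuid
FROM inbound_022018 a
WHERE DATE(a.start_epoch) = '2018-02-19';

-- Bare column compared against constants: an index on start_epoch
-- (for example epoch_idx) becomes usable for a range scan.
EXPLAIN
SELECT a.uuid
FROM inbound_022018 a
WHERE a.start_epoch >= '2018-02-19 00:00:00'
  AND a.start_epoch <  '2018-02-20 00:00:00';
```

In the output, compare the type, key, and rows columns. A type of ALL with a NULL key typically indicates a full table scan.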
SELECT INTO OUTFILE is not the issue. Numerous other things are slowing the query.
Here are the snippets that I need to discuss:
FROM inbound_022018 a
LEFT JOIN wireless_checks b ON (a.uuid = b.uuid)
WHERE date(a.start_epoch)='2018-02-19'
AND a.endpoint_disposition='ANSWER'
AND a.direction='inbound'
AND a.billed=1;
`uuid` varchar(50) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
`uuid` varchar(100) NOT NULL ... DEFAULT CHARSET=latin1
float(11,4)
`date` varchar(10) NOT NULL, ...
`ported_date` varchar(11) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `index_inbound_0717` (`id`) USING BTREE,
PRIMARY KEY (`id`), ...
UNIQUE KEY `index_all` (`id`,`campaign_id`,`session_id`,`uuid`) USING BTREE,
Many problems:
When JOINing (a.uuid = b.uuid), indexes cannot be used if the charset or collation is different. Fix that.
Consider storing the UUIDs as BINARY(16). (Code available elsewhere.)
Table a needs a composite INDEX(billed, direction, endpoint_disposition, start_epoch) to make the WHERE more efficient. The first 3 columns may be in any order.
A PRIMARY KEY is a UNIQUE key; remove the redundant UNIQUE KEY on id (index_inbound_0717).
FLOAT(m,n) is a useless construct because it involves two roundings. For monetary values, use DECIMAL(m,n); for 'scientific' values, use FLOAT without the (m,n).
If you are not using b.id for anything anywhere, get rid of it and promote uuid to be the PK. This will speed up the JOIN for InnoDB.
Do not store dates (date, ported_date) in VARCHAR.
When a column is 'hidden' inside a function (eg, DATE()), indexing the column fails to help. Change to
WHERE a.start_epoch >= '2018-02-19'
AND a.start_epoch < '2018-02-19' + INTERVAL 1 DAY
With that change, the 4th column in my suggested INDEX will be usable.
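Several of the schema points above could be applied roughly as follows. This is a sketch, not a tested migration: the index name idx_where is made up for illustration, the choice of utf8 for wireless_checks.uuid is an assumption (converting a.uuid to latin1 would align the two columns as well), and payout stands in for the other FLOAT(11,4) money columns. Try it on a copy of the tables first, since each ALTER rebuilds the table.

```sql
-- Composite index for the WHERE clause: the three equality columns
-- first, the range column (start_epoch) last.
-- idx_where is a hypothetical name.
ALTER TABLE inbound_022018
  ADD INDEX idx_where (billed, direction, endpoint_disposition, start_epoch);

-- The PRIMARY KEY already enforces uniqueness on id,
-- so the extra unique index on id is redundant.
ALTER TABLE inbound_022018
  DROP INDEX index_inbound_0717;

-- Align charset and collation of the join columns so the index on
-- wireless_checks.uuid can be used by the JOIN.
-- Assumption: utf8/utf8_unicode_ci is the side to standardize on.
ALTER TABLE wireless_checks
  MODIFY uuid VARCHAR(100) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL;

-- Monetary column: DECIMAL instead of FLOAT(m,n).
-- Shown for one column only; the others would follow the same pattern.
ALTER TABLE inbound_022018
  MODIFY payout DECIMAL(11,4) NOT NULL;
```

After these changes, running EXPLAIN on the rewritten query is a reasonable way to confirm that the new index is actually chosen.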