
How to Optimize SELECT INTO OUTFILE query with LEFT JOIN in MySQL

I am running the query below on a table with 4.45 million rows, and it takes 15-20 minutes or more to complete. I've tried changing the engine from InnoDB to MyISAM, but that made no difference. I've also tried adding several indexes, both normal and unique, but it still takes the same time.

Here is my Query:

SELECT 
a.source, a.destination, a.forward_to, a.start_epoch, a.end_epoch, a.duration, a.billsec, a.outbound_billsec, a.pool_id, a.group_id, a.cost, a.outbound_cost, a.net, a.keep, a.payin, a.payout, a.campaign_id, a.buyer, a.hangup_cause, a.endpoint_disposition, a.uuid, a.agreement, a.agreement_type, a.contract, a.contract_type, a.sip_received_ip,a.termination_ip, 
REPLACE(REPLACE(ifnull(b.line_type,''),'\n',' '),'\r',' ') AS line_type, 
REPLACE(REPLACE(ifnull(b.ocn,''),'\n',' '),'\r',' ') AS ocn, 
REPLACE(REPLACE(ifnull(b.spid_carrier_name,''),'\n',' '),'\r',' ') AS spid_carrier_name 
INTO OUTFILE '/tmp/test-husnain01' 
FIELDS TERMINATED BY ',' FROM inbound_022018 a 
LEFT JOIN wireless_checks b ON (a.uuid = b.uuid) 
WHERE date(a.start_epoch)='2018-02-19' AND 
a.endpoint_disposition='ANSWER' AND 
a.direction='inbound' AND 
a.billed=1;

Below is my Table Structure (inbound_022018):

      CREATE TABLE `inbound_022018` (
        `id` int(11) NOT NULL AUTO_INCREMENT,
        `source` varchar(20) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
        `destination` varchar(20) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
        `prefix` int(22) NOT NULL,
        `forward_to` varchar(20) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
        `supplier` varchar(32) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
        `agreement` int(11) NOT NULL,
        `agreement_type` varchar(20) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
        `payout` float(11,4) NOT NULL,
        `pool_id` int(11) NOT NULL,
        `group_id` int(11) NOT NULL,
        `campaign_id` bigint(22) NOT NULL,
        `lead` int(1) NOT NULL,
        `cpl` float(11,4) NOT NULL,
        `buyer` varchar(32) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
        `contract` int(11) NOT NULL,
        `contract_type` varchar(20) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
        `payin` float(11,4) NOT NULL,
        `gross` float(11,4) NOT NULL,
        `cost` float(11,4) NOT NULL,
        `outbound_cost` float(11,4) NOT NULL,
        `net` float(11,4) NOT NULL,
        `keep` float(11,4) NOT NULL,
        `direction` varchar(20) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
        `session_id` varchar(20) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
        `uuid` varchar(50) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
        `sip_from_uri` varchar(50) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
        `sip_received_ip` varchar(20) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
        `domain_name` varchar(50) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
        `sip_req_uri` varchar(50) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
        `endpoint_disposition` varchar(20) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
        `hangup_cause` varchar(80) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
        `hangup_cause_q850` varchar(80) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
        `start_epoch` datetime DEFAULT NULL,
        `answer_epoch` datetime DEFAULT NULL,
        `bridge_epoch` datetime DEFAULT NULL,
        `progress_epoch` datetime DEFAULT NULL,
        `progress_media_epoch` datetime NOT NULL,
        `end_epoch` datetime NOT NULL,
        `digits_dialed` varchar(10) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
        `last_app` varchar(20) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
        `last_arg` varchar(20) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
        `duration` int(11) NOT NULL,
        `g30` int(1) DEFAULT NULL,
        `billsec` int(11) NOT NULL,
        `outbound_duration` int(11) NOT NULL,
        `outbound_billsec` int(11) NOT NULL,
        `progresssec` int(11) NOT NULL,
        `answersec` int(11) NOT NULL,
        `waitsec` int(11) NOT NULL,
        `progress_mediasec` int(11) NOT NULL,
        `flow_billsec` int(11) NOT NULL,
        `sip_hangup_disposition` int(11) NOT NULL,
        `callForwarded` varchar(10) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
        `forwardUuid` varchar(40) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
        `call_type` enum('s','v') CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL DEFAULT 's',
        `billed` int(1) NOT NULL,
        `uc` int(1) NOT NULL,
        `suc` int(1) NOT NULL,
        `callinfo` varchar(250) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
        `termination_ip` varchar(20) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
        `switchname` varchar(10) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
        `org_charges` float(11,4) NOT NULL,
        `call_summary` text,
        PRIMARY KEY (`id`),
        UNIQUE KEY `index_inbound_0717` (`id`) USING BTREE,
        UNIQUE KEY `index_uuid` (`uuid`) USING BTREE,
        UNIQUE KEY `index_all` (`id`,`campaign_id`,`session_id`,`uuid`) USING BTREE,
        KEY `index_source` (`source`) USING BTREE,
        KEY `index_destination` (`destination`) USING BTREE,
        KEY `index_endpoint` (`endpoint_disposition`) USING BTREE,
        KEY `index_build` (`billed`) USING BTREE,
        KEY `index_campainid` (`campaign_id`) USING BTREE
      ) ENGINE=MyISAM AUTO_INCREMENT=4457485 DEFAULT CHARSET=latin1

Here is the second table (wireless_checks):

         CREATE TABLE `wireless_checks` (
        `id` int(22) NOT NULL AUTO_INCREMENT,
        `date` varchar(10) NOT NULL,
        `uuid` varchar(100) NOT NULL,
        `tn` varchar(11) NOT NULL,
        `lrn` varchar(11) NOT NULL,
        `ported_status` varchar(2) NOT NULL,
        `ported_date` varchar(11) NOT NULL,
        `ocn` varchar(10) NOT NULL,
        `line_type` int(1) NOT NULL,
        `spid` varchar(10) NOT NULL,
        `spid_carrier_name` varchar(100) NOT NULL,
        `spid_carrier_type` varchar(10) NOT NULL,
        `altspid_carrier_name` varchar(10) NOT NULL,
        `altspid_carrier_type` varchar(10) NOT NULL,
        PRIMARY KEY (`id`),
        UNIQUE KEY `index_uuid` (`uuid`) USING BTREE
      ) ENGINE=MyISAM AUTO_INCREMENT=36175 DEFAULT CHARSET=latin1

Please guide me on how I can optimize this query to reduce the execution time. I am also open to a workaround if there is another approach to get this done. Any help will be appreciated.

Thanks

Husnain

One tip that should make a difference is that instead of doing

WHERE date(a.start_epoch)='2018-02-19'

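you should compare the column itself against constant values. Note that start_epoch is a DATETIME column (despite its name), so the constants should be datetime literals covering the whole day, from '2018-02-19 00:00:00' up to but not including '2018-02-20 00:00:00', rather than a Unix timestamp such as 1518998400.

The reason this is a red flag is that by wrapping the column in a function, you prevent MySQL from using an index on that column for the condition. It has to run the function on every row it examines, potentially all 4.45m of them, just to process the WHERE clause. If instead you compare the column itself to constant values, without using the DATE function, then MySQL can optimize the query far more effectively, and will use an index on a.start_epoch if one is available.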

To create that index just do

CREATE INDEX epoch_idx on inbound_022018(start_epoch)

More broadly, you should create indexes against columns which have a large spread of values (not just 1 or 2 possibilities), and multi-column indexes can help with optimizing complex queries.

Putting EXPLAIN in front of the query, and having a look at the results for especially large row numbers, is a good way of establishing where the cost is in the query. Frequently, effective indexing will resolve the problem.
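As a rough sketch of that check, the statement below runs EXPLAIN on the filtering and join part of the query from the question. The column list is shortened for readability, which is my own simplification; the FROM, JOIN and WHERE clauses are unchanged.

```sql
-- Show the execution plan without running the export.
-- In the output, look at `type` (ALL means a full table scan),
-- `key` (the index actually chosen, NULL if none) and `rows`
-- (the estimated number of rows examined per table).
EXPLAIN
SELECT a.uuid, b.line_type, b.ocn, b.spid_carrier_name
FROM inbound_022018 a
LEFT JOIN wireless_checks b ON (a.uuid = b.uuid)
WHERE date(a.start_epoch) = '2018-02-19'
  AND a.endpoint_disposition = 'ANSWER'
  AND a.direction = 'inbound'
  AND a.billed = 1;
```

Running it again after each index or query change makes it easy to see whether the estimated row counts actually drop.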

SELECT INTO OUTFILE is not the issue. Numerous other things are slowing the query.

Here are the snippets that I need to discuss:

    FROM  inbound_022018 a
    LEFT JOIN  wireless_checks b  ON (a.uuid = b.uuid)
    WHERE  date(a.start_epoch)='2018-02-19'
      AND  a.endpoint_disposition='ANSWER'
      AND  a.direction='inbound'
      AND  a.billed=1;

    `uuid` varchar(50) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,

    `uuid` varchar(100) NOT NULL ... DEFAULT CHARSET=latin1

    float(11,4)

    `date` varchar(10) NOT NULL, ...
    `ported_date` varchar(11) NOT NULL,

    PRIMARY KEY (`id`),
    UNIQUE KEY `index_inbound_0717` (`id`) USING BTREE,

    PRIMARY KEY (`id`), ...
    UNIQUE KEY `index_all` (`id`,`campaign_id`,`session_id`,`uuid`) USING BTREE,

Many problems:

  • UUIDs are notoriously "random". How big are your tables? If they are bigger than can be cached in RAM, the query is destined to be miserably slow.
  • When comparing two strings ( a.uuid = b.uuid ), indexes cannot be used if the charset or collation is different. Fix that.
  • UUIDs, unless you have something special, can be CHAR(36) CHARSET ascii . This cleans up several things.
  • Even smaller would be to convert from strings to BINARY(16) . (Code available elsewhere.)
  • Table a needs a composite INDEX(billed, direction, endpoint_disposition, start_epoch) to make the WHERE more efficient. The first 3 columns may be in any order.
  • Change the date test as noted below.
  • A PRIMARY KEY is already a UNIQUE key, so the separate UNIQUE KEY index_inbound_0717 on id is redundant; remove it.
  • FLOAT(m,n) is a useless construct because it involves two roundings. For monetary values, use DECIMAL(m,n) ; for 'scientific' values, use FLOAT without the (m,n) .
  • It is almost never valid to have a secondary key start with all of the columns of the PK. (OK, MyISAM may benefit, but InnoDB rarely does.)
  • If you don't need b.id for anything anywhere, get rid of it and promote uuid to be the PK. This will speed up the JOIN for InnoDB.
  • Unless you have some good reason, don't put dates in VARCHAR .
  • Don't use MyISAM; fix the issues I have discussed here. Then come back for further discussion if needed.
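Several of the points above could be applied with statements along these lines. This is only a sketch under assumptions: it assumes every stored uuid is a standard 36-character ASCII UUID (check your data first, since longer or non-ASCII values would not survive the conversion), and the index name idx_filter is a placeholder I made up. Each ALTER rebuilds the table, so on 4.45 million rows it will take a while; try it on a copy first.

```sql
-- Give both join columns the same type, charset and collation,
-- so the index on uuid can be used for the JOIN.
ALTER TABLE wireless_checks
  MODIFY uuid CHAR(36) CHARACTER SET ascii NOT NULL;

ALTER TABLE inbound_022018
  MODIFY uuid CHAR(36) CHARACTER SET ascii DEFAULT NULL,
  -- Redundant: id is already the PRIMARY KEY.
  DROP INDEX index_inbound_0717,
  -- Starts with the PK column; skip this line if something relies on it.
  DROP INDEX index_all,
  -- Composite index matching the WHERE clause; the range column goes last.
  ADD INDEX idx_filter (billed, direction, endpoint_disposition, start_epoch);

-- Move both tables off MyISAM.
ALTER TABLE wireless_checks ENGINE=InnoDB;
ALTER TABLE inbound_022018 ENGINE=InnoDB;
```

The composite index only helps with start_epoch once the DATE() call is removed from the WHERE clause.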

When a column is 'hidden' inside a function (eg, DATE() ), indexing the column fails to help. Change to

WHERE  a.start_epoch >= '2018-02-19'
  AND  a.start_epoch  < '2018-02-19' + INTERVAL 1 DAY

With that change, the 4th column in my suggested INDEX will be usable.
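Put together, the export from the question would look roughly like this with the range test in place. Only the date condition differs from the original query; the layout of the column list is my own.

```sql
SELECT
  a.source, a.destination, a.forward_to, a.start_epoch, a.end_epoch,
  a.duration, a.billsec, a.outbound_billsec, a.pool_id, a.group_id,
  a.cost, a.outbound_cost, a.net, a.keep, a.payin, a.payout,
  a.campaign_id, a.buyer, a.hangup_cause, a.endpoint_disposition,
  a.uuid, a.agreement, a.agreement_type, a.contract, a.contract_type,
  a.sip_received_ip, a.termination_ip,
  REPLACE(REPLACE(ifnull(b.line_type,''),'\n',' '),'\r',' ') AS line_type,
  REPLACE(REPLACE(ifnull(b.ocn,''),'\n',' '),'\r',' ') AS ocn,
  REPLACE(REPLACE(ifnull(b.spid_carrier_name,''),'\n',' '),'\r',' ') AS spid_carrier_name
INTO OUTFILE '/tmp/test-husnain01'   -- the file must not already exist
FIELDS TERMINATED BY ','
FROM inbound_022018 a
LEFT JOIN wireless_checks b ON (a.uuid = b.uuid)
WHERE a.start_epoch >= '2018-02-19'                    -- start of the day, inclusive
  AND a.start_epoch <  '2018-02-19' + INTERVAL 1 DAY   -- start of the next day, exclusive
  AND a.endpoint_disposition = 'ANSWER'
  AND a.direction = 'inbound'
  AND a.billed = 1;
```

The half-open range (>= and <) covers every time of day on 2018-02-19 without having to spell out 23:59:59.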


 