How to Optimize SELECT INTO OUTFILE query with LEFT JOIN in MySQL

I am running the query below on a table which has 4.45 million rows, and it takes more than 15-20 minutes to complete. I've tried changing the engine from InnoDB to MyISAM as well, but nothing is working. I've also tried adding multiple indexes, of both normal and unique type, but it still takes the same time.

Here is my query:

SELECT 
a.source, a.destination, a.forward_to, a.start_epoch, a.end_epoch, a.duration, a.billsec, a.outbound_billsec, a.pool_id, a.group_id, a.cost, a.outbound_cost, a.net, a.keep, a.payin, a.payout, a.campaign_id, a.buyer, a.hangup_cause, a.endpoint_disposition, a.uuid, a.agreement, a.agreement_type, a.contract, a.contract_type, a.sip_received_ip,a.termination_ip, 
REPLACE(REPLACE(ifnull(b.line_type,''),'\n',' '),'\r',' ') AS line_type, 
REPLACE(REPLACE(ifnull(b.ocn,''),'\n',' '),'\r',' ') AS ocn, 
REPLACE(REPLACE(ifnull(b.spid_carrier_name,''),'\n',' '),'\r',' ') AS spid_carrier_name 
INTO OUTFILE '/tmp/test-husnain01' 
FIELDS TERMINATED BY ',' FROM inbound_022018 a 
LEFT JOIN wireless_checks b ON (a.uuid = b.uuid) 
WHERE date(a.start_epoch)='2018-02-19' AND 
a.endpoint_disposition='ANSWER' AND 
a.direction='inbound' AND 
a.billed=1;

Below is my table structure (inbound_022018):

      CREATE TABLE `inbound_022018` (
        `id` int(11) NOT NULL AUTO_INCREMENT,
        `source` varchar(20) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
        `destination` varchar(20) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
        `prefix` int(22) NOT NULL,
        `forward_to` varchar(20) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
        `supplier` varchar(32) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
        `agreement` int(11) NOT NULL,
        `agreement_type` varchar(20) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
        `payout` float(11,4) NOT NULL,
        `pool_id` int(11) NOT NULL,
        `group_id` int(11) NOT NULL,
        `campaign_id` bigint(22) NOT NULL,
        `lead` int(1) NOT NULL,
        `cpl` float(11,4) NOT NULL,
        `buyer` varchar(32) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
        `contract` int(11) NOT NULL,
        `contract_type` varchar(20) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
        `payin` float(11,4) NOT NULL,
        `gross` float(11,4) NOT NULL,
        `cost` float(11,4) NOT NULL,
        `outbound_cost` float(11,4) NOT NULL,
        `net` float(11,4) NOT NULL,
        `keep` float(11,4) NOT NULL,
        `direction` varchar(20) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
        `session_id` varchar(20) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
        `uuid` varchar(50) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
        `sip_from_uri` varchar(50) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
        `sip_received_ip` varchar(20) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
        `domain_name` varchar(50) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
        `sip_req_uri` varchar(50) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
        `endpoint_disposition` varchar(20) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
        `hangup_cause` varchar(80) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
        `hangup_cause_q850` varchar(80) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
        `start_epoch` datetime DEFAULT NULL,
        `answer_epoch` datetime DEFAULT NULL,
        `bridge_epoch` datetime DEFAULT NULL,
        `progress_epoch` datetime DEFAULT NULL,
        `progress_media_epoch` datetime NOT NULL,
        `end_epoch` datetime NOT NULL,
        `digits_dialed` varchar(10) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
        `last_app` varchar(20) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
        `last_arg` varchar(20) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
        `duration` int(11) NOT NULL,
        `g30` int(1) DEFAULT NULL,
        `billsec` int(11) NOT NULL,
        `outbound_duration` int(11) NOT NULL,
        `outbound_billsec` int(11) NOT NULL,
        `progresssec` int(11) NOT NULL,
        `answersec` int(11) NOT NULL,
        `waitsec` int(11) NOT NULL,
        `progress_mediasec` int(11) NOT NULL,
        `flow_billsec` int(11) NOT NULL,
        `sip_hangup_disposition` int(11) NOT NULL,
        `callForwarded` varchar(10) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
        `forwardUuid` varchar(40) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
        `call_type` enum('s','v') CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL DEFAULT 's',
        `billed` int(1) NOT NULL,
        `uc` int(1) NOT NULL,
        `suc` int(1) NOT NULL,
        `callinfo` varchar(250) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
        `termination_ip` varchar(20) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
        `switchname` varchar(10) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
        `org_charges` float(11,4) NOT NULL,
        `call_summary` text,
        PRIMARY KEY (`id`),
        UNIQUE KEY `index_inbound_0717` (`id`) USING BTREE,
        UNIQUE KEY `index_uuid` (`uuid`) USING BTREE,
        UNIQUE KEY `index_all` (`id`,`campaign_id`,`session_id`,`uuid`) USING BTREE,
        KEY `index_source` (`source`) USING BTREE,
        KEY `index_destination` (`destination`) USING BTREE,
        KEY `index_endpoint` (`endpoint_disposition`) USING BTREE,
        KEY `index_build` (`billed`) USING BTREE,
        KEY `index_campainid` (`campaign_id`) USING BTREE
      ) ENGINE=MyISAM AUTO_INCREMENT=4457485 DEFAULT CHARSET=latin1

Here is the second table (wireless_checks):

         CREATE TABLE `wireless_checks` (
        `id` int(22) NOT NULL AUTO_INCREMENT,
        `date` varchar(10) NOT NULL,
        `uuid` varchar(100) NOT NULL,
        `tn` varchar(11) NOT NULL,
        `lrn` varchar(11) NOT NULL,
        `ported_status` varchar(2) NOT NULL,
        `ported_date` varchar(11) NOT NULL,
        `ocn` varchar(10) NOT NULL,
        `line_type` int(1) NOT NULL,
        `spid` varchar(10) NOT NULL,
        `spid_carrier_name` varchar(100) NOT NULL,
        `spid_carrier_type` varchar(10) NOT NULL,
        `altspid_carrier_name` varchar(10) NOT NULL,
        `altspid_carrier_type` varchar(10) NOT NULL,
        PRIMARY KEY (`id`),
        UNIQUE KEY `index_uuid` (`uuid`) USING BTREE
      ) ENGINE=MyISAM AUTO_INCREMENT=36175 DEFAULT CHARSET=latin1

Please guide me on how I can optimize this query to reduce the execution time. I am also open to workarounds if there is any other approach to get this done. Any help will be appreciated.

Thanks

Husnain

One tip that should make a difference: instead of doing

WHERE date(a.start_epoch)='2018-02-19'

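you should consider calculating that value beforehand and comparing the column against the real value directly, i.e. 1518998400 (midnight on 2018-02-19 UTC as a Unix timestamp). Note, though, that despite its name start_epoch is declared as DATETIME, so in this schema the precomputed values should be datetime literals, and matching a whole day needs a range (at or after that midnight and before the next one) rather than a single value.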

The reason this is a red flag is that by wrapping the column in a function, you prevent MySQL from using an index on that column, so it has to run that function on all 4.45m rows just to process the WHERE clause. If instead you compare the column itself to the real value, without the DATE function, MySQL can optimize the query far more effectively, and will use an index on a.start_epoch if one is available.

To create that index, just do

CREATE INDEX epoch_idx ON inbound_022018 (start_epoch);
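The "calculate it beforehand" tip can be done in application code. Below is a minimal Python sketch, assuming the application builds the query and that start_epoch holds DATETIME values as declared in the schema. The helper name `day_bounds` is hypothetical. It produces a half-open [start, end) pair of DATETIME literals for one calendar day and, for comparison, the UTC Unix timestamp quoted above.

```python
from datetime import date, datetime, time, timedelta, timezone
from typing import Tuple

DATETIME_FMT = "%Y-%m-%d %H:%M:%S"  # MySQL DATETIME literal format


def day_bounds(day: date) -> Tuple[str, str]:
    """Return half-open [start, end) DATETIME literals covering one calendar day."""
    start = datetime.combine(day, time.min)  # midnight at the start of `day`
    end = start + timedelta(days=1)          # midnight at the start of the next day
    return start.strftime(DATETIME_FMT), end.strftime(DATETIME_FMT)


lo, hi = day_bounds(date(2018, 2, 19))
print(lo, hi)  # 2018-02-19 00:00:00 2018-02-20 00:00:00

# The same midnight expressed as a Unix timestamp, interpreted as UTC.
epoch = int(datetime(2018, 2, 19, tzinfo=timezone.utc).timestamp())
print(epoch)  # 1518998400
```

The two strings would then be bound as parameters to `a.start_epoch >= ? AND a.start_epoch < ?`, using whatever placeholder syntax your MySQL driver expects. The half-open form avoids guessing a "last second of the day" value.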

More broadly, you should create indexes on columns that have a large spread of values (not just 1 or 2 possibilities), and multi-column indexes can help with optimizing complex queries.

Putting EXPLAIN in front of the query and looking for especially large numbers in the rows column of its output is a good way of establishing where the cost lies in the query. Frequently, effective indexing will resolve the problem.
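To see the effect of wrapping a column in a function on the query plan, here is a small sketch. MySQL isn't available from the Python standard library, so it uses SQLite (via `sqlite3`) as a stand-in, with a hypothetical cut-down `inbound` table. SQLite's `EXPLAIN QUERY PLAN` output looks different from MySQL's `EXPLAIN`, but the principle it illustrates is the same: "SCAN" means every row is visited, while "SEARCH ... USING INDEX" means an index range is used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, cut-down stand-in for inbound_022018.
conn.execute("CREATE TABLE inbound (id INTEGER PRIMARY KEY, source TEXT, start_epoch TEXT)")
conn.execute("CREATE INDEX epoch_idx ON inbound (start_epoch)")
conn.executemany(
    "INSERT INTO inbound (source, start_epoch) VALUES (?, ?)",
    [("a", "2018-02-18 23:59:59"), ("b", "2018-02-19 00:00:00"),
     ("c", "2018-02-19 23:59:59"), ("d", "2018-02-20 00:00:00")],
)


def plan(sql, params):
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN for a query."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


# Column hidden inside a function: the index on start_epoch cannot be used.
wrapped = "SELECT source FROM inbound WHERE date(start_epoch) = ?"
wrapped_args = ("2018-02-19",)
# Bare column compared against precomputed bounds: an index range search is possible.
ranged = "SELECT source FROM inbound WHERE start_epoch >= ? AND start_epoch < ?"
ranged_args = ("2018-02-19 00:00:00", "2018-02-20 00:00:00")

print(plan(wrapped, wrapped_args))
print(plan(ranged, ranged_args))

# Both forms select the same rows; only the access path differs.
wrapped_rows = sorted(r[0] for r in conn.execute(wrapped, wrapped_args))
ranged_rows = sorted(r[0] for r in conn.execute(ranged, ranged_args))
print(wrapped_rows, ranged_rows)  # ['b', 'c'] ['b', 'c']
```

On MySQL itself, the equivalent check is running EXPLAIN on both versions of the real query and comparing the key and rows columns.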

SELECT INTO OUTFILE is not the issue. Numerous other things are slowing the query.

Here are the snippets that I need to discuss:

    FROM  inbound_022018 a
    LEFT JOIN  wireless_checks b  ON (a.uuid = b.uuid)
    WHERE  date(a.start_epoch)='2018-02-19'
      AND  a.endpoint_disposition='ANSWER'
      AND  a.direction='inbound'
      AND  a.billed=1;

    `uuid` varchar(50) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,

    `uuid` varchar(100) NOT NULL ... DEFAULT CHARSET=latin1

    float(11,4)

    `date` varchar(10) NOT NULL, ...
    `ported_date` varchar(11) NOT NULL,

    PRIMARY KEY (`id`),
    UNIQUE KEY `index_inbound_0717` (`id`) USING BTREE,

    PRIMARY KEY (`id`), ...
    UNIQUE KEY `index_all` (`id`,`campaign_id`,`session_id`,`uuid`) USING BTREE,

Many problems:

  • UUIDs are notoriously "random". How big are your tables? If they are too big to be cached in RAM, the query is destined to be miserably slow.
  • When comparing two strings (a.uuid = b.uuid), indexes cannot be used if the charset or collation is different. Here a.uuid is utf8 with utf8_unicode_ci, while b.uuid inherits the table default, latin1. Fix that.
  • Even smaller would be to convert from strings to BINARY(16). (Code available elsewhere.)
  • UUIDs, unless you have something special, can be CHAR(36) CHARSET ascii. This cleans up several things.
  • Table a needs a composite INDEX(billed, direction, endpoint_disposition, start_epoch) to make the WHERE more efficient. The first 3 columns may be in any order.
  • Change the date test as noted below.
  • A PRIMARY KEY is a UNIQUE key, so UNIQUE KEY index_inbound_0717 (id) duplicates it; remove the latter.
  • FLOAT(m,n) is a useless construct because it involves two roundings. For monetary values, use DECIMAL(m,n); for 'scientific' values, use FLOAT without the (m,n).
  • It is almost never useful to have a secondary key that starts with all of the columns of the PK, as index_all does with id. (OK, MyISAM may benefit, but InnoDB rarely does.)
  • If you don't need b.id for anything, get rid of it and promote uuid to be the PK. This will speed up the JOIN in InnoDB.
  • Unless you have some good reason, don't store dates in VARCHAR.
  • Don't use MyISAM. Fix the issues I have discussed here, then come back for further discussion if needed.
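Two of the column-type suggestions above can be sanity-checked from Python. The sketch below is illustrative only. It assumes the uuid values are standard 36-character UUIDs, and the sample value is made up. It packs a UUID into the 16 bytes a BINARY(16) column would hold (the same bytes as MySQL's UNHEX(REPLACE(uuid, '-', '')); MySQL 8.0 also offers UUID_TO_BIN() and BIN_TO_UUID()). It also emulates FLOAT(m,n) storage by rounding to 4 decimals and then forcing the result into a 4-byte IEEE 754 single (MySQL's FLOAT is single precision), which shows the second rounding.

```python
import struct
import uuid
from decimal import Decimal


def uuid_to_bin(text):
    """Pack a canonical 36-character UUID string into 16 raw bytes (BINARY(16))."""
    return uuid.UUID(text).bytes


def bin_to_uuid(raw):
    """Turn 16 raw bytes back into the canonical lowercase string form."""
    return str(uuid.UUID(bytes=raw))


sample = "3f2504e0-4f89-11d3-9a0c-0305e82c3301"  # hypothetical example value
packed = uuid_to_bin(sample)
print(len(sample), len(packed))        # 36 16
print(bin_to_uuid(packed) == sample)   # True


def as_float32(value):
    """Round-trip a Python float through a 4-byte single, as FLOAT storage would."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


# First rounding: to 4 decimal places. Second rounding: to the nearest float32.
stored = as_float32(round(12.34567, 4))
print(Decimal(stored))        # the exact binary value that would actually be kept
print(stored == 12.3457)      # False: not the 4-decimal value that was written
print(Decimal("12.3457"))     # a DECIMAL-style value stays exactly 12.3457
```

The UUID half shows the size saving (16 bytes instead of a 36-character string). The FLOAT half shows why the (m,n) suffix does not make FLOAT behave like a fixed-point money type.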

When a column is 'hidden' inside a function (e.g., DATE()), an index on that column cannot help. Change to

WHERE  a.start_epoch >= '2018-02-19'
  AND  a.start_epoch  < '2018-02-19' + INTERVAL 1 DAY

With that change, the 4th column in my suggested INDEX will be usable.
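Here is a sketch of why the 4th column only becomes usable with the range form. Like any SQLite stand-in, it uses a hypothetical cut-down table, and SQLite's plan text differs from MySQL's EXPLAIN. It builds the suggested composite index and compares the two WHERE variants. With DATE() the index can only narrow on the three equality columns. With the bare range it narrows on all four.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, cut-down stand-in for inbound_022018.
conn.execute(
    "CREATE TABLE inbound (id INTEGER PRIMARY KEY, billed INTEGER, direction TEXT, "
    "endpoint_disposition TEXT, start_epoch TEXT, uuid TEXT)"
)
# The suggested composite index: equality-tested columns first, range column last.
conn.execute(
    "CREATE INDEX idx_answered ON inbound "
    "(billed, direction, endpoint_disposition, start_epoch)"
)


def plan(sql, params):
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN for a query."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


prefix = ("SELECT uuid FROM inbound WHERE billed = ? AND direction = ? "
          "AND endpoint_disposition = ? AND ")
eq_args = (1, "inbound", "ANSWER")

wrapped_plan = plan(prefix + "date(start_epoch) = ?", eq_args + ("2018-02-19",))
ranged_plan = plan(prefix + "start_epoch >= ? AND start_epoch < ?",
                   eq_args + ("2018-02-19 00:00:00", "2018-02-20 00:00:00"))

print(wrapped_plan)  # index constrained on the first three columns only
print(ranged_plan)   # index constrained on all four columns, including start_epoch
```

In MySQL, the analogous signal is a larger key_len in EXPLAIN once start_epoch takes part in the index lookup.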
