How to optimize execution plan for query with multiple outer joins to huge tables, group by and order by clauses?

I have the following database (simplified):

CREATE TABLE `tracking` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `manufacture` varchar(100) NOT NULL,
  `date_last_activity` datetime NOT NULL,
  `date_created` datetime NOT NULL,
  `date_updated` datetime NOT NULL,
  PRIMARY KEY (`id`),
  KEY `manufacture` (`manufacture`),
  KEY `manufacture_date_last_activity` (`manufacture`, `date_last_activity`),
  KEY `date_last_activity` (`date_last_activity`)
) ENGINE=InnoDB AUTO_INCREMENT=401353 DEFAULT CHARSET=utf8

CREATE TABLE `tracking_items` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `tracking_id` int(11) NOT NULL,
  `tracking_object_id` varchar(100) NOT NULL,
  `tracking_type` int(11) NOT NULL COMMENT 'Its used to specify the type of each item, e.g. car, bike, etc',
  `date_created` datetime NOT NULL,
  `date_updated` datetime NOT NULL,
  PRIMARY KEY (`id`),
  KEY `tracking_id` (`tracking_id`),
  KEY `tracking_object_id` (`tracking_object_id`),
  KEY `tracking_id_tracking_object_id` (`tracking_id`,`tracking_object_id`)
) ENGINE=InnoDB AUTO_INCREMENT=1299995 DEFAULT CHARSET=utf8

CREATE TABLE `cars` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `car_id` varchar(255) NOT NULL COMMENT 'It must be VARCHAR, because the data is coming from external source.',
  `manufacture` varchar(255) NOT NULL,
  `car_text` text CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL,
  `date_order` datetime NOT NULL,
  `date_created` datetime NOT NULL,
  `date_updated` datetime NOT NULL,
  `deleted` tinyint(4) NOT NULL DEFAULT '0',
  PRIMARY KEY (`id`),
  UNIQUE KEY `car_id` (`car_id`),
  KEY `sort_field` (`date_order`)
) ENGINE=InnoDB AUTO_INCREMENT=150000025 DEFAULT CHARSET=utf8

This is my "problematic" query, which runs extremely slowly.

SELECT sql_no_cache `t`.*,
       count(`t`.`id`) AS `cnt_filtered_items`
FROM `tracking` AS `t`
INNER JOIN `tracking_items` AS `ti` ON (`ti`.`tracking_id` = `t`.`id`)
LEFT JOIN `cars` AS `c` ON (`c`.`car_id` = `ti`.`tracking_object_id`
                            AND `ti`.`tracking_type` = 1)
LEFT JOIN `bikes` AS `b` ON (`b`.`bike_id` = `ti`.`tracking_object_id`
                            AND `ti`.`tracking_type` = 2)
LEFT JOIN `trucks` AS `tr` ON (`tr`.`truck_id` = `ti`.`tracking_object_id`
                            AND `ti`.`tracking_type` = 3)
WHERE (`t`.`manufacture` IN('1256703406078',
                            '9600048390403',
                            '1533405067830'))
  AND (`c`.`car_text` LIKE '%europe%'
       OR `b`.`bike_text` LIKE '%europe%'
       OR `tr`.`truck_text` LIKE '%europe%')
GROUP BY `t`.`id`
ORDER BY `t`.`date_last_activity` ASC,
         `t`.`id` ASC
LIMIT 15

This is the result of EXPLAIN for the above query:

+----+-------------+-------+--------+-----------------------------------------------------------------------+-------------+---------+-----------------------------+---------+----------------------------------------------+
| id | select_type | table |  type  |                             possible_keys                             |     key     | key_len |             ref             |  rows   |                    extra                     |
+----+-------------+-------+--------+-----------------------------------------------------------------------+-------------+---------+-----------------------------+---------+----------------------------------------------+
|  1 | SIMPLE      | t     | index  | PRIMARY,manufacture,manufacture_date_last_activity,date_last_activity | PRIMARY     |       4 | NULL                        | 400,000 | Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | ti    | ref    | tracking_id,tracking_object_id,tracking_id_tracking_object_id         | tracking_id |       4 | table.t.id                  |       1 | NULL                                         |
|  1 | SIMPLE      | c     | eq_ref | car_id                                                                | car_id      |     767 | table.ti.tracking_object_id |       1 | Using where                                  |
|  1 | SIMPLE      | b     | eq_ref | bike_id                                                               | bike_id     |     767 | table.ti.tracking_object_id |       1 | Using where                                  |
|  1 | SIMPLE      | tr    | eq_ref | truck_id                                                              | truck_id    |     767 | table.ti.tracking_object_id |       1 | Using where                                  |
+----+-------------+-------+--------+-----------------------------------------------------------------------+-------------+---------+-----------------------------+---------+----------------------------------------------+

What is the problem this query is trying to solve?

Basically, I need to find all records in the tracking table that are associated with records in tracking_items (1:n), where each record in tracking_items may be associated with a record in one of the left-joined tables. The filtering criteria are a crucial part of the query.

What is the problem I have with the query above?

When both the ORDER BY and GROUP BY clauses are present, the query runs extremely slowly, e.g. 10-15 seconds for the above configuration. However, if I omit either of these clauses, the query runs quite quickly (~0.2 seconds).

What have I already tried?

  1. I've tried using a FULLTEXT index, but it didn't help much, because the rows evaluated by the LIKE condition are already narrowed down by the JOINs using indexes.
  2. I've tried using WHERE EXISTS (...) to check whether there are records in the left-joined tables, but without any luck.

A few notes about the relations between these tables:

tracking -> tracking_items (1:n)
tracking_items -> cars (1:1)
tracking_items -> bikes (1:1)
tracking_items -> trucks (1:1)

So, I'm looking for a way to optimize that query.

Bill Karwin suggests the query might perform better if it used an index with a leading column of manufacture. I second that suggestion, especially if that column is very selective.

I also note that we're doing a GROUP BY t.id, where id is the PRIMARY KEY of the table.

No columns from any tables other than tracking are referenced in the SELECT list.

This suggests we're really only interested in returning rows from t, and not in creating duplicates due to multiple outer joins.

It seems the COUNT() aggregate has the potential to return an inflated count if there are multiple matching rows in tracking_items and bikes, cars, trucks. If there are three matching rows from cars and four matching rows from bikes, the COUNT() aggregate is going to return a value of 12, rather than 7. (Or maybe there is some guarantee in the data such that there won't ever be multiple matching rows.)
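To see how that kind of inflation arises, here is a minimal sketch in Python, using the standard sqlite3 module as a stand-in for MySQL. The toy schema is an assumption made purely for illustration: cars and bikes hang directly off tracking through a hypothetical tracking_id column, which is not how the tables in the question are linked. With three cars and four bikes for one tracking row, the two outer joins produce 3 x 4 rows, while counting each table separately adds up to 7.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tracking (id INTEGER PRIMARY KEY);
-- Hypothetical link columns: both child tables point straight at tracking.
CREATE TABLE cars  (id INTEGER PRIMARY KEY, tracking_id INTEGER NOT NULL);
CREATE TABLE bikes (id INTEGER PRIMARY KEY, tracking_id INTEGER NOT NULL);
INSERT INTO tracking (id) VALUES (1);
INSERT INTO cars  (tracking_id) VALUES (1), (1), (1);       -- three cars
INSERT INTO bikes (tracking_id) VALUES (1), (1), (1), (1);  -- four bikes
""")

# Two outer joins from the same parent row multiply the matches.
joined_count = conn.execute("""
    SELECT COUNT(t.id)
    FROM tracking t
    LEFT JOIN cars  c ON c.tracking_id = t.id
    LEFT JOIN bikes b ON b.tracking_id = t.id
    GROUP BY t.id
""").fetchone()[0]

# Counting each child table in its own subquery adds the matches instead.
summed_count = conn.execute("""
    SELECT (SELECT COUNT(*) FROM cars  c WHERE c.tracking_id = t.id)
         + (SELECT COUNT(*) FROM bikes b WHERE b.tracking_id = t.id)
    FROM tracking t
""").fetchone()[0]

print(joined_count, summed_count)  # prints: 12 7
```

Whether the query in the question is actually affected depends on the data, as the parenthetical above says.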

If manufacture is very selective, so that it returns a reasonably small set of rows from tracking, and if the query can make use of an index ...

And since we aren't returning any columns from any tables other than tracking, apart from a count of related items ...

I would be tempted to test correlated subqueries in the SELECT list to get the count, and filter out the zero-count rows using a HAVING clause.

Something like this:

SELECT SQL_NO_CACHE `t`.*
     , ( ( SELECT COUNT(1)
             FROM `tracking_items` `tic`
             JOIN `cars` `c`
               ON `c`.`car_id`           = `tic`.`tracking_object_id`
              AND `c`.`car_text`      LIKE '%europe%'
            WHERE `tic`.`tracking_id`    = `t`.`id`
              AND `tic`.`tracking_type`  = 1
         )
       + ( SELECT COUNT(1)
             FROM `tracking_items` `tib`
             JOIN `bikes` `b`
               ON `b`.`bike_id`          = `tib`.`tracking_object_id` 
              AND `b`.`bike_text`     LIKE '%europe%'
            WHERE `tib`.`tracking_id`    = `t`.`id`
              AND `tib`.`tracking_type`  = 2
         )
       + ( SELECT COUNT(1)
             FROM `tracking_items` `tit`
             JOIN `trucks` `tr`
               ON `tr`.`truck_id`        = `tit`.`tracking_object_id`
              AND `tr`.`truck_text`   LIKE '%europe%'
            WHERE `tit`.`tracking_id`    = `t`.`id`
              AND `tit`.`tracking_type`  = 3
         ) 
       ) AS cnt_filtered_items
  FROM `tracking` `t`
 WHERE `t`.`manufacture` IN ('1256703406078', '9600048390403', '1533405067830')
HAVING cnt_filtered_items > 0
 ORDER
    BY `t`.`date_last_activity` ASC
     , `t`.`id` ASC

We'd expect that the query could make effective use of an index on tracking with a leading column of manufacture.

On the tracking_items table, we want an index with leading columns of tracking_type and tracking_id. Including tracking_object_id in that index would mean the query could be satisfied from the index, without visiting the underlying pages.

For the cars, bikes and trucks tables, the query should make use of an index with a leading column of car_id, bike_id and truck_id respectively. There's no getting around a scan of the car_text, bike_text and truck_text columns for the matching string... the best we can do is narrow down the number of rows that need to have that check performed.

This approach (just the tracking table in the outer query) should eliminate the need for the GROUP BY, and with it the work required to identify and collapse duplicate rows.

BUT this approach, replacing joins with correlated subqueries, is best suited to queries where there is a SMALL number of rows returned by the outer query. Those subqueries get executed for every row processed by the outer query. It's imperative that those subqueries have suitable indexes available. Even with those tuned, there is still potential for horrible performance for large sets.

This does still leave us with a "Using filesort" operation for the ORDER BY.

If the count of related items should be the product of a multiplication, rather than a sum, we could tweak the query to achieve that. (We'd have to muck with the return of zeros, and the condition in the HAVING clause would need to be changed.)

If there wasn't a requirement to return a COUNT() of related items, then I would be tempted to move the correlated subqueries from the SELECT list down into EXISTS predicates in the WHERE clause.
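A rough sketch of that EXISTS variant, run from Python with the standard sqlite3 module standing in for MySQL. The table and column names follow the question, but the shortened manufacture values ('M1', 'M2') and all rows are invented for the demo, and the bikes and trucks definitions are assumed to mirror cars since the question does not show them.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL; all data below is made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tracking (id INTEGER PRIMARY KEY, manufacture TEXT, date_last_activity TEXT);
CREATE TABLE tracking_items (id INTEGER PRIMARY KEY, tracking_id INTEGER,
                             tracking_object_id TEXT, tracking_type INTEGER);
CREATE TABLE cars   (car_id   TEXT UNIQUE, car_text   TEXT);
CREATE TABLE bikes  (bike_id  TEXT UNIQUE, bike_text  TEXT);  -- assumed shape
CREATE TABLE trucks (truck_id TEXT UNIQUE, truck_text TEXT);  -- assumed shape

INSERT INTO tracking VALUES (1, 'M1', '2020-01-02'), (2, 'M1', '2020-01-01'),
                            (3, 'M1', '2020-01-03'), (4, 'M2', '2020-01-01');
INSERT INTO tracking_items (tracking_id, tracking_object_id, tracking_type) VALUES
    (1, 'car-1', 1), (2, 'bike-1', 2), (3, 'truck-1', 3), (4, 'car-2', 1);
INSERT INTO cars   VALUES ('car-1', 'sold in europe'), ('car-2', 'sold in europe');
INSERT INTO bikes  VALUES ('bike-1', 'asia only');
INSERT INTO trucks VALUES ('truck-1', 'europe wide');
""")

# One EXISTS predicate per item type; no GROUP BY and no count is needed.
rows = conn.execute("""
    SELECT t.id
    FROM tracking t
    WHERE t.manufacture IN ('M1')
      AND ( EXISTS (SELECT 1 FROM tracking_items ti
                    JOIN cars c ON c.car_id = ti.tracking_object_id
                    WHERE ti.tracking_id = t.id AND ti.tracking_type = 1
                      AND c.car_text LIKE '%europe%')
         OR EXISTS (SELECT 1 FROM tracking_items ti
                    JOIN bikes b ON b.bike_id = ti.tracking_object_id
                    WHERE ti.tracking_id = t.id AND ti.tracking_type = 2
                      AND b.bike_text LIKE '%europe%')
         OR EXISTS (SELECT 1 FROM tracking_items ti
                    JOIN trucks tr ON tr.truck_id = ti.tracking_object_id
                    WHERE ti.tracking_id = t.id AND ti.tracking_type = 3
                      AND tr.truck_text LIKE '%europe%') )
    ORDER BY t.date_last_activity ASC, t.id ASC
    LIMIT 15
""").fetchall()

matching_ids = [row[0] for row in rows]
print(matching_ids)  # prints: [1, 3]
```

Tracking row 2 drops out because its only item does not mention europe, and row 4 because of its manufacture value. How MySQL executes the three predicates on the real tables would still need to be checked with EXPLAIN.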


Additional notes: seconding the comments from Rick James regarding indexing... there appear to be redundant indexes defined, i.e.

KEY `manufacture` (`manufacture`)
KEY `manufacture_date_last_activity` (`manufacture`, `date_last_activity`)

The index on the single column isn't necessary, since there is another index that has that column as its leading column.

Any query that can make effective use of the manufacture index will be able to make effective use of the manufacture_date_last_activity index. That is to say, the manufacture index could be dropped.

The same applies to the tracking_items table and these two indexes:

KEY `tracking_id` (`tracking_id`)
KEY `tracking_id_tracking_object_id` (`tracking_id`,`tracking_object_id`)

The tracking_id index could be dropped, since it's redundant.

For the query above, I would suggest adding a covering index:

KEY `tracking_items_IX3` (`tracking_id`,`tracking_type`,`tracking_object_id`)

-or- at a minimum, a non-covering index with those two columns leading:

KEY `tracking_items_IX3` (`tracking_id`,`tracking_type`)
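As a small illustration of both points (a composite index serving lookups on its leading column, and a covering index answering a query by itself), here is a sketch in Python with the standard sqlite3 module. SQLite's planner is not MySQL's optimizer, so this only demonstrates the general principle on a cut-down copy of tracking_items; on the real server the equivalent check is EXPLAIN.

```python
import sqlite3

# In-memory SQLite database; a cut-down tracking_items with only the
# three-column index suggested above (no separate tracking_id index).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tracking_items (
    id INTEGER PRIMARY KEY,
    tracking_id INTEGER NOT NULL,
    tracking_object_id TEXT NOT NULL,
    tracking_type INTEGER NOT NULL,
    date_created TEXT
);
CREATE INDEX tracking_items_IX3
    ON tracking_items (tracking_id, tracking_type, tracking_object_id);
""")

def plan(sql):
    """Return the detail column of EXPLAIN QUERY PLAN joined into one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)

# A lookup on the leading column alone can still use the composite index.
leading_column_plan = plan(
    "SELECT * FROM tracking_items WHERE tracking_id = 1")

# All referenced columns live in the index, so the table rows are not needed.
covered_plan = plan(
    "SELECT tracking_object_id FROM tracking_items "
    "WHERE tracking_id = 1 AND tracking_type = 1")

print(leading_column_plan)
print(covered_plan)
```

The first plan names tracking_items_IX3 even though only tracking_id is filtered, and the second reports it as a covering index.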

The EXPLAIN shows you are doing an index-scan ("index" in the type column) on the tracking table. An index-scan is pretty much as costly as a table-scan, especially when the index scanned is the PRIMARY index.

The rows column also shows that this index-scan is examining > 355K rows (since this figure is only a rough estimate, it's in fact examining all 400K rows).

Do you have an index on t.manufacture? I see two indexes named in the possible keys that might include that column (I can't be sure solely based on the name of the index), but for some reason the optimizer isn't using them. Maybe the set of values you search for is matched by every row in the table anyway.

If the list of manufacture values is intended to match a subset of the table, then you might need to give a hint to the optimizer to make it use the best index. https://dev.mysql.com/doc/refman/5.6/en/index-hints.html

Using LIKE '%word%' pattern-matching can never utilize an index, and must evaluate the pattern-match on every row. See my presentation, Full Text Search Throwdown.

How many items are in your IN(...) list? MySQL sometimes has problems with very long lists. See https://dev.mysql.com/doc/refman/5.6/en/range-optimization.html#equality-range-optimization

PS: When you ask a query optimization question, you should always include the SHOW CREATE TABLE output for each table referenced in the query, so folks who answer don't have to guess at what indexes, data types and constraints you currently have.

First of all: your query makes assumptions about string contents, which it shouldn't. What might car_text like '%europe%' indicate? Something like 'Sold in Europe only' maybe? Or 'Sold outside Europe only'? Two possible strings with contradictory meanings. So if you assume a certain meaning once you find europe in the string, then you should be able to introduce this knowledge into the database, with a Europe flag or a region code for instance.

Anyway, you are showing certain trackings with their Europe transportation count. So select trackings, select transportation counts. You can have the aggregation subquery for the transportation counts either in your SELECT clause or in your FROM clause.

Subquery in SELECT clause:

select
  t.*,
  (
    select count(*)
    from tracking_items ti
    where ti.tracking_id = t.id
    and (tracking_type, tracking_object_id) in
    (
      select 1, car_id from cars where car_text like '%europe%'
      union all
      select 2, bike_id from bikes where bike_text like '%europe%'
      union all
      select 3, truck_id from trucks where truck_text like '%europe%'
    )
  ) as total
from tracking t
where manufacture in ('1256703406078', '9600048390403', '1533405067830')
order by date_last_activity, id;

Subquery in FROM clause:

select
  t.*, agg.total
from tracking t
left join
(
  select tracking_id, count(*) as total
  from tracking_items ti
  where (tracking_type, tracking_object_id) in
  (
    select 1, car_id from cars where car_text like '%europe%'
    union all
    select 2, bike_id from bikes where bike_text like '%europe%'
    union all
    select 3, truck_id from trucks where truck_text like '%europe%'
  )
  group by tracking_id
) agg on agg.tracking_id = t.id
where manufacture in ('1256703406078', '9600048390403', '1533405067830')
order by date_last_activity, id;

Indexes:

  • tracking(manufacture, date_last_activity, id)
  • tracking_items(tracking_id, tracking_type, tracking_object_id)
  • cars(car_text, car_id)
  • bikes(bike_text, bike_id)
  • trucks(truck_text, truck_id)

Sometimes MySQL is stronger on simple joins than on anything else, so it may be worth trying to blindly join the transportation records and only later check whether each one is a car, bike or truck:

select
  t.*, agg.total
from tracking t
left join
(
  select
    tracking_id,
    sum((ti.tracking_type = 1 and c.car_text like '%europe%')
        or
        (ti.tracking_type = 2 and b.bike_text like '%europe%')
        or
        (ti.tracking_type = 3 and tr.truck_text like '%europe%')
       ) as total
  from tracking_items ti
  left join cars c on c.car_id = ti.tracking_object_id
  left join bikes b on b.bike_id = ti.tracking_object_id
  left join trucks tr on tr.truck_id = ti.tracking_object_id
  group by tracking_id
) agg on agg.tracking_id = t.id
where manufacture in ('1256703406078', '9600048390403', '1533405067830')
order by date_last_activity, id;

If my guess is correct and cars, bikes and trucks are independent from each other (i.e. a particular pre-aggregate result would only have data from one of them), you might be better off UNIONing three simpler sub-queries (one for each).

While you cannot do much index-wise about LIKEs involving leading wildcards, splitting it into UNIONed queries could avoid evaluating p.fb_message LIKE '%Europe%' OR p.fb_from_name LIKE '%Europe%' for all the cars and bikes matches, and the c conditions for all the b and t matches, and so on.

ALTER TABLE cars ADD FULLTEXT(car_text)

then try

select  sql_no_cache
        `t`.*,  -- If you are not using all, spell out the list
        count(`t`.`id`) as `cnt_filtered_items`  -- This does not make sense
                         -- and is possibly delivering an inflated value
    from  `tracking` as `t`
    inner join  `tracking_items` as `ti`  ON (`ti`.`tracking_id` = `t`.`id`)
    join   -- not LEFT JOIN
         `cars` as `c`  ON `c`.`car_id` = `ti`.`tracking_object_id`
                                     AND  `ti`.`tracking_type` = 1 
    where  `t`.`manufacture` in('1256703406078', '9600048390403', '1533405067830')
      AND  MATCH(c.car_text)  AGAINST('+europe' IN BOOLEAN MODE)
    group by  `t`.`id`    -- I don't know if this is necessary
    order by  `t`.`date_last_activity` asc, `t`.`id` asc
    limit  15;

to see if it correctly gives you a suitable 15 cars.

If that looks OK, then combine the three together:

SELECT  sql_no_cache
        t2.*
        -- , COUNT(*)  -- this is probably broken
    FROM (
        ( SELECT t.id FROM ... cars ... )  -- the query above
        UNION ALL     -- unless you need UNION DISTINCT
        ( SELECT t.id FROM ... bikes ... )
        UNION ALL
        ( SELECT t.id FROM ... trucks ... )
         ) AS u
    JOIN tracking AS t2  ON t2.id = u.id
    ORDER BY t2.date_last_activity, t2.id
    LIMIT 15;

Note that the inner SELECTs only deliver t.id, not t.*.

Another index needed:

ti:  (tracking_type, tracking_object_id)   -- in either order

Indexes

When you have INDEX(a,b), you don't also need INDEX(a). (This won't help the query in question, but it will help disk space and INSERT performance.)

When I see PRIMARY KEY(id), UNIQUE(x), I look for any good reason not to get rid of id and change to PRIMARY KEY(x). Unless there is something significant in the 'simplification' of the schema, such a change would help. Yeah, car_id is bulky, etc, but it is a big table and the extra lookup (from index BTree to data BTree) is hurting, etc.

I think it is very unlikely that KEY sort_field (date_order) will ever be used. Either drop it (saving a few GB) or combine it in some useful way. Let's see the query in which you think it might be useful. (Again, a suggestion that is not directly relevant to this Question.)

re Comment(s)

I made some substantive changes to my formulation.

My formulation has 4 GROUP BYs, 3 in the 'derived' table (i.e., FROM ( ... UNION ... )), and one outside. Since the outer part is limited to 3*15 rows, I do not worry about performance there.

Further note that the derived table delivers only t.id, then re-probes tracking to get the other columns. This lets the derived table run much faster, at the small expense of the extra JOIN outside.

Please elaborate on the intent of the COUNT(t.id); it won't work in my formulation, and I don't know what it is counting.

I had to get rid of the ORs; they are the secondary performance killer. (The first killer is LIKE '%...'.)

When there's order by and group by clauses the query runs extremely slow, eg 10-15 seconds to complete for the above configuration. However, if I omit any of these clauses, the query is running pretty quick (~0.2 seconds).

This is interesting... generally the best optimization technique I know is to make good use of temporary tables, and it sounds like it will work really well here. So you would first create the temporary table:

create temporary table tracking_ungrouped (
    key (id)
)
select sql_no_cache `t`.*
from `tracking` as `t` 
inner join `tracking_items` as `ti` on (`ti`.`tracking_id` = `t`.`id`)
    left join `cars` as `c` on (`c`.`car_id` = `ti`.`tracking_object_id` AND `ti`.`tracking_type` = 1)
    left join `bikes` as `b` on (`b`.`bike_id` = `ti`.`tracking_object_id` AND `ti`.`tracking_type` = 2)    
    left join `trucks` as `tr` on (`tr`.`truck_id` = `ti`.`tracking_object_id` AND `ti`.`tracking_type` = 3)
where 
    (`t`.`manufacture` in('1256703406078', '9600048390403', '1533405067830')) and 
    (`c`.`car_text` like '%europe%' or `b`.`bike_text` like '%europe%' or `tr`.`truck_text` like '%europe%');

and then query it for the results you need:

select t.*, count(`t`.`id`) as `cnt_filtered_items`
from tracking_ungrouped t
group by `t`.`id` 
order by `t`.`date_last_activity` asc, `t`.`id` asc 
limit 15;

SELECT t.*
FROM (SELECT * FROM tracking
      WHERE manufacture IN ('1256703406078','9600048390403','1533405067830')) t
INNER JOIN (SELECT tracking_id, tracking_object_id, tracking_type FROM tracking_items
            WHERE tracking_type IN (1,2,3)) ti
    ON (ti.tracking_id = t.id)
LEFT JOIN (SELECT car_id FROM cars WHERE car_text LIKE '%europe%') c
    ON (c.car_id = ti.tracking_object_id AND ti.tracking_type = 1)
LEFT JOIN (SELECT bike_id FROM bikes WHERE bike_text LIKE '%europe%') b
    ON (b.bike_id = ti.tracking_object_id AND ti.tracking_type = 2)
LEFT JOIN (SELECT truck_id FROM trucks WHERE truck_text LIKE '%europe%') tr
    ON (tr.truck_id = ti.tracking_object_id AND ti.tracking_type = 3)
ORDER BY t.date_last_activity ASC, t.id ASC

Subqueries perform faster when it comes to joins if they filter out a lot of records.

The subquery on the tracking table filters out the unwanted manufacture values and results in a smaller table t to be joined.

Similarly, I applied a condition to the tracking_items table, as we are interested only in tracking_types 1, 2 and 3, to create a smaller table ti. If there are a lot of tracking objects, you can even add a tracking object filter in this subquery.

Applying a similar approach to the cars, bikes and trucks tables, with the condition that their respective text contains europe, helps us create the smaller tables c, b and tr respectively.

I also removed the GROUP BY t.id: t.id is unique, and we are only performing an inner join and left joins on it or on the resulting table, so there is no need for it.

Lastly, I am selecting only the required columns from each of the tables, which also reduces the load on memory and the runtime.

Hope this helps. Please let me know your feedback and run statistics.

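I'm not sure it will work. What about applying the filter for each table (cars, bikes and trucks) in the ON clause, so that it filters out the rows before joining?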
