Optimizing ORDER BY with a LIMIT in MySQL

I have a table called "transactions" with 3 million records.

CREATE TABLE transactions(
  id INT(6) UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  lookupAId int(6) NOT NULL,
  .....
  updateDate TIMESTAMP
)

In the worst case the user specifies no filters, and the query looks like this:

select * from transactions
   join lookupA on (well indexed columns) 
   .. ( 12 lookup table joins) 
order by updateDate limit 500

Without the ORDER BY clause the query runs in milliseconds, but with the ORDER BY it takes about a minute. The table is projected to grow to 12-15 million records.

  1. My SLA is to return results in under a second. Is that possible in MySQL?
  2. How can I optimize the ORDER BY clause to make this perform?

I run MySQL 5.7 on an xLarge memory-optimized RDS instance in AWS.

UPDATE 1: updateDate has a time component and is indexed (B-tree, non-unique).

UPDATE 2: This worked, although I don't know why:

SELECT * FROM (select * from transactions order by updateDate) transactions
   join lookupA on (well indexed columns) 
   .. ( 12 lookup table joins) 
   limit 500

If you don't already have one, the ORDER BY will definitely benefit from an index:

create index ix1 on transactions (updateDate);
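One way to check whether the index is actually used for the ordering is to look at the plan with EXPLAIN. The sketch below is only illustrative: lookupA.id is an assumed key column, since the question elides the real join conditions. If the Extra column of the first row shows "Using filesort" (often together with "Using temporary"), MySQL is sorting rows itself instead of reading them in index order.

```sql
-- Check the ordering step on its own, without the joins.
EXPLAIN
SELECT id, updateDate
FROM transactions
ORDER BY updateDate
LIMIT 500;

-- Then check the full query. lookupA.id is an assumed key column;
-- the question does not show the real join conditions.
EXPLAIN
SELECT *
FROM transactions AS t
JOIN lookupA AS a ON a.id = t.lookupAId
ORDER BY t.updateDate
LIMIT 500;
```

If the first plan reads the updateDate index and the second one shows a filesort, the sort is probably being applied to the joined result rather than to transactions alone.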

MySQL is probably doing a lot of work on the query before it limits the result size with LIMIT. This seems to be a known weakness of MySQL.

Try selecting from transactions in a subquery, so that the result set is limited before the joins are done.

SELECT * FROM (select * from transactions order by updateDate limit 500) transactions
   join lookupA on (well indexed columns) 
   .. ( 12 lookup table joins) 

The usual technique for tackling this problem:

SELECT ... JOIN ...
    LIMIT ...

is to:

  1. Do the minimal amount of work to find the PRIMARY KEY values of the rows that factor into the LIMIT rows.
  2. Feed those ids into the JOINs to get the rest of the info.
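Applied to the question's table, the two steps might look like the sketch below. The alias `page` and the join column lookupA.id are assumptions made for illustration; only transactions.id, lookupAId and updateDate come from the question. LEFT JOIN is used so that a missing lookup row cannot drop a transaction and leave fewer than 500 rows; a plain JOIN is fine if every row is known to have a match.

```sql
SELECT t.*, a.*
FROM (
    -- Step 1: find only the ids of the 500 rows wanted. With InnoDB, an
    -- index on updateDate also carries the primary key, so this subquery
    -- can be answered from the index alone.
    SELECT id
    FROM transactions
    ORDER BY updateDate
    LIMIT 500
) AS page
-- Step 2: fetch the full rows and lookups for just those ids.
JOIN transactions AS t ON t.id = page.id
LEFT JOIN lookupA AS a ON a.id = t.lookupAId  -- assumed join condition
-- ... remaining lookup joins ...
ORDER BY t.updateDate;  -- the joins do not guarantee the subquery's order
```

The outer ORDER BY only has to sort at most 500 rows, so it should be cheap compared with sorting the whole joined result.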

As your query stands, the Optimizer throws up its hands and simply does all the JOINs (optimizing each as best it can), generating a large (many rows, many columns) intermediate table. It then applies the ORDER BY (sorting the many rows of many columns) and the LIMIT (delivering some of those rows).

With INDEX(OrderDate), and provided that column is in the table the Optimizer chooses to start the JOINing with, the Optimizer can at least consider using the index. (OrderDate here corresponds to updateDate in the question.) But that might be the worst case: what if there are not 500 rows to be had? It will have done all the work anyway!

The Optimizer does not know that a table is a simple "lookup" table. It must be prepared to find 0 rows or more than 1 row.

Case 1: You know that there is exactly 1 row in each of the lookup (JOINed) tables.

Case 2: You know that there is at most 1 row in each lookup table.

In both of these cases, the following is an efficient way to rewrite the query:

SELECT  t.a, t.b, ...
        ( SELECT name FROM LU1 WHERE id = t.name_id ) AS name, 
        ( SELECT foo  FROM LU1 WHERE id = t.foo_id ) AS foo, 
        ...
    FROM transactions AS t
    ORDER BY t.OrderDate
    LIMIT ...

and

INDEX(OrderDate)
INDEX(id)  -- for each LU table, unless there is already `PRIMARY KEY(id)`

This formulation of the query will focus on walking through exactly 500 rows, presorted by OrderDate, looking up 12 things for each row.

It is semantically equivalent to Case 2 (LEFT JOIN), since it gives NULL for name (etc.) when there is no mapping.

Technically, Case 1 is not the same. If a lookup fails, JOIN will drop the row, but my reformulation will keep the row, showing NULL.
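To find out which case applies to a given lookup table, one option is to count the transactions that have no matching lookup row. This is a sketch using the placeholder names from the query above (LU1, name_id); note that it scans the whole transactions table, so it is better run once offline than as part of the request path.

```sql
-- Counts transactions whose name_id has no row in LU1.
-- A NULL name_id is counted as well, since it cannot match any id.
SELECT COUNT(*) AS unmatched
FROM transactions AS t
LEFT JOIN LU1 AS lu ON lu.id = t.name_id
WHERE lu.id IS NULL;
```

If the count is 0, every row has a match in that lookup table (Case 1), and JOIN and the subquery formulation return the same rows. If it is greater than 0, only the LEFT JOIN or subquery form keeps those rows.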
