
SQL: Select only the first row from a lexically ordered table

In a nutshell

How can I speed up this statement (running on a table with a very large number of rows)?

select * from mytable where val2=4 order by key1, key2, key3 limit 1;

In detail

This is my table (displayed here sorted lexically by its three key fields), from which I want to select the one row marked with an arrow. There are three fields in the primary index: key1, then key2, then key3.

Note that my real table has more columns, about 100,000 rows, and an index on column val2.

key1 | key2 | key3 | val1 | val2
-----+------+------+------+------
   2 |    1 |    0 |    1 |    1 
   3 |    1 |    0 |    2 |    2 
   3 |    2 |    0 |    3 |    3 
   3 |    2 |    1 |    1 |    4  <==
   4 |    1 |    0 |    2 |    5 
   4 |    2 |    0 |    3 |    1 
   4 |    2 |    1 |    1 |    2 
   4 |    3 |    0 |    2 |    3 
   4 |    3 |    1 |    3 |    4 
   4 |    3 |    2 |    1 |    5 
   5 |    1 |    0 |    2 |    1 
   5 |    2 |    0 |    3 |    2 
   5 |    2 |    1 |    1 |    3 
   5 |    3 |    0 |    2 |    4 
   5 |    3 |    1 |    3 |    5 
   5 |    3 |    2 |    1 |    1 
   5 |    4 |    0 |    2 |    2 
   5 |    4 |    1 |    3 |    3 
   5 |    4 |    2 |    1 |    4 
   5 |    4 |    3 |    2 |    5 

This statement delivers exactly the row I want, and it also describes precisely what I am after:

select * from mytable where val2=4 order by key1, key2, key3 limit 1;

I want to do this (in sequential pseudocode):

1. Select all rows that have the value 4 in field val2.
2. Sort those rows by key1, then by key2, then by key3.
3. Return only the first row of this sorted set of rows.
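The three steps can be mirrored in plain Python on the sample rows from the table, as a rough illustration of what the query computes (the function names are hypothetical). Because only the first row is wanted, `min()` with the same sort key returns the same row in a single pass without sorting everything, which hints at why a full sort is not strictly necessary.

```python
# Sample rows from the table above: (key1, key2, key3, val1, val2)
ROWS = [
    (2, 1, 0, 1, 1), (3, 1, 0, 2, 2), (3, 2, 0, 3, 3), (3, 2, 1, 1, 4),
    (4, 1, 0, 2, 5), (4, 2, 0, 3, 1), (4, 2, 1, 1, 2), (4, 3, 0, 2, 3),
    (4, 3, 1, 3, 4), (4, 3, 2, 1, 5), (5, 1, 0, 2, 1), (5, 2, 0, 3, 2),
    (5, 2, 1, 1, 3), (5, 3, 0, 2, 4), (5, 3, 1, 3, 5), (5, 3, 2, 1, 1),
    (5, 4, 0, 2, 2), (5, 4, 1, 3, 3), (5, 4, 2, 1, 4), (5, 4, 3, 2, 5),
]


def sort_key(row):
    """Lexical order: key1, then key2, then key3."""
    return row[0], row[1], row[2]


def first_by_sorting(rows, val2):
    matching = [r for r in rows if r[4] == val2]   # step 1: filter on val2
    matching.sort(key=sort_key)                    # step 2: sort by the three keys
    return matching[0] if matching else None       # step 3: keep only the first row


def first_by_min(rows, val2):
    # Same answer in one pass: no need to sort rows that will be thrown away.
    return min((r for r in rows if r[4] == val2), key=sort_key, default=None)


print(first_by_sorting(ROWS, 4))  # (3, 2, 1, 1, 4)
print(first_by_min(ROWS, 4))      # (3, 2, 1, 1, 4)
```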

My SELECT statement has to read the whole table and then sort a huge number of rows before it can find the one row I want.

I think this could be done faster with nested subselects (I know this syntax is wrong, but I hope you understand what I want to do):

select * from mytable where key1+key2+key3 = (
    select key1, key2, min(key3) from mytable where val2=4 and key1+key2 = (
        select key1, min(key2) from mytable where val2=4 and key1 = (
            select min(key1) from mytable where val2=4
        )
    )
)

But I don't know how to write this in correct SQL syntax, and I'm not sure whether this really is a better way. I think there must be an elegant solution using joins (joining the table with itself), but I can't find one.

Can you help, please?
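For reference, the nested-subselect idea can be written in valid SQL with correlated subqueries. The sketch below runs it against SQLite through Python's built-in sqlite3 module, which is an assumption standing in for MySQL (the SQL itself should also be valid MySQL). Each MIN picks the smallest key among the val2 = 4 rows that share the keys already chosen. Whether it is actually faster than ORDER BY ... LIMIT 1 depends on the available indexes; it is not guaranteed to be.

```python
import sqlite3

# Sample rows from the table above: (key1, key2, key3, val1, val2)
ROWS = [
    (2, 1, 0, 1, 1), (3, 1, 0, 2, 2), (3, 2, 0, 3, 3), (3, 2, 1, 1, 4),
    (4, 1, 0, 2, 5), (4, 2, 0, 3, 1), (4, 2, 1, 1, 2), (4, 3, 0, 2, 3),
    (4, 3, 1, 3, 4), (4, 3, 2, 1, 5), (5, 1, 0, 2, 1), (5, 2, 0, 3, 2),
    (5, 2, 1, 1, 3), (5, 3, 0, 2, 4), (5, 3, 1, 3, 5), (5, 3, 2, 1, 1),
    (5, 4, 0, 2, 2), (5, 4, 1, 3, 3), (5, 4, 2, 1, 4), (5, 4, 3, 2, 5),
]

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE mytable (
    key1 INTEGER, key2 INTEGER, key3 INTEGER, val1 INTEGER, val2 INTEGER,
    PRIMARY KEY (key1, key2, key3))""")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?, ?)", ROWS)

NESTED = """
SELECT t.* FROM mytable AS t
WHERE t.val2 = 4
  -- smallest key1 among the val2 = 4 rows
  AND t.key1 = (SELECT MIN(s.key1) FROM mytable AS s WHERE s.val2 = 4)
  -- smallest key2 among val2 = 4 rows sharing that key1
  AND t.key2 = (SELECT MIN(s.key2) FROM mytable AS s
                WHERE s.val2 = 4 AND s.key1 = t.key1)
  -- smallest key3 among val2 = 4 rows sharing that key1 and key2
  AND t.key3 = (SELECT MIN(s.key3) FROM mytable AS s
                WHERE s.val2 = 4 AND s.key1 = t.key1 AND s.key2 = t.key2)
"""
SIMPLE = "SELECT * FROM mytable WHERE val2 = 4 ORDER BY key1, key2, key3 LIMIT 1"

nested = conn.execute(NESTED).fetchall()
simple = conn.execute(SIMPLE).fetchall()
print(nested)  # [(3, 2, 1, 1, 4)]
```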


EDIT (after comments)

OK, let's talk about my real table:

At the moment there is only one row in this table, and it has two key fields instead of three. But this table will grow in an iterative process in which one row has to be selected using the statement we are discussing. That row is processed and, as a result, updated. In addition, between 0 and 2 new rows are inserted. Then the cycle repeats: a new row is selected, analyzed and updated, and again between 0 and 2 new rows are inserted.

At the beginning this process will add lots of new rows that need to be read later. Eventually, hopefully, the process stops because no more rows match the WHERE clause. Then the remaining rows have to be analyzed.

So, these are the statements that create the table and insert the starting row:

CREATE TABLE `numbers` (
  `a0` int(10) UNSIGNED NOT NULL DEFAULT '0',
  `b0` int(10) UNSIGNED NOT NULL DEFAULT '0',
  `n` int(10) UNSIGNED NOT NULL DEFAULT '0',
  `an` int(10) UNSIGNED NOT NULL DEFAULT '0',
  `bn` int(10) UNSIGNED NOT NULL DEFAULT '0',
  `m` double NOT NULL DEFAULT '0',
  `gele` char(1) NOT NULL DEFAULT '?'
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

INSERT INTO `numbers` (`a0`, `b0`, `n`, `an`, `bn`, `m`, `gele`) VALUES
(1, 0, 0, 0, 0, 0, '?');

ALTER TABLE `numbers`
  ADD PRIMARY KEY (`a0`,`b0`),
  ADD KEY `gele` (`gele`);

Here is my statement:

SELECT `a0`, `b0`, `n`, `an`, `bn`, `m`, `gele`
FROM `numbers`
WHERE `gele` = '?' OR `gele` = '='
ORDER BY `a0`, `b0`
LIMIT 1;

And this is the result of EXPLAIN SELECT ...:

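id | select_type | table   | partitions | type  | possible_keys | key     | key_len | ref  | rows | filtered | Extra
---+-------------+---------+------------+-------+---------------+---------+---------+------+------+----------+------------
 1 | SIMPLE      | numbers | NULL       | index | gele          | PRIMARY | 8       | NULL | 1    | 100.00   | Using where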

Since there is only one row in the table at the moment, the EXPLAIN output is not very helpful, sorry.

But anyway, I want a more generic answer to this problem, because it comes up very often.

First of all, regardless of how the records are laid out on disk, you must use ORDER BY to guarantee the order of records from a SELECT. The Optimizer will (usually) notice the order of the records and can decide to do 'nothing' for the ORDER BY.

In InnoDB, records are arranged according to the PRIMARY KEY. So, given PRIMARY KEY (a0,b0) and ORDER BY a0, b0, the Optimizer may simply read the rows in order without having to do a sort.

But... if you have a WHERE clause that says, for example, WHERE c0 > 3, and you have INDEX(c0, b0), the Optimizer is likely to use the index for filtering and then have to sort, even if you say ORDER BY a0, b0. This is likely to be faster than doing a table scan (to avoid the sort) and filtering each row as it steps through all of them (to perform the WHERE).
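SQLite's WITHOUT ROWID tables store rows in a B-tree ordered by the PRIMARY KEY, which is roughly analogous to InnoDB's clustered primary key. The sketch below (SQLite via Python's sqlite3 as a stand-in for MySQL; the table contents are made up) uses EXPLAIN QUERY PLAN to check that ORDER BY a0, b0 needs no separate sort step, while ORDER BY on an unindexed column does. The plan text is SQLite's, not MySQL's EXPLAIN output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# WITHOUT ROWID: rows live in a B-tree ordered by PRIMARY KEY (a0, b0),
# roughly like InnoDB's clustered primary key.
conn.execute("""CREATE TABLE numbers (
    a0 INTEGER NOT NULL, b0 INTEGER NOT NULL, gele TEXT NOT NULL DEFAULT '?',
    PRIMARY KEY (a0, b0)) WITHOUT ROWID""")
conn.executemany("INSERT INTO numbers VALUES (?, ?, ?)",
                 [(3, 1, '<'), (1, 2, '?'), (2, 0, '='), (1, 0, '>')])


def plan(sql):
    """Return the 'detail' column (the last one) of EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


ordered = "SELECT * FROM numbers ORDER BY a0, b0"
print(plan(ordered))                    # a plain scan, no temp B-tree for sorting
print(conn.execute(ordered).fetchall())

unordered = "SELECT * FROM numbers ORDER BY gele"
print(plan(unordered))                  # needs 'USE TEMP B-TREE FOR ORDER BY'
```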

Your steps

  1. Select all rows which have the value 4 in field val2.
  2. Sort those rows by key1, then by key2, then by key3.
  3. Return only the first single row of this sorted set of rows.

are done very simply, and very efficiently, via

INDEX(val2, key1, key2, key3)

SELECT ...
    WHERE val2 = 4                -- filter column goes first
    ORDER BY key1, key2, key3     -- sort columns next
    LIMIT 1

It will read exactly one 'row' from that composite index, then look up the row in the data (using the PRIMARY KEY). Both are "point queries" using a BTree index. We are talking about a few milliseconds, even if nothing is cached, regardless of table size.
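The composite-index recipe can be tried out in SQLite via Python's sqlite3 (a stand-in for MySQL; the index name idx_val2_keys is made up). If the index is used as described, EXPLAIN QUERY PLAN should show a search on that index for val2=? and no 'USE TEMP B-TREE FOR ORDER BY' step, meaning the rows come out of the index already in key1, key2, key3 order.

```python
import sqlite3

# Sample rows from the question's table: (key1, key2, key3, val1, val2)
ROWS = [
    (2, 1, 0, 1, 1), (3, 1, 0, 2, 2), (3, 2, 0, 3, 3), (3, 2, 1, 1, 4),
    (4, 1, 0, 2, 5), (4, 2, 0, 3, 1), (4, 2, 1, 1, 2), (4, 3, 0, 2, 3),
    (4, 3, 1, 3, 4), (4, 3, 2, 1, 5), (5, 1, 0, 2, 1), (5, 2, 0, 3, 2),
    (5, 2, 1, 1, 3), (5, 3, 0, 2, 4), (5, 3, 1, 3, 5), (5, 3, 2, 1, 1),
    (5, 4, 0, 2, 2), (5, 4, 1, 3, 3), (5, 4, 2, 1, 4), (5, 4, 3, 2, 5),
]

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE mytable (
    key1 INTEGER, key2 INTEGER, key3 INTEGER, val1 INTEGER, val2 INTEGER,
    PRIMARY KEY (key1, key2, key3))""")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?, ?)", ROWS)
# Filter column first, then the ORDER BY columns in the same order.
conn.execute("CREATE INDEX idx_val2_keys ON mytable (val2, key1, key2, key3)")

QUERY = "SELECT * FROM mytable WHERE val2 = ? ORDER BY key1, key2, key3 LIMIT 1"


def plan(sql, params=()):
    """Return the 'detail' column (the last one) of EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


print(plan(QUERY, (4,)))
print(conn.execute(QUERY, (4,)).fetchone())  # (3, 2, 1, 1, 4)
```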

See my cookbook on building indexes. 请参阅我的构建索引手册。

But your 'real' query is not the same pattern; it has an 'OR':

SELECT  `a0`, `b0`, `n`, `an`, `bn`, `m`, `gele`
    FROM  `numbers`
    WHERE  `gele` = '?'
       OR  `gele` = '='
    ORDER BY  `a0`, `b0`
    LIMIT  1;

INDEX(gele, a0, b0) is tempting, but it won't work. All the '?' values are nicely ordered according to a0, b0, and so are the '=' values. But you want both sets, which involves "merging" two sorted lists. The Optimizer has a way to do it, but it is rarely worth the effort. It turns out that there are two possibly 'best' indexes, and the Optimizer cannot always decide correctly between them:

INDEX(gele)  -- do all the filtering; sort later
INDEX(a0,b0) -- avoids sorting, but requires reading an indeterminate number of rows

Since the latter is your PK, and there is some advantage in using the PK, that is what the Optimizer picked. If neither '?' nor '=' occurs until the 'last' row in the table, the query will read the entire table. :(

One trick that is sometimes worth doing is to turn OR into UNION:
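The "merging" of two sorted lists can be pictured with two already-sorted runs, one per gele value, as they would sit in INDEX(gele, a0, b0). This is a conceptual sketch in Python with made-up (a0, b0) values, not how MySQL implements it: heapq.merge combines sorted inputs lazily, so taking only the first key needs just the head of each run.

```python
import heapq
from itertools import islice

# Made-up (a0, b0) values as they would appear in INDEX(gele, a0, b0):
# within each gele value, the entries are already sorted by (a0, b0).
question_marks = [(2, 5), (7, 1), (9, 0)]   # entries with gele = '?'
equals_signs = [(1, 3), (4, 4), (8, 2)]     # entries with gele = '='


def first_of_merged(*sorted_runs, limit=1):
    """Lazily merge already-sorted runs and keep only the first `limit` keys."""
    return list(islice(heapq.merge(*sorted_runs), limit))


print(first_of_merged(question_marks, equals_signs))  # [(1, 3)]
```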

    (  SELECT  `a0`, `b0`, `n`, `an`, `bn`, `m`, `gele`
            FROM  `numbers`
            WHERE  `gele` = '?'
            ORDER BY  `a0`, `b0`
            LIMIT  1 )            -- Step 1, below
UNION ALL
    (  SELECT  `a0`, `b0`, `n`, `an`, `bn`, `m`, `gele`
            FROM  `numbers`
            WHERE  `gele` = '='
            ORDER BY  `a0`, `b0`
            LIMIT  1 )            -- Step 2
ORDER BY  a0, b0 -- yes repeated  -- Step 3
LIMIT  1;                         -- Step 4

INDEX(gele, a0, b0)

This is guaranteed to be fast, but it has some overhead:

  1. Search for '?' -- find the row promptly. Write it to a tmp table.
  2. Search for '=' -- find the row promptly. Append it to the tmp table.
  3. Sort the tmp table.
  4. Peel off 1 row.

Yes, there is a 'temp' table and a 'filesort', but with only 2 rows it is very fast. This particular formulation is fast regardless of the table size.
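The UNION ALL rewrite can be exercised in SQLite via Python's sqlite3 (a stand-in for MySQL; the rows and the index name gele_a0_b0 are made up). One dialect difference: SQLite rejects parenthesized SELECTs with their own ORDER BY/LIMIT inside a compound query, so each branch is wrapped in a derived table here; MySQL accepts the parenthesized form shown above. The sketch checks that the rewrite returns the same row as the original OR query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE numbers (
    a0 INTEGER NOT NULL DEFAULT 0, b0 INTEGER NOT NULL DEFAULT 0,
    n INTEGER NOT NULL DEFAULT 0, an INTEGER NOT NULL DEFAULT 0,
    bn INTEGER NOT NULL DEFAULT 0, m REAL NOT NULL DEFAULT 0,
    gele TEXT NOT NULL DEFAULT '?',
    PRIMARY KEY (a0, b0))""")
conn.execute("CREATE INDEX gele_a0_b0 ON numbers (gele, a0, b0)")
# Made-up rows; only a0, b0 and gele matter here.
conn.executemany("INSERT INTO numbers (a0, b0, gele) VALUES (?, ?, ?)",
                 [(1, 0, '<'), (2, 7, '='), (3, 1, '?'), (2, 9, '?'), (5, 0, '>')])

# Each branch is a derived table so SQLite accepts its ORDER BY ... LIMIT 1.
UNION_QUERY = """
SELECT * FROM (SELECT a0, b0, n, an, bn, m, gele FROM numbers
               WHERE gele = '?' ORDER BY a0, b0 LIMIT 1)   -- step 1
UNION ALL
SELECT * FROM (SELECT a0, b0, n, an, bn, m, gele FROM numbers
               WHERE gele = '=' ORDER BY a0, b0 LIMIT 1)   -- step 2
ORDER BY a0, b0                                             -- step 3
LIMIT 1                                                     -- step 4
"""
OR_QUERY = """
SELECT a0, b0, n, an, bn, m, gele FROM numbers
WHERE gele = '?' OR gele = '='
ORDER BY a0, b0 LIMIT 1
"""

print(conn.execute(UNION_QUERY).fetchone())
```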

From the information provided, it's hard to say whether there is a better way.

Given your query:

select * from mytable where val2=4 order by key1, key2, key3 limit 1;

The WHERE clause will first restrict the rows to those containing val2 = 4; the remaining rows then have to be sorted to get the ordering you require.

Even though you only want one row, all the matching rows have to be sorted.

Only an index on the val2 field will speed up the WHERE part of this. Beyond that, you are at the mercy of the optimizer and the speed of your hardware.

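Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.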

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM