Index for an ORDER BY which includes a “condition”

I have a query on a table with 20M rows that includes the following clause:

ORDER BY (language_code = '%s') DESC, (language_code = '%s') DESC

%s is replaced at runtime with the actual language codes. The purpose is to order the results so that rows in the user's language come first, then rows in the default language, and finally all the others.
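To make the intended ordering concrete, here is a minimal sketch in Python. It uses the standard-library sqlite3 module as a stand-in for PostgreSQL, with made-up sample rows, and adds an id tie-breaker (not in the original query) so that the output is deterministic. In both engines a true comparison sorts before a false one under DESC.

```python
import sqlite3

# Hypothetical sample data; sqlite3 stands in for PostgreSQL here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE localized_skills (id INTEGER PRIMARY KEY, language_code TEXT)"
)
conn.executemany(
    "INSERT INTO localized_skills (language_code) VALUES (?)",
    [("fr",), ("en",), ("it",), ("de",), ("it",), ("en",)],
)


def ordered_codes(user_lang, default_lang):
    """Return all language codes: user language first, default second, rest last."""
    rows = conn.execute(
        "SELECT language_code FROM localized_skills "
        # Each comparison yields true/false; DESC puts the matching rows first.
        # The trailing id is an added tie-breaker, only to make the demo stable.
        "ORDER BY (language_code = ?) DESC, (language_code = ?) DESC, id",
        (user_lang, default_lang),
    ).fetchall()
    return [code for (code,) in rows]


print(ordered_codes("it", "en"))  # ['it', 'it', 'en', 'en', 'fr', 'de']
```

Note that every row takes part in the sort, which is why the cost grows with the size of the table.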

I have created the following index:

CREATE INDEX "index_on_language_code" ON "my_table" (language_code)

However, the query now takes about 10 seconds, which is too long considering that without the ORDER BY clause it takes only a few milliseconds.

Any suggestions for a better index?

UPDATE:

=> EXPLAIN for: SELECT  "localized_skills".* FROM "localized_skills"  ORDER BY (localized_skills.language_code = 'it') DESC, (localized_skills.language_code = 'en') DESC LIMIT 10
QUERY PLAN
Limit  (cost=643126.40..643126.43 rows=10 width=42)
   ->  Sort  (cost=643126.40..678294.56 rows=14067262 width=42)
         Sort Key: (((language_code)::text = 'it'::text)), (((language_code)::text = 'en'::text))
         ->  Seq Scan on localized_skills  (cost=0.00..339137.93 rows=14067262 width=42)
 (4 rows)

UPDATE 2:

Adding WHERE language_code = 'it' OR language_code = 'en' before the ORDER BY (or equivalent solutions) doesn't improve the query in my case. In fact, at the moment my data contains only en and it rows. The WHERE clause would prevent the time from growing when I add rows in other languages, but the query still won't take less than 10 seconds.

Your index is not usable for this ORDER BY. If the strings are fixed, you could create a functional index on, for example, language_code = 'it'. In this case, though, I suggest that you first execute a query with WHERE language_code = 'it' OR language_code = 'en' and order only that part, then union it with the rows for all other languages, unordered. You will get the same result, and I think much faster.
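The functional-index idea can be sketched as follows, assuming the two language codes really are fixed. The example is Python with the standard-library sqlite3 module standing in for PostgreSQL; the index name index_on_language_priority and the sample rows are invented for illustration. The CREATE INDEX statement uses expression-index syntax that, to my knowledge, PostgreSQL accepts in the same shape, but whether the planner actually picks the index has to be checked with EXPLAIN on the real table.

```python
import sqlite3

# Hypothetical sample data; sqlite3 stands in for PostgreSQL here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE localized_skills (id INTEGER PRIMARY KEY, language_code TEXT)"
)
conn.executemany(
    "INSERT INTO localized_skills (language_code) VALUES (?)",
    [("en",), ("it",), ("fr",), ("it",)],
)

# Index the two boolean expressions themselves, in the same order and
# direction as the ORDER BY. This only helps for this exact pair of codes;
# another pair would need another index (index name is made up).
conn.execute(
    "CREATE INDEX index_on_language_priority ON localized_skills "
    "((language_code = 'it') DESC, (language_code = 'en') DESC)"
)

# The ORDER BY expressions must match the indexed expressions literally,
# so the codes are written as literals rather than bound parameters.
QUERY = (
    "SELECT id, language_code FROM localized_skills "
    "ORDER BY (language_code = 'it') DESC, (language_code = 'en') DESC "
    "LIMIT 10"
)

top_rows = conn.execute(QUERY).fetchall()
query_plan = conn.execute("EXPLAIN QUERY PLAN " + QUERY).fetchall()

print([code for _, code in top_rows])
print(query_plan)  # inspect whether the index is used; output varies by version
```

The trade-off is one index per language pair, so this suits a small, fixed set of languages rather than arbitrary runtime values.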

SELECT "localized_skills".*
FROM "localized_skills"
ORDER BY (localized_skills.language_code = 'it') DESC,
    (localized_skills.language_code = 'en') DESC
LIMIT 10

The query does not contain a WHERE clause. This means the entire table is read and, in the absence of the LIMIT clause, returned in the result set. LIMIT 10 is applied at the final stage, after the sorting; it cannot prevent the entire localized_skills table from being read.

Because of the conditions in the ORDER BY clause, the RDBMS cannot use the index on language_code to produce the order. It probably creates a temporary table and stores the rows there, and maybe creates an index on the fly to be able to output the rows in the correct order. I don't know the details because I have not worked with PostgreSQL, but this is how MySQL does it and, in fact, there is no way to make it run faster than that.

Do you really need to use the query as it is now, without a WHERE clause? Adding a WHERE clause shrinks the set of rows processed.

A simple idea (whether or not you add a WHERE clause) is to split your query into two queries that move the conditions into the WHERE clause, where they can be used together with indexes to considerably reduce the number of processed rows.

The first query selects at most 10 rows that have the desired language codes:

SELECT "localized_skills".*
FROM "localized_skills"
WHERE localized_skills.language_code IN ('it', 'en')
ORDER BY (localized_skills.language_code = 'it') DESC,
    (localized_skills.language_code = 'en') DESC
LIMIT 10

If the first query returns fewer than 10 rows, you can run the second query to select the remaining number of rows from those that do not have the desired language codes:

SELECT "localized_skills".*
FROM "localized_skills"
WHERE localized_skills.language_code NOT IN ('it', 'en')
LIMIT 10               -- Put a lower value here if needed

For this second query there is no need to order the rows by language_code any more (both conditions are FALSE); this lets PostgreSQL pick the first rows it finds and prevents it from reading the entire table.
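The two-step approach described above could be driven from application code roughly like this. It is a sketch only: Python with the standard-library sqlite3 module stands in for PostgreSQL, and the function name fetch_preferred_first and the sample rows are made up for the example.

```python
import sqlite3


def fetch_preferred_first(conn, user_lang, default_lang, limit=10):
    """Run the first query; run the second only if rows are still missing."""
    preferred = conn.execute(
        "SELECT id, language_code FROM localized_skills "
        "WHERE language_code IN (?, ?) "
        "ORDER BY (language_code = ?) DESC, (language_code = ?) DESC "
        "LIMIT ?",
        (user_lang, default_lang, user_lang, default_lang, limit),
    ).fetchall()

    missing = limit - len(preferred)
    if missing <= 0:
        return preferred

    # No ORDER BY here: any rows in other languages will do.
    others = conn.execute(
        "SELECT id, language_code FROM localized_skills "
        "WHERE language_code NOT IN (?, ?) "
        "LIMIT ?",
        (user_lang, default_lang, missing),
    ).fetchall()
    return preferred + others


# Hypothetical sample data for a quick check.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE localized_skills (id INTEGER PRIMARY KEY, language_code TEXT)"
)
conn.executemany(
    "INSERT INTO localized_skills (language_code) VALUES (?)",
    [("fr",), ("en",), ("it",), ("de",)],
)

print(fetch_preferred_first(conn, "it", "en", limit=3))
```

The second query is skipped whenever the first one already fills the page, which is the common case when most rows are in the two preferred languages.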

You can even combine both queries using UNION:

SELECT *
FROM (
    (
        SELECT "localized_skills".*
        FROM "localized_skills"
        WHERE localized_skills.language_code IN ('it', 'en')
        LIMIT 10
    )
    UNION
    (
        SELECT "localized_skills".*
        FROM "localized_skills"
        WHERE localized_skills.language_code NOT IN ('it', 'en')
        LIMIT 10
    )
) AS candidates
ORDER BY (candidates.language_code = 'it') DESC,
         (candidates.language_code = 'en') DESC
LIMIT 10

Again, I don't know PostgreSQL; this is the correct way to achieve the result using MySQL. I hope it helps you construct the correct query using PostgreSQL syntax and features.

The ORDER BY clause moved from the first inner query to the UNION because MySQL does not preserve the order of the rows retrieved by the two inner queries. The LIMIT 10 clauses on the inner queries are needed to avoid scanning the entire table; the outer LIMIT 10 clause keeps only the first 10 rows after they are sorted. One caveat: without its own ORDER BY, the first inner query may return any 10 of the matching rows, so rows in the default language can take the place of rows in the user's language.

https://dev.mysql.com/doc/refman/5.0/en/order-by-optimization.html

says:

In some cases, MySQL cannot use indexes to resolve the ORDER BY, although it still uses indexes to find the rows that match the WHERE clause. These cases include the following:

  You use ORDER BY on nonconsecutive parts of a key: SELECT * FROM t1 WHERE key2=constant ORDER BY key_part2; 

This is what you are doing.

Suggestions from the manual are:

To increase ORDER BY speed, check whether you can get MySQL to use indexes rather than an extra sorting phase. If this is not possible, you can try the following strategies:

Increase the sort_buffer_size variable value.

Increase the read_rnd_buffer_size variable value.

Use less RAM per row by declaring columns only as large as they need to be to hold the values stored in them. For example, CHAR(16) is better than CHAR(200) if values never exceed 16 characters.

Change the tmpdir system variable to point to a dedicated file system with large amounts of free space. The variable value can list several paths that are used in round-robin fashion; you can use this feature to spread the load across several directories. Paths should be separated by colon characters (“:”) on Unix and semicolon characters (“;”) on Windows, NetWare, and OS/2. The paths should name directories in file systems located on different physical disks, not different partitions on the same disk.

Alternatively, it may be done by:

{query}
WHERE language_code = '%s'
UNION
{query}
WHERE language_code = '%s'
UNION
{query}
WHERE language_code NOT IN( '%1$s', '%2$s')
