
Index for an ORDER BY which includes a “condition”

I have a query on a 20M-row table with the following ORDER BY clause:

ORDER BY (language_code = '%s') DESC, (language_code = '%s') DESC

%s is replaced at runtime with the actual language codes. The purpose is to order the results so that rows in the user's language come first, then rows in the default language, and finally all the others.
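The intended order can be made concrete with a small Python sketch (plain Python, not database code). In both PostgreSQL and MySQL a false comparison sorts before a true one, so DESC on each condition brings the matching rows to the top. The sketch reproduces that with negated booleans. The function name priority_order and the (id, language_code) row shape are hypothetical.

```python
def priority_order(rows, user_lang, default_lang):
    """Order rows like ORDER BY (lang = user) DESC, (lang = default) DESC.

    rows: list of (id, language_code) tuples (hypothetical shape).
    """
    def key(row):
        lang = row[1]
        # -(True) == -1 and -(False) == 0, so a match sorts first,
        # which mirrors DESC on a boolean expression.
        return (-(lang == user_lang), -(lang == default_lang))

    # sorted() is stable, so ties keep their input order; SQL gives no such
    # guarantee unless you add a tie-breaking column.
    return sorted(rows, key=key)


if __name__ == "__main__":
    sample = [(1, "en"), (2, "fr"), (3, "it"), (4, "en"), (5, "it")]
    print(priority_order(sample, "it", "en"))
    # → [(3, 'it'), (5, 'it'), (1, 'en'), (4, 'en'), (2, 'fr')]
```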

I have created the following index:

CREATE INDEX index_on_language_code ON my_table (language_code);

However, the query now takes about 10 seconds, which is too slow, considering that without the ORDER BY clause it takes only a few milliseconds.

Any suggestions for a better index?

UPDATE:

=> EXPLAIN for: SELECT  "localized_skills".* FROM "localized_skills"  ORDER BY (localized_skills.language_code = 'it') DESC, (localized_skills.language_code = 'en') DESC LIMIT 10
QUERY PLAN
Limit  (cost=643126.40..643126.43 rows=10 width=42)
   ->  Sort  (cost=643126.40..678294.56 rows=14067262 width=42)
         Sort Key: (((language_code)::text = 'it'::text)), (((language_code)::text = 'en'::text))
         ->  Seq Scan on localized_skills  (cost=0.00..339137.93 rows=14067262 width=42)
 (4 rows)

UPDATE 2

Adding WHERE language_code = 'it' OR language_code = 'en' before the ORDER BY (or equivalent solutions) doesn't improve the query in my case. In fact, my data is currently all either en or it. Such a filter would prevent the time from growing once I add rows in other languages, but the query would still take about 10 seconds.

Your index is not usable for this ORDER BY. If the strings are fixed, you could create a functional (expression) index on, for example, (language_code = 'it'). In this case, however, I suggest first running a query with WHERE language_code = 'it' OR language_code = 'en', ordering only that part, and then appending the rows for all other languages with UNION ALL, without ordering them. You will get the same result, but I think much faster.
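Here is a sketch of the functional-index idea. It uses Python's built-in sqlite3 module as a serverless stand-in, because SQLite also supports indexes on expressions. Whether PostgreSQL's planner behaves the same way is an assumption that you should verify with EXPLAIN. In PostgreSQL, the analogous statement would be along the lines of CREATE INDEX ON localized_skills ((language_code = 'it'), (language_code = 'en')), and a backward index scan could serve ORDER BY ... DESC, ... DESC. Such an index only matches queries that use exactly those literals, so each language pair would need its own index. The index name localized_skills_lang_priority, the helper make_table and the toy data are hypothetical.

```python
import sqlite3


def make_table(langs):
    """Build a hypothetical in-memory localized_skills table with the given codes."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE localized_skills (id INTEGER PRIMARY KEY, language_code TEXT)"
    )
    conn.executemany(
        "INSERT INTO localized_skills (language_code) VALUES (?)",
        [(lang,) for lang in langs],
    )
    return conn


PRIORITY_QUERY = (
    "SELECT id, language_code FROM localized_skills "
    "ORDER BY (language_code = 'it') DESC, (language_code = 'en') DESC "
    "LIMIT 10"
)


def add_priority_index(conn):
    # Index the same two boolean expressions the ORDER BY uses, so the
    # planner has the option of reading rows in index order instead of
    # sorting the whole table. The literals must match the query's.
    conn.execute(
        "CREATE INDEX localized_skills_lang_priority ON localized_skills "
        "((language_code = 'it') DESC, (language_code = 'en') DESC)"
    )


if __name__ == "__main__":
    conn = make_table(["it", "en", "fr"] * 10)
    add_priority_index(conn)
    print(conn.execute(PRIORITY_QUERY).fetchall())
    # The last column of each plan row is a readable description of the step.
    for step in conn.execute("EXPLAIN QUERY PLAN " + PRIORITY_QUERY):
        print(step[-1])
```

The EXPLAIN QUERY PLAN output shows whether SQLite picked the index or fell back to a temporary B-tree for the sort. The result rows are the same either way.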

SELECT "localized_skills".*
FROM "localized_skills"
ORDER BY (localized_skills.language_code = 'it') DESC,
    (localized_skills.language_code = 'en') DESC
LIMIT 10

The query has no WHERE clause, so the entire table is read and, without the LIMIT clause, returned in the result set. LIMIT 10 is applied at the final stage, after sorting, so it cannot prevent reading the entire localized_skills table.
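The plan in the question, a Sort node fed by a Seq Scan, shows why LIMIT does not help. PostgreSQL can use a bounded top-N heapsort for ORDER BY ... LIMIT, which keeps memory small. Without a matching index, however, every row must still be examined, because the last row read might belong in the result. The following is a Python analogy (not PostgreSQL internals) using heapq.nsmallest and a counting generator. The function name is hypothetical.

```python
import heapq


def top_n_with_count(rows, n, key):
    """Return the n smallest rows by key, plus how many rows were examined.

    This mimics a bounded "top-N" sort: memory stays proportional to n,
    but every input row still has to be read, because the very last row
    might belong in the result.
    """
    examined = 0

    def counting(iterable):
        nonlocal examined
        for row in iterable:
            examined += 1
            yield row

    top = heapq.nsmallest(n, counting(rows), key=key)
    return top, examined


if __name__ == "__main__":
    # 1,000 'en' rows followed by a single 'it' row at the very end.
    table = ["en"] * 1000 + ["it"]
    # Ascending on "not it" is the same as DESC on "= 'it'".
    top, examined = top_n_with_count(
        table, 10, key=lambda lang: (lang != "it", lang != "en")
    )
    print(top[0], len(top), examined)  # → it 10 1001
```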

Because the ORDER BY clause sorts on conditions rather than on the indexed column itself, the RDBMS cannot use the plain index on language_code. It probably stores the rows in a temporary table, or builds a sort structure on the fly, to output them in the correct order. I don't know the details for PostgreSQL, since I haven't worked with it, but this is how MySQL does it, and with the query as written there is no way to make it run faster than that.

Do you really need to use the query as it is now, without a WHERE clause? Adding a WHERE clause shrinks the set of rows processed.

A simple idea (whether or not you add a WHERE clause) is to split your query into two queries that move the conditions into the WHERE clause, where they can be used together with indexes to considerably reduce the number of processed rows.

The first query selects at most 10 rows that have the desired language codes:

SELECT "localized_skills".*
FROM "localized_skills"
WHERE localized_skills.language_code IN ('it', 'en')
ORDER BY (localized_skills.language_code = 'it') DESC,
    (localized_skills.language_code = 'en') DESC
LIMIT 10

If the first query returns fewer than 10 rows, you can run the second query to select the remaining number of rows that do not have the desired language codes:

SELECT "localized_skills".*
FROM "localized_skills"
WHERE localized_skills.language_code NOT IN ('it', 'en')
LIMIT 10               -- or 10 minus the number of rows the first query returned

For this second query there is no need to order the rows by language_code anymore, because both conditions are FALSE for every row it returns. This lets PostgreSQL stop as soon as it has found enough matching rows, rather than reading and sorting the entire table.
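The two-step approach can be sketched in Python, with the built-in sqlite3 module standing in for the real database. The helper make_table, the function fetch_prioritized and the toy data are hypothetical. The language codes are bound as query parameters rather than substituted into the SQL text with %s, which also avoids SQL injection if the codes ever come from user input.

```python
import sqlite3


def make_table(langs):
    """Build a hypothetical in-memory localized_skills table with the given codes."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE localized_skills (id INTEGER PRIMARY KEY, language_code TEXT)"
    )
    conn.executemany(
        "INSERT INTO localized_skills (language_code) VALUES (?)",
        [(lang,) for lang in langs],
    )
    return conn


def fetch_prioritized(conn, user_lang, default_lang, limit=10):
    """Preferred languages first; only query the others if still short."""
    rows = conn.execute(
        "SELECT id, language_code FROM localized_skills "
        "WHERE language_code IN (?, ?) "
        "ORDER BY (language_code = ?) DESC, (language_code = ?) DESC "
        "LIMIT ?",
        (user_lang, default_lang, user_lang, default_lang, limit),
    ).fetchall()
    missing = limit - len(rows)
    if missing > 0:
        # No ORDER BY here: the database may stop as soon as it has
        # found `missing` rows instead of reading the whole table.
        rows += conn.execute(
            "SELECT id, language_code FROM localized_skills "
            "WHERE language_code NOT IN (?, ?) "
            "LIMIT ?",
            (user_lang, default_lang, missing),
        ).fetchall()
    return rows


if __name__ == "__main__":
    conn = make_table(["it"] * 3 + ["en"] * 2 + ["fr"] * 20)
    print([lang for _, lang in fetch_prioritized(conn, "it", "en")])
```

In this example the first query returns 3 'it' rows and 2 'en' rows, so the second query is asked for only the 5 missing rows.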

You can even combine both queries using UNION:

(
    SELECT localized_skills.*,
           (language_code = 'it') AS is_it,
           (language_code = 'en') AS is_en
    FROM localized_skills
    WHERE language_code IN ('it', 'en')
    ORDER BY is_it DESC, is_en DESC
    LIMIT 10
)
UNION ALL
(
    SELECT localized_skills.*,
           FALSE AS is_it,
           FALSE AS is_en
    FROM localized_skills
    WHERE language_code NOT IN ('it', 'en')
    LIMIT 10
)
ORDER BY is_it DESC, is_en DESC
LIMIT 10

Again, I don't know PostgreSQL well; this is the correct way to achieve the result in MySQL. I hope it helps you build the equivalent query with PostgreSQL syntax and features.

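The outer ORDER BY clause is needed because neither MySQL nor PostgreSQL guarantees that a UNION returns the rows of the inner queries in any particular order. The LIMIT 10 clauses on the inner queries avoid scanning the entire table. However, a LIMIT without an ORDER BY returns an arbitrary subset, so the first inner query also needs its own ORDER BY if 'it' rows must take precedence over 'en' rows. The outer LIMIT 10 clause keeps only the first 10 rows after they are sorted.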
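One way to check that the combined query returns the intended order is to run it against a toy table. SQLite (used through Python's sqlite3 module as a stand-in) only allows ORDER BY and LIMIT at the end of a compound SELECT, so this sketch wraps each branch in a derived table instead of parentheses. It otherwise follows the UNION approach described above: the sort keys are exposed as is_it/is_en columns so the outer ORDER BY can reference them, and the first branch keeps its own ORDER BY. The helper names, the aliases and the toy data are hypothetical.

```python
import sqlite3


def make_table(langs):
    """Build a hypothetical in-memory localized_skills table with the given codes."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE localized_skills (id INTEGER PRIMARY KEY, language_code TEXT)"
    )
    conn.executemany(
        "INSERT INTO localized_skills (language_code) VALUES (?)",
        [(lang,) for lang in langs],
    )
    return conn


# Each branch is wrapped in a derived table because SQLite rejects
# ORDER BY / LIMIT on individual branches of a compound SELECT.
UNION_QUERY = """
SELECT * FROM (
    SELECT id, language_code,
           (language_code = 'it') AS is_it,
           (language_code = 'en') AS is_en
    FROM localized_skills
    WHERE language_code IN ('it', 'en')
    ORDER BY is_it DESC, is_en DESC  -- needed so LIMIT keeps the 'it' rows
    LIMIT 10
) AS preferred
UNION ALL
SELECT * FROM (
    SELECT id, language_code,
           0 AS is_it,  -- PostgreSQL would need FALSE here to match the type
           0 AS is_en
    FROM localized_skills
    WHERE language_code NOT IN ('it', 'en')
    LIMIT 10
) AS others
ORDER BY is_it DESC, is_en DESC
LIMIT 10
"""


def fetch_with_union(conn):
    """Return (id, language_code) pairs, dropping the helper sort columns."""
    return [row[:2] for row in conn.execute(UNION_QUERY)]


if __name__ == "__main__":
    conn = make_table(["en"] * 12 + ["it"] * 12)
    print(fetch_with_union(conn))
```

With 12 'en' rows stored before 12 'it' rows, the inner ORDER BY ensures that the result consists of 'it' rows. Without it, the first branch's LIMIT 10 could legally return 'en' rows.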

https://dev.mysql.com/doc/refman/5.0/en/order-by-optimization.html

says:

In some cases, MySQL cannot use indexes to resolve the ORDER BY, although it still uses indexes to find the rows that match the WHERE clause. These cases include the following:

  You use ORDER BY on nonconsecutive parts of a key: SELECT * FROM t1 WHERE key2=constant ORDER BY key_part2; 

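This is essentially your situation: the ORDER BY terms are expressions rather than key parts of the index, so the index cannot supply the order.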

The manual suggests the following:

To increase ORDER BY speed, check whether you can get MySQL to use indexes rather than an extra sorting phase. If this is not possible, you can try the following strategies:

Increase the sort_buffer_size variable value.

Increase the read_rnd_buffer_size variable value.

Use less RAM per row by declaring columns only as large as they need to be to hold the values stored in them. For example, CHAR(16) is better than CHAR(200) if values never exceed 16 characters.

Change the tmpdir system variable to point to a dedicated file system with large amounts of free space. The variable value can list several paths that are used in round-robin fashion; you can use this feature to spread the load across several directories. Paths should be separated by colon characters (“:”) on Unix and semicolon characters (“;”) on Windows, NetWare, and OS/2. The paths should name directories in file systems located on different physical disks, not different partitions on the same disk.

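Alternatively, it can be done with UNION ALL and an explicit rank column, because a plain UNION does not guarantee that the rows come back in branch order:

(SELECT *, 1 AS sort_rank FROM localized_skills WHERE language_code = '%1$s' LIMIT 10)
UNION ALL
(SELECT *, 2 AS sort_rank FROM localized_skills WHERE language_code = '%2$s' LIMIT 10)
UNION ALL
(SELECT *, 3 AS sort_rank FROM localized_skills WHERE language_code NOT IN ('%1$s', '%2$s') LIMIT 10)
ORDER BY sort_rank
LIMIT 10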
