
Maximum number of useful indexes a table can have?

The Meeting

In a meeting last week the client was discussing how to make an important search page faster. The page searches a single table (12 columns, 20 million rows), accepting string values for any field; it returns 50 rows (with pagination), starting at the specified criteria (each column can be sorted ascending or descending). When the criteria don't match the existing indexes, the search becomes slow, and the client is not happy.

And then -- in the middle of the meeting -- the semi-technical analyst threw this one into the air: Why don't we create all possible indexes on the table to make everything fast?

I responded at once: "No, there are too many, and that would make the table really slow to modify; we need to create a few cleverly chosen indexes instead." We ended up creating the most useful ones, and the page is now much faster. Problem solved.

The Question

But still... I keep thinking about that question and I wanted to have a better understanding of it, so here it is:

In theory, how many possible useful indexes can I create on a table with N columns?

I think that by useful we should apply the following criteria (I may be wrong):

  • Indexes not already covered by other ones: for example (a, b) should not be counted if (a, b, c) is included.
  • To support ordered output over multiple rows (not just equality lookups), ascending and descending columns should be counted as distinct when they are part of a composite index. That is, (a) serves the same purpose as (a DESC), but (a, b) serves a different purpose than (a DESC, b).

So, a table with a single column (a) can have only a single index:

(a)

With two columns (a, b) I can have four useful indexes:

(a, b)
(b, a)
(a DESC, b)
(b DESC, a)
(a) -- already covered by #1
(b) -- already covered by #2
(a, b DESC) -- already covered by #3 (reading the index in reverse)
(b, a DESC) -- already covered by #4 (reading the index in reverse)
(a DESC, b DESC) -- already covered by #1 (reading the index in reverse)
(b DESC, a DESC) -- already covered by #2 (reading the index in reverse)
(a DESC) -- already covered by #3
(b DESC) -- already covered by #4
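
This counting rule (a composite index is interchangeable with its full reverse, since a B-tree can be scanned backwards, and a shorter index is covered by any index it is a prefix of) can be checked by brute force. The sketch below is my own illustration, not part of the question; it enumerates only full-width indexes, because every shorter index is a prefix of one of them:

```python
from itertools import permutations, product

def useful_indexes(columns):
    """Enumerate full-width composite indexes, treating an index as
    equivalent to its full reverse (every column's direction flipped)."""
    seen = set()
    kept = []
    for perm in permutations(columns):
        for dirs in product(("ASC", "DESC"), repeat=len(columns)):
            idx = tuple(zip(perm, dirs))
            flipped = tuple((c, "ASC" if d == "DESC" else "DESC")
                            for c, d in idx)
            if flipped in seen:  # already counted via its reverse
                continue
            seen.add(idx)
            kept.append(idx)
    return kept

print(len(useful_indexes("a")))    # 1
print(len(useful_indexes("ab")))   # 4
print(len(useful_indexes("abc")))  # 24, i.e. 3! * 2**(3-1)
```

The counts follow the pattern n! * 2^(n-1): all column orders, times the direction combinations, halved because each index pairs up with its reverse.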

With three columns (a, b, c):

(a, b, c)
(a, c, b)
(b, c, a)
(b, a, c)
(c, a, b)
(c, b, a)
...

Let's say you have a table t with columns a, b, and c.

For the query

select a from t where b = 1 order by c;

the best index is on t(b, c, a): you first look up rows by b, the results come back already ordered by c, and a is stored in the index, so the query is fully covered.

For this query:

select a from t where c = 1 order by b;

the best index is on t(c,b,a).

For this query:

select b from t where c = 1 order by a;

the best index is on t(c,a,b).

With more columns a query could look like this:

select a from t where b = 1 order by c, d, e;

and you'd best want an index on t(b,c,d,e,a).

While for

select a from t where b = 1 order by e, d, c;

you'd want an index on t(b,e,d,c,a).
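
This column-ordering effect is easy to observe in any database with a plan explainer. Here is a sketch using SQLite (the table, index name, and data are my own placeholders), where the (b, c, a) index both avoids a sort step and covers the query:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER, c INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)",
                 [(i, i % 10, i % 7) for i in range(1000)])
conn.execute("CREATE INDEX idx_bca ON t(b, c, a)")

# Equality on b, ordering by c, and a read from the index itself:
rows = conn.execute(
    "EXPLAIN QUERY PLAN SELECT a FROM t WHERE b = 1 ORDER BY c"
).fetchall()
for row in rows:
    print(row[-1])  # e.g. SEARCH t USING COVERING INDEX idx_bca (b=?)
```

The plan mentions a covering index and contains no "USE TEMP B-TREE FOR ORDER BY" step, confirming that the index satisfies the condition, the ordering, and the select list at once.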

So the maximal number of useful indexes for n columns is n!, i.e. all column permutations (counting the ascending/descending variants as the question does raises this to n! * 2^(n-1)).

This is for indexes on the plain columns alone. As Gordon Linoff has mentioned in the comments on your question, you may also want functional indexes (e.g. on t(upper(a), lower(b))). The number of useful functional indexes is theoretically unlimited. And yes, Gordon is also right about further index types.

So the final answer is that theoretically the number of useful indexes per table is unlimited.

All the other answers contain something valuable, but there is enough that I have to say about it to warrant a third one.

There is no exact answer to the question as you put it. In a way, it is like asking "What's the limit beyond which you would call a person crazy?" There is a large grey area.

My points are:

  • What would happen if you add too many indexes:

    • Modifying the table gets substantially slower. Even with few indexes, data manipulation can already become an order of magnitude slower. If you ever want to INSERT, UPDATE or DELETE, a table with all conceivable indexes would make such an operation glacially slow.

    • With many indexes, the query planner has to consider many different access paths, so planning the query will become slightly slower with any index you add. With very many indexes, it may well be that the planning overhead will make the query too slow even before the executor has started working.

  • What can you do to reduce the number of indexes needed:

    • Look at the operators. If the operators <, <=, >= and > are never used, there is no point in adding indexes with descending columns.

    • Remember that an index on (a, b, c) can also be used for a query that only uses a in its condition, so you don't need an extra index on (a).

  • What is a practical way forward for you?

    I have two suggestions:

    1. One way is to add a simple single-column index on each of your twelve columns.

      Twelve indexes are already quite a lot, but you are still not in the crazy range.

      PostgreSQL can use these indexes efficiently in a query with conditions on more than one column, even if none of the conditions alone would be selective enough to warrant an index scan.

      This is because PostgreSQL has bitmap index scans. See this example from the documentation:

      EXPLAIN SELECT * FROM tenk1 WHERE unique1 < 100 AND unique2 > 9000;

                                              QUERY PLAN
      -------------------------------------------------------------------------------------
       Bitmap Heap Scan on tenk1  (cost=25.08..60.21 rows=10 width=244)
         Recheck Cond: ((unique1 < 100) AND (unique2 > 9000))
         ->  BitmapAnd  (cost=25.08..25.08 rows=10 width=0)
               ->  Bitmap Index Scan on tenk1_unique1  (cost=0.00..5.04 rows=101 width=0)
                     Index Cond: (unique1 < 100)
               ->  Bitmap Index Scan on tenk1_unique2  (cost=0.00..19.78 rows=999 width=0)
                     Index Cond: (unique2 > 9000)

      Each index is scanned and a bitmap is formed that contains 1 for each row that matches the condition. Then the bitmaps are combined, and finally the rows are fetched from the table.

    2. The other idea is to use a Bloom filter.

      If the only operator in your conditions is =, you can

       CREATE EXTENSION bloom;

      and create a single index USING bloom over all table columns.

      Such an index can be used for queries with any combination of columns in the WHERE clause. The downside is that it is a lossy index, so you will get false positives that have to be fetched and filtered out.

      It depends on your case, but this might be an elegant (and underestimated!) solution that balances query and update speed.
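
For reference, the setup sketched above would look roughly like this (the table and column names are placeholders; the WITH parameters are the optional signature-size settings described in the PostgreSQL bloom extension documentation):

```sql
CREATE EXTENSION bloom;

-- One lossy index covering every searchable column of the table;
-- length is the signature size in bits, colN the bits per column.
CREATE INDEX t_bloom_idx ON t USING bloom (a, b, c)
    WITH (length = 80, col1 = 2, col2 = 2, col3 = 2);

-- Usable for any combination of equality conditions:
SELECT * FROM t WHERE b = 1 AND c = 2;
```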

In theory, how many possible useful indexes can I create on a table with N columns?

Rather than answer this question theoretically, I will give a practical answer.

The first point to note is that all sequential searches should be avoided (unless the table is very small). By "very small" I mean just a few rows (say, at most 10). (However, even in such a table, a primary key is encouraged, to enforce uniqueness; this would, of course, be implemented as an index.)

Therefore, if the client has a valid search path, an index is required. If an existing index serves the purpose, that's OK; else, in all probability, an additional index is needed.

One transaction table in one application in my experience had 8 indexes. The client insisted on certain search paths, so we had no choice but to provide them. Of course, we informed the client that updates would slow down, but the client found that acceptable. In practice, the slowdown during updates wasn't appreciable.

So that is the suggested approach: warn the client accordingly.

It is important to verify, during design, that every SQL statement uses indexed search paths (for every accessed table) rather than searching sequentially. Oracle has a tool for this, called EXPLAIN PLAN. Other DBMSs have similar tools.
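
In Oracle that check looks roughly as follows (the query is a placeholder):

```sql
EXPLAIN PLAN FOR
    SELECT * FROM t WHERE b = 1 ORDER BY c;

-- The resulting plan shows whether an index range scan
-- or a full table scan would be used:
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```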

