简体   繁体   中英

SQL Server Clustered Index - Order of Index Question

I have a table like so:

keyA keyB data

keyA and keyB together are unique, are the primary key of my table and make up a clustered index.

There are 5 possible values of keyB but an unlimited number of possible values of keyA,. keyB generally increments.

For example, the following data can be ordered in 2 ways depending on which key column is ordered first:

keyA keyB data
A    1    X
B    1    X
A    3    X
B    3    X
A    5    X
B    5    X
A    7    X
B    7    X

or

keyA keyB data
A    1    X
A    3    X
A    5    X
A    7    X
B    1    X
B    3    X
B    5    X
B    7    X

Do I need to tell the clustered index which of the key columns has fewer possible values to allow it to order the data by that value first? Or does it not matter in terms of performance which is ordered first?

You should order your composite clustered index with the most selective column first. This means the column with the most distinct values compared to total row count.

"B*TREE Indexes improve the performance of queries that select a small percentage of rows from a table." http://www.akadia.com/services/ora_index_selectivity.html ?

This article is for Oracle, but still relevant.

Also, if you have a query that runs constantly and returns few fields, you may consider creating a composite index that contains all the fields - it will not have to access the base table, but will instead pull data from the index.

ligget78's comment on making sure to mention the first column in a composite index is important to remember.

If you create an index (regardless clustered or not) with (keyA, keyB) then this is how values will be ordered, eg first keyA, then keyB (this is the second case in your question). If you want it the other way around, you need to specify (keyB, keyA).

It could matter performance-wise, depends on your query of course. For example, if you have (keyA, keyB) index and the query looks like WHERE keyB = ... (without mentioning keyA) then the index can't be utilized.

As others have said, the ordering is based on how you specify it in the index creation script (or PK constraint). One thing about clustered indexes though is that there is a lot to keep in mind.

You may get better overall performance by using your clustered index on something other than the PK. For example, if you are writing a financial system and reports are almost always based on date and time of an activity (all activity for the past year, etc.) then a clustered index on that date column might be better. As HLGEM says, sorting can also be affected by your selection of clustered index.

Clustered indexes can also affect inserts more than other indexes. If you have a high volume of inserts and your clustered index is on something like an IDENTITY column then there could be contention problems for that particular part of the disk since all of the new rows are being inserted into the same place.

For small look-up tables I always just put the clustered index on the PK. For high-impact tables though it's a good idea to spend the time thinking about (and testing) various possible clustered indexes before choosing the best one.

I believe that SQL Server orders it exactly the way you tell it. It assumes that you know best how to access your index.

In any case, I would say it's a good idea where possible to specify what you want exactly rather than hoping the database will figure it out.

You can also try it both ways, run a bunch of representative queries and then compare the generated execution plans to determine which is best for you.

Remember that the clustered index is the physical order in which the table is stored on disk.

So if your clustered index is defined as ColA, ColB queries will be faster when order in the same order as your clustered index. If SQL has to order B,A it will require post execution sorting to achieve the correct order.

My suggestion is to add a second non-clustered index on B,A. Also depending on the size of your data column to INCLUDE(read included column) it to prevent the need for key lookups. That is, of course, provided that this table is not heavily inserted, as you always must balance query speed vs. write speed.

Realistically, your clustered index should represent the order in which the data is most likely to be accessed as well as maintaining a delicate balance of insert\\update IO cost. If your clustered index is such that you are constantly inserting into the middle of pages, you may suffer performance losses there.

Like others have said, without knowing the table length, column sizes, etc. there is no correct answer. Trial and error with a heavy dose of testing is your best bet.

Just in case this isn't obvious: the sort order of your index does not promise much about the the sort order of the results in a query .

In your queries, you must still add an

ORDER BY KeyA, KeyB

or

ORDER BY KeyB, KeyA

The optimizer may be pleased to find the data already physically ordered in the index as desired and save some time, but every query that is supposed to deliver data in a particular order must have an ORDER BY clause at the end of it. Without an order by, SQL Server makes no promises with respect to the order of a recordset, or even that it will come back in the same order from query to query.

Yes you should suggest, normally query engine try to find out the best execution plan and the index to utilize, however sometime it is better to force query engine to use the specific index. There are some other consideration when planning for index as well as when utilizing the index in your query. for example, the column ordering in index, column ordering in where clause. you could refer following link to know about:

http://ashishkhandelwal.arkutil.com/sql-server/quick-and-short-database-indexes/

  • Best Practices to use indexes
  • How to get best performance form indexes
  • Clustered index Considerations
  • Nonclustered Indexes Considerations

I am sure this will help you when planning for index.

The best thing you can do is to try both solutions and measure the execution time.

In my experience, index tuning is all but exact-science.

Maybe having keyB before keyA in the index column order would be better

You specify the columns in the order in which you would normally want them sorted in reports and queries.

I would be wary of creating a multicolumn clustered index though. Depending on how wide this is, you could have a huge impact on the size of any other indexes you create because all non-clustered indexes contain the clustered index value in them. Also the rows have to be re-ordered if the values frequently change and it is my experience that non-surrogate keys tend to change more frequently. Therefore creating this as a clustered vice nonclustered index could be much more time consuming of server resources if you have values that are likely to change. I'm not saying you shouldn't do this as I don't know what type of data your columns actually contain (although I suspect they are more complex that A1, a2, etc); I'm saying you need to think about the ramifications of doing it. It would probably be a good idea to thoroughly read BOL about clustered vice nonclustered indexes before committing to doing this.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM