简体   繁体   中英

Details of impact for indexes, primary keys, unique keys

I like to think I know enought theory, but I have little experience optimizing DB in real world. I would like to know points of view, thoughts or experiences.

Let's imagine a scenario like:

Table A Key: c1, c2, c3, c4 Index: c7, c3, c2

Table B Key: c1, c2, c3, c4 Index: c1, c5

All are non-clustered. The tables have 40+ fields. They are feeded daily by night and have some updates during the day.

Table A, if more queries benefit from Key than Index, could the index impact negatively? Because insert/delete has to update 2 indexes instead of 1.

Table B, has an extra field at the index, not present in key.

Could a query using c1, c5

Benefit from this Key?: Key: c1, c2, c3, c4, c5

So that Index could be droped.

What impact does the order of the fields has? Key: c1, c2, c3 Key: c3, c1, c2

A typical scenario for me is process_date, client_number, operation. And it feeds with a bunch of data each day (process_date).

If more queries benefit from Key than Index, could the index impact negatively?

Yes.

But...

You have to actually measure your actual workload to see if this is true. It's hard to predict from purely theoretical grounds. Possible, but hard.

Could a query using c1, c5 Benefit from this Key?: Key: c1, c2, c3, c4, c5

Rarely. The query algorithms rarely make use of part of a key or index. It's usually all or nothing. If the whole key (or index) can't be used, then none of it is used.

It's easy to get a query execution plan and get a definite answer to this question.

Learn to to get the execution plans and actually get them.

What impact does the order of the fields has? Key: c1, c2, c3 Key: c3, c1, c2

Almost no impact at all. In some databases, it can change how the rows appear if you omit an ORDER BY clause. In other databases, it has no impact because the physical rows and the order of the key index are separate.

You can easily drop and recreate the index, check execution plans, and see what -- if any -- impact this has.

The only way to be sure is to get the execution plans and look at them.

If you're trying to optimize your indexing strategy, you can run the Database Tuning Advisor or query sys.dm_db_index_usage_stats to find out how your indexes are being used. (These stats are cleared when the SQL Server process is restarted).

In answer to your first question, indexes always negatively impact write performance, so it can be useful to weed out unused ones. But make sure they are really unused first.

In answer to your second question, it's unlikely that removing the index will increase the query performance, and in fact may decrease it, since the index is covering the query with the minimal number of columns. If by "using c1, c5" means that those are the only two columns involved, then removing the index will likely reduce performance slightly. If the query is selecting more columns than are in the index, then the index has to join to the table anyway if it gets used, so it's unlikely to affect performance much if you drop it. The only way to be sure is to analyze the query cost for both scenarios.

if more queries benefit from Key than Index, could the index impact negatively? Because insert/delete has to update 2 indexes instead of 1.

A non-clustered index has a negative impact on insert/update/delete performance. The negative impact is usually outweighed by the increased performance of selects.

Could a query using c1, c5 Benefit from this Key?: Key: c1, c2, c3, c4, c5

Yes, if only a few rows share the same c1, then the index will be very effective.

What impact does the order of the fields has? Key: c1, c2, c3 Key: c3, c1, c2

Order is important, both for filtering and ordering. An index on (c1,c2) can be used for where c1 = 1 and where c1 = 1 and c2 = 1 , but not for where c2 = 1 . Likewise, it helps with order by c1 , but not with order by c2 .

Indexes can certainly have a negative impact on table changes (either inserts, updates, or deletes). Unless you have a large number of unnecessary indexes or your system has an extremely high rate of data changes one or two extra indexes is unlikely to be a problem. Don't just put indexes on a table for no reason, but you usually don't have to go through extensive analysis to try to avoid an index.

The key on your table should be the minimum set of columns which uniquely identifies a row. It should not contain additional columns. For example, if I have a table of unique email addresses and my key is on email_address then I can only have a single row for "me@here.com". If I now add the description to the key because I use the description in a lot of my queries then suddenly I could have: "me@here.com", "Description #1" AND "me@here.com", "Description #2". Your data is no longer being properly constrained and you end up with a big mess on your hands.

The ordering of the columns in your key can have a very significant impact depending on the queries which you're using. For example, if you're looking for rows based on c1, c2 then a key of c1, c2, c3 would be very useful. If your key is c1, c3, c2 then it's not so useful.

Imagine it this way, I want you to look in the phone book and find everyone whose last name begins with "TO". That's a pretty simple request. Now, what if the names were ordered by the first letter of the last name followed by the third letter of the last name? It would be very difficult and time-consuming to find those names starting with "TO".

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM