I have a table called T_TICKET with a column CallId varchar(30).
Here is an example of my data:
CallId | RelatedData
===========================================
MXZ_SQzfGMCPzUA | 0000
MXyQq6wQ7gVhzUA | 0001
MXwZN_d5krgjzUA | 0002
MXw1YXo7JOeRzUA | 0000
...
I am attempting to find records that match a set of CallIds. Something like this:
SELECT * FROM T_TICKET WHERE CALLID IN(N'MXZInrBl1DCnzUA', N'MXZ0TWkUhHprzUA', N'MXZ_SQzfGMCPzUA', ... ,N'MXyQq6wQ7gVhzUA')
I look up anywhere from 200 to 300 CallIds at a time with this query, and it takes around 35 seconds to run. Is there anything I can change in the table structure, the column type, the index, or the query itself to improve its performance?
There are currently around 300,000 rows in T_TICKET. CallId is not unique, and RelatedData is not unique. I also have a non-clustered index on CallId.
I know the basics of SQL, but I'm not a pro. Some things I've thought of doing are:
- Changing CallId from varchar to char.
- Reducing the length of CallId (its declared length is 30, but right now I am only using 15 bytes).
I have not tried any of these yet because they require changes to live production data, and I am not sure they would make a significant improvement.
Would either of these options make a significant improvement? Or, are there other things I could do to make this perform faster?
First, be sure that the types are the same on both sides of the comparison: either VARCHAR() or NVARCHAR() for both the column and the values you compare it to. Then, add an index:
create index idx_t_ticket_callid on t_ticket(callid);
If the types are compatible, SQL Server should make use of the index.
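A quick way to confirm which type the column actually has, before deciding how to write the values, is to look it up in the catalog. This sketch assumes T_TICKET lives in the current database; based on the question you would expect DATA_TYPE to be varchar and CHARACTER_MAXIMUM_LENGTH to be 30.

```sql
-- Look up the declared type and length of T_TICKET.CallId
-- (assumes the table is in the current database).
SELECT TABLE_SCHEMA, COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'T_TICKET'
  AND COLUMN_NAME = 'CallId';
```

If DATA_TYPE is varchar, the values in the IN list should be plain '...' literals (or varchar parameters), not N'...' literals.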
Your table is what is called a heap (a table without a clustered index). This kind of table is only really good for data loading or as a staging table. I would recommend converting your table to have a clustered key. A good clustering key is unique, static, narrow, non-nullable, and ever-increasing (e.g. an int or bigint identity column).
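One way to follow this advice, sketched under the assumption that no existing column is a good clustering key, is to add a bigint identity column and make it the clustered primary key. The names TicketId and PK_T_TICKET are hypothetical. Because this rewrites the whole table, it is worth trying on a copy of the production data first.

```sql
-- Hypothetical names: TicketId and PK_T_TICKET are not part of the original schema.
-- Adding an IDENTITY column to an existing table assigns a value to every existing row.
ALTER TABLE T_TICKET ADD TicketId bigint IDENTITY(1,1) NOT NULL;
GO

-- Creating a clustered primary key converts the heap into a clustered index.
-- Existing nonclustered indexes (such as the one on CallId) are rebuilt as part of this,
-- because they switch from row locators to the clustering key.
ALTER TABLE T_TICKET ADD CONSTRAINT PK_T_TICKET PRIMARY KEY CLUSTERED (TicketId);
GO
```

The identity column satisfies the unique, static, narrow, non-nullable, and ever-increasing criteria above, which is why it is the usual choice when the natural data (here, a non-unique CallId) does not.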
Another downside of a heap is that lots of UPDATEs and DELETEs on the table will slow down your SELECTs, partly because of forwarded records (created when an UPDATE makes a row too large to stay on its original page). Quoting Paul Randal on forwarded records:
If a forwarding record occurs in a heap, when the record locator points to that location, the Storage Engine gets there and says Oh, the record isn't really here – it's over there! And then it has to do another (potentially physical) I/O to get to the page with the forwarded record on. This can result in a heap being less efficient than an equivalent clustered index.
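To see whether forwarded records are actually a problem for this table, you can ask the index physical-stats DMV. This sketch assumes T_TICKET resolves in the current database and default schema; if OBJECT_ID returns NULL, the function reports on every object instead, so check the name first.

```sql
-- Count forwarded records in the heap (index_id 0 = the heap itself).
-- forwarded_record_count is only populated in SAMPLED or DETAILED mode.
SELECT index_type_desc, page_count, record_count, forwarded_record_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'T_TICKET'), 0, NULL, 'DETAILED');

-- If the count is high and the table must stay a heap, rebuilding it removes the
-- forwarding pointers (this also rebuilds the table's nonclustered indexes).
ALTER TABLE T_TICKET REBUILD;
```

Converting the table to a clustered index removes the problem permanently, since clustered indexes do not use forwarding pointers.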
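Lastly, list the columns you need explicitly in your SELECT and avoid SELECT *. I'm guessing you are getting a table scan when you execute the query. What you can do is INCLUDE every column from your SELECT list in the index, so the index covers the query. Note that DROP_EXISTING=ON only works if an index with this exact name already exists; use your existing index's name, or remove that option to create a new index: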
CREATE INDEX [IX_T_TICKET_CallId_INCLUDE] ON [T_TICKET] ([CallId]) INCLUDE ([RelatedData]) WITH (DROP_EXISTING=ON)
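With that index in place, a query that names only the covered columns can be answered from the index alone, without touching the base table. This sketch assumes CallId and RelatedData are the only columns the query needs (the sample data in the question shows just those two); the literals are taken from the question and written as varchar to match the column.

```sql
-- CallId is the index key and RelatedData is an INCLUDE column of
-- IX_T_TICKET_CallId_INCLUDE, so this query is fully covered by the index.
SELECT CallId, RelatedData
FROM T_TICKET
WHERE CallId IN ('MXZ_SQzfGMCPzUA', 'MXyQq6wQ7gVhzUA', 'MXwZN_d5krgjzUA');
```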
It turns out there is in fact a way to significantly optimize my query without changing any data types.
This query:
SELECT * FROM T_TICKET
WHERE CALLID IN(N'MXZInrBl1DCnzUA', N'MXZ0TWkUhHprzUA', N'MXZ_SQzfGMCPzUA', ... ,N'MXyQq6wQ7gVhzUA')
is using NVARCHAR literals as the input values (N'MXZInrBl1DCnzUA', N'MXZ0TWkUhHprzUA', ...). As I specified in my question, CallId is VARCHAR. Because NVARCHAR has higher data-type precedence than VARCHAR, SQL Server was converting CallId in every row of the table to NVARCHAR to do the comparison, which took a long time (even though I have an index on CallId).
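One way to spot this kind of implicit conversion is to look at the estimated plan for the slow form of the query and search it for CONVERT_IMPLICIT applied to the CallId column. This is a sketch; the exact plan shape depends on the SQL Server version and on the column's collation (with a Windows collation the optimizer can sometimes still seek through the conversion, while with a SQL collation it typically scans).

```sql
-- Return the estimated plan as text instead of executing the query.
-- SET SHOWPLAN_TEXT must be the only statement in its batch, hence the GO separators.
SET SHOWPLAN_TEXT ON;
GO
SELECT CallId, RelatedData
FROM T_TICKET
WHERE CallId IN (N'MXZ_SQzfGMCPzUA', N'MXyQq6wQ7gVhzUA');
GO
SET SHOWPLAN_TEXT OFF;
GO
```

Running the same check with the N prefixes removed should show the CONVERT_IMPLICIT on CallId disappear.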
I was able to optimize it by simply not making the values NVARCHAR, that is, by dropping the N prefix so they are plain VARCHAR literals:
SELECT * FROM T_TICKET
WHERE CALLID IN('MXZInrBl1DCnzUA', 'MXZ0TWkUhHprzUA', 'MXZ_SQzfGMCPzUA', ... ,'MXyQq6wQ7gVhzUA')
Now, instead of taking over 30 seconds to run, it only takes around .03 seconds. Thanks for all the input.
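The same rule applies when the values are passed as parameters rather than literals: the parameters should be declared as varchar, not nvarchar. This matters because many client libraries send strings as nvarchar by default (for example, ADO.NET's AddWithValue infers NVarChar for a .NET string). The sketch below uses sp_executesql with hypothetical parameter names @p1 to @p3; a real lookup of 200 to 300 values would need one parameter per value.

```sql
-- Declare the parameters as varchar(30) so they match the CallId column.
-- sp_executesql itself requires the statement and parameter-definition strings
-- to be Unicode (N'...'); only the parameter values need to be non-Unicode.
EXEC sp_executesql
    N'SELECT CallId, RelatedData FROM T_TICKET WHERE CallId IN (@p1, @p2, @p3)',
    N'@p1 varchar(30), @p2 varchar(30), @p3 varchar(30)',
    @p1 = 'MXZInrBl1DCnzUA',
    @p2 = 'MXZ0TWkUhHprzUA',
    @p3 = 'MXZ_SQzfGMCPzUA';
```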