How can I improve the speed of a SQL query searching for a collection of strings

Question

I have a table called T_TICKET with a column CallId varchar(30).

Here is an example of my data:

CallId               | RelatedData
===========================================
MXZ_SQzfGMCPzUA      | 0000
MXyQq6wQ7gVhzUA      | 0001
MXwZN_d5krgjzUA      | 0002
MXw1YXo7JOeRzUA      | 0000
...

I am attempting to find records that match a collection of CallId values. Something like this:

SELECT * FROM T_TICKET WHERE CALLID IN(N'MXZInrBl1DCnzUA', N'MXZ0TWkUhHprzUA', N'MXZ_SQzfGMCPzUA', ... ,N'MXyQq6wQ7gVhzUA')

I look up anywhere from 200 to 300 CallId values at a time using this query, and it takes around 35 seconds to run. Is there anything I can do to the table structure, the column type, the index, or the query itself to improve its performance?

There are currently around 300,000 rows in T_TICKET. Neither CallId nor RelatedData is unique. I also have a non-clustered index on CallId.

I know the basics of SQL, but I'm not a pro. Some things I've thought of doing are:

Change the type of CallId from varchar to char.
Shorten the length of CallId (its length is 30, but in reality, right now, I am using only 15 bytes).

I have not tried any of these yet because it requires changes to live production data. And, I am not sure they would make a significant improvement.

Would either of these options make a significant improvement? Or, are there other things I could do to make this perform faster?

Answer 1

First, be sure that the column and the values you compare it to have the same type: either VARCHAR() or NVARCHAR(). Then, add an index:

create index idx_t_ticket_callid on t_ticket(callid);

If the types are compatible, SQL Server should make use of the index.
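
One way to confirm the declared type of the column before comparing it to the literals is to look it up in the catalog views. This is a minimal sketch; it assumes T_TICKET lives in the default schema of the current database.

```sql
-- Look up the declared type of T_TICKET.CallId.
-- max_length is reported in bytes, so an NVARCHAR(30) column would show 60.
SELECT c.name       AS column_name,
       t.name       AS type_name,
       c.max_length AS max_length_bytes
FROM sys.columns AS c
JOIN sys.types   AS t ON t.user_type_id = c.user_type_id
WHERE c.object_id = OBJECT_ID(N'T_TICKET')
  AND c.name = N'CallId';
```

If type_name comes back as varchar, the values in the IN list should be plain '...' literals rather than N'...' literals.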

Answer 2

Your table is what is called a heap (a table without a clustered index). Heaps are mainly good for data loading or as staging tables. I would recommend adding a clustered index to your table. A good clustering key should be unique, static, narrow, non-nullable, and ever-increasing (e.g. an int or bigint identity column).
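
As a rough sketch of that recommendation, a surrogate identity column could be added and used as the clustering key. The column name TicketId and the index name CIX_T_TICKET_TicketId are hypothetical and not part of the original table. GO is the batch separator used by SSMS and sqlcmd.

```sql
-- Hypothetical surrogate key: TicketId does not exist in the original table.
ALTER TABLE T_TICKET
    ADD TicketId BIGINT IDENTITY(1, 1) NOT NULL;
GO

-- Cluster the table on the new narrow, unique, ever-increasing key.
CREATE UNIQUE CLUSTERED INDEX CIX_T_TICKET_TicketId
    ON T_TICKET (TicketId);
GO
```

Both statements rewrite the table, and building the clustered index also rebuilds the existing non-clustered indexes, so on a production table this is probably best tried on a copy first.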

Another downside of a heap is that frequent UPDATE / DELETE activity on the table can slow down your SELECT queries because of forwarded records. Quoting Paul Randal on forwarded records:

If a forwarding record occurs in a heap, when the record locator points to that location, the Storage Engine gets there and says Oh, the record isn't really here – it's over there! And then it has to do another (potentially physical) I/O to get to the page with the forwarded record on. This can result in a heap being less efficient than an equivalent clustered index.
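
To see whether forwarded records are actually a factor, the heap can be inspected with sys.dm_db_index_physical_stats. This is a sketch that assumes T_TICKET is in the current database and is still a heap (index_id 0). The forwarded_record_count column is only populated in SAMPLED or DETAILED mode.

```sql
-- index_id 0 addresses the heap itself.
-- DETAILED mode reads every page, so it can be slow on large tables.
SELECT index_type_desc,
       page_count,
       record_count,
       forwarded_record_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'T_TICKET'), 0, NULL, 'DETAILED');
```

A forwarded_record_count that is large relative to record_count would support this explanation. A value near zero suggests the slowness comes from somewhere else.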

Lastly, make sure you list the columns you need explicitly in your SELECT and avoid SELECT *. I'm guessing the query is currently doing a table scan. What you can do is add the columns from your SELECT list to the index as INCLUDE columns, like this:

CREATE INDEX [IX_T_TICKET_CallId_INCLUDE] ON [T_TICKET] ([CallId]) INCLUDE ([RelatedData]) WITH (DROP_EXISTING=ON)
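
A query that names only the indexed and included columns can then be answered from the index alone. The sketch below uses two CallId values from the sample data and turns on I/O and timing statistics so the logical reads can be compared before and after the index change. It assumes the table has only the two columns shown in the question.

```sql
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Explicit column list instead of SELECT *; both columns are in the index.
SELECT CallId, RelatedData
FROM T_TICKET
WHERE CallId IN ('MXZ_SQzfGMCPzUA', 'MXyQq6wQ7gVhzUA');

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

The Messages tab in SSMS then reports logical reads for T_TICKET. A scan of the whole table shows up as a read count close to the table's total page count.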

Answer 3

It turns out there is in fact a way to significantly optimize my query without changing any data types.

This query:

SELECT * FROM T_TICKET 
WHERE CALLID IN(N'MXZInrBl1DCnzUA', N'MXZ0TWkUhHprzUA', N'MXZ_SQzfGMCPzUA', ... ,N'MXyQq6wQ7gVhzUA')

is using NVARCHAR literals as the input values (N'MXZInrBl1DCnzUA', N'MXZ0TWkUhHprzUA'...). As I specified in my question, CallId is VARCHAR. Because NVARCHAR has higher data type precedence than VARCHAR, SQL Server was converting CallId in every row of the table to NVARCHAR to do the comparison, which took a long time (even though I have an index on CallId).

I was able to optimize it by simply NOT passing the values as NVARCHAR, that is, by dropping the N prefix:

SELECT * FROM T_TICKET 
WHERE CALLID IN('MXZInrBl1DCnzUA', 'MXZ0TWkUhHprzUA', 'MXZ_SQzfGMCPzUA', ... ,'MXyQq6wQ7gVhzUA')

Now, instead of taking over 30 seconds to run, it only takes around 0.03 seconds. Thanks for all the input.
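
For anyone who builds the list of 200 to 300 values in code, one way to keep the types aligned is to load the values into a table variable whose column is declared with the same type as CallId. This is only a sketch of an alternative, not what I actually ran. The name @CallIds is made up for illustration, and it assumes the table has only the two columns shown in the question.

```sql
-- The lookup column is declared VARCHAR(30), matching T_TICKET.CallId,
-- so the comparison needs no implicit conversion on the table side.
DECLARE @CallIds TABLE (CallId VARCHAR(30) NOT NULL);

INSERT INTO @CallIds (CallId)
VALUES ('MXZInrBl1DCnzUA'),
       ('MXZ0TWkUhHprzUA'),
       ('MXZ_SQzfGMCPzUA'),
       ('MXyQq6wQ7gVhzUA');

-- IN with a subquery avoids duplicate output rows if a value is listed twice.
SELECT t.CallId, t.RelatedData
FROM T_TICKET AS t
WHERE t.CallId IN (SELECT c.CallId FROM @CallIds AS c);
```

The same idea applies to parameterized queries: the client-side parameter should be declared as a VARCHAR type rather than left to default to NVARCHAR.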
