Increase MS SQL transaction performance

Question

If you're missing information, I will attach them if requested.

Workspace

I have a database running on a MS SQL 2012 standard edition of this kind:

tables:
- users(id, softId (not unique), birthdate)
  - rows: 10.5 million
  - indexes: all three columns, birthdate(clustered)
- docs(docId, userId, creationDate, deleteDate, lastname, forename, classificationId)
  - rows: 23 million
  - indexes: lastname, forename, docId, creationDate, userID(clustered)
  - notice: in this specific case the names are related to the docs, not to the userId
- classifications(id, description)
  - rows: 200
- three tables "data"
  - rows: 10, 13 and 0.3 million
  - indexes: docIds
relations:
- users to docs: 1 to n
- classifications to docs: 1 to n
- docs to data-tables: 1 to n

To select the complete records I am actually on following statements:

Server-Execution-Time 16 seconds

SELECT * FROM (
    select * from docs 
    where userID in (
        select distinct userID from users where softId like '...'
    )
) as doc
LEFT JOIN users on users.userID = doc.userId
LEFT JOIN classifications on classifications.id = doc.classificationId
LEFT JOIN data1 on data1.docId = doc.docId
LEFT JOIN data2 on data2.docId = doc.docId
LEFT JOIN data3 on data3.docId = doc.docId;

Updated - now 15 seconds

SELECT
docID, calssificationId, classificationDescription,
userId, softId, forename, lastname, birthdate,
data1.id, data1.date, data2.id, data2.date, data3.id, data3.date,
FROM docs
JOIN users on users.userID = doc.userId AND softId like '...'
LEFT JOIN classifications on classifications.id = doc.classificationId
LEFT JOIN data1 on data1.docId = doc.docId
LEFT JOIN data2 on data2.docId = doc.docId
LEFT JOIN data3 on data3.docId = doc.docId;

execution plans

execution plan

Server-Execution-Time 17 seconds

DECLARE @userIDs table( id bigint );
DECLARE @docIDs table( id bigint );

insert into @userIDs select userID from users where softId like '...';
insert into @docIDs select docId from docs where userId in ( select id from @userIDs);
SELECT * FROM users where userID in ( select id from @userIDs);
SELECT * FROM docs where docID in (select id from @docIDs);
SELECT * FROM data1 where data1.docId in (select id from @docIDs);
SELECT * FROM data2 where data2.docId in (select id from @docIDs);
SELECT * FROM data3 where data3.docId in (select id from @docIDs);
GO

Updated - now 14 seconds

DECLARE @userIDs table( id bigint, softId varchar(12), birthdate varchar(8) );
DECLARE @docIDs table( id bigint, classification bigint, capture_date datetime, userId bigint, lastname varchar(50), forename varchar(50) );


INSERT INTO @userIDs select userID, softId, birthdate from users where softId like '...';
INSERT INTO @docIDs select docID, classification, capture_date, userID, lastname, forename from docs where userID in ( select id from @userIDs);

SELECT * FROM @userIDs;
SELECT * FROM @docIDs;

SELECT [only needed fields] FROM data1 where docID in (select id from @docIDs);
SELECT [only needed fields] FROM data2 where docID in (select id from @docIDs);
SELECT [only needed fields] FROM data3 where docID in (select id from @docIDs);

execution plans

General Updates @AntonínLejsek suggested to define the docId of documents as a clustered index and the pkId as non-clustered. This changed the execution-time as follow:

Join-Statement: -1 second
Multi-Select-Statement: -5 seconds

I checked the indexes again and changed the included columns, now they have the execution-time:

Join-Statement: 4 seconds
Multi-Select-Statement: 6 seconds

The "simple" question

Do somebody have suggestions to reduce the executiontime?

Answer 1

I would phrase the logic as:

I would get rid of the first subquery and just do the necessary work on the users table:

SELECT *
FROM docs JOIN
     users
     ON users.userID = doc.userId AND softId LIKE '...'  LEFT JOIN
     . . .

The logic in the IN is unnecessary if you are doing a JOIN anyway.

Note: This might not help much, because your query appears to be returning lots of data, both in columns and rows.

Answer 2

I see two different databases in the plan, I would try to test it in one database first.

The database design is weird. You have clustered index on birthdate. As it is not unique, database has to make up another 4B number for making it unique. So You have 12B key in every nonclustered index, which is space and performance inefficient. You do not even have id included in the nonclustered index, so it has to be looked up, which is time wasting. In most cases You should cluster on primary key and that should be id.

--Deleted-- while softIds is almost unique, this paragraph became irelevant.

Answer 3

Add clustered indexes on your table variables by defining primary keys.

DECLARE @userIDs table( id bigint primary key, softId varchar(12), birthdate varchar(8) );
DECLARE @docIDs table( id bigint primary key, classification bigint, capture_date datetime, userId bigint, lastname varchar(50), forename varchar(50) );

Increase MS SQL transaction performance

Question

3 answers

solution1
2 2016-10-06 12:23:03

solution2
1 2016-10-06 15:15:05

solution3
0 2016-10-06 15:01:43

Increase MS SQL transaction performance

Question

3 answers

solution1 2 2016-10-06 12:23:03

solution2 1 2016-10-06 15:15:05

solution3 0 2016-10-06 15:01:43

solution1
2 2016-10-06 12:23:03

solution2
1 2016-10-06 15:15:05

solution3
0 2016-10-06 15:01:43