I have 2 tables resulted from merging the following tables:
Authors
-Aid bigint
-Surname nvarchar(500)
-Email nvarchar(500)
Articles
-ArId varchar(50)
-Year int
-……Some other fields……
ArticleAuthors
-ArId varchar(50)
-Aid bigint
Classifications
-ClassNumber int
-ClassDescription nvarchar(100)
ClassArticles
-ArId varchar(50)
-ClassNumber int
After denormalizing these tables the resulted tables were:
Articles
-FieldId int
-ArId varchar(50)
-ClassNumber int (Foreign key from the Classifications table)
-Year int
Authors
-FieldId int
-ArId varchar(50) (Foreign key from the Articles table)
-Aid bigint
-Surname nvarchar(500)
-Email nvarchar(500)
-Year int
Here are the conditions of the data within the resulted tables:
The question is:
We need to create a stored procedure that gets the following result from the two tables [Aid,Surname,Email] under the following conditions:
Authors that have written articles during and after a specific year (AND)
We need the query to take the least possible time
If anyone can help, Thank you very much.
Without having the data this is very difficult to work on, but I created the tables and duplicated the procedure to get a rough idea on the query plan and potential problems.
First noticable thing, the part of the query written as :
SELECT DISTINCT Aid
FROM Authors EAE
WHERE EAE.[Year] >= @year AND EAE.Email IS NOT NULL AND EAE.Email != ' '
Is going to table scan, you have Year as your partitioning key, but within each partition there is no index supporting the email clauses in the query. As a side note the EAE.Email != ' ' might not give you quite what you expect. If ' ' != '' print 'true' else print 'false' That will give false for most systems. (Based on ansi padding)
FROM Articles ED
INNER JOIN Authors EAD ON EAD.ArId = ED.ArId
WHERE EAD.Aid = [YearAuthors].Aid AND ED.ClassNumber = @classNumber
ED.ClassNumber will have no supporting index, causing a clustered index scan.
In the final select statement : INNER JOIN Authors EA ON EA.Aid = #TT.Aid
This has no supporting index on the #TT side and doesn't appear to be one on the Authors Table side.
WHERE EA.Email IS NOT NULL AND EA.Email != ' '
this has no supporting index, causing a scan.
There are a lot more issues in there, with a considerable number of sort's appearing that probably will disappear with suitable indexes - You will have to sort out some of the basic indexing on the tables and then get a new query plan / set of problems and iteratively fix the plan - you will not fix it in a single 'silver bullet' shot.
Do you want help writing the query, or help in fixing the performance? The query itself should be relatively simple. That's not where you're going to get the most bang for your buck.
SQL Server comes with tools for analyzing queries and boosting performance by tuning your indexes. That's where you're going to see the biggest help in getting it to run quickly.
First step would be appropriate indexes . With the where criteria being the primary contenders and then items not used in where but selected can simply be included in the index. As mentioned there are standard tools and queries to find these.
To focus on the query in hand run the Query with "Query | Include Execution Plan " (Ctrl + M) turned on. This should show up any obvious bottlenecks.
Given the beforementioned conditions This is the query I have created but it takes 3 minutes (So long time for a web page response):
CREATE PROC [dbo].[GetAuthorForMailing]
(
@classNumber INT,
@noPapers int,
@year int,
@percent int
)
AS
BEGIN
CREATE TABLE #TT
(
Aid bigint,
allPapers int,
classPapers int,
perc as CEILING(CAST(classPapers AS DECIMAL) / CAST(allPapers AS DECIMAL) * 100)
)
INSERT INTO #TT(Aid,allPapers,classPapers)
SELECT [YearAuthors].Aid,
(
SELECT COUNT(EA.Aid)
FROM Authors EA
WHERE EA.Aid =[YearAuthors].Aid) AS [AllPapers],
(
SELECT COUNT(*)
FROM Articles ED INNER JOIN Authors EAD ON EAD.ArId = ED.ArId
WHERE EAD.Aid = [YearAuthors].Aid AND ED.ClassNumber = @classNumber) AS [ClassPapers]
FROM
(
SELECT DISTINCT Aid
FROM Authors EAE
WHERE EAE.[Year] >= @year AND EAE.Email IS NOT NULL AND EAE.Email != ' '
)AS [YearAuthors]
SELECT DISTINCT EA.Aid,EA.Surname,EA.Email,[Year]
FROM #TT INNER JOIN Authors EA ON EA.Aid = #TT.Aid
AND allPapers > @noPapers
AND perc > @percent
AND EA.[Year] = (SELECT MAX([Year]) FROM Authors WHERE Aid = EA.Aid)
WHERE EA.Email IS NOT NULL AND EA.Email != ' '
DROP TABLE #TT
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.