Need tips for optimizing SQL Query using a JOIN

Question

The Query I'm writing runs fine when looking at the past few days, once I go over a week it crawls (~20min). I am joining 3 tables together. I was wondering what things I should look for to make this run faster. I don't really know what other information is needed for the post.

EDIT: More info: db is Sybase 10. Query:

SELECT a.id, a.date, a.time, a.signal, a.noise,
b.signal_strength, b.base_id, b.firmware,
a.site, b.active, a.table_key_id
FROM adminuser.station AS a
JOIN adminuser.base AS b
ON a.id = b.base_id
WHERE a.site = 1234 AND a.date >= '2009-03-20'

I also took out the 3rd JOIN and it still runs extremely slow. Should I try another JOIN method?

Answer 1

You can get a lot of information (assuming you're using MSSQL here) by running your query in SQL Server Management Studio with the Include Actual Execution Plan option set (in the Query menu).

This will show you a diagram of the steps that SQLServer performs in order to execute the query - with relative costs against each step.

The next step is to rework the query a little (try doing it a different way) then run the new version and the old version at the same time. You will get two execution plans, with relative costs not only against each step, but against the two versions of the query! So you can tell objectively if you are making progress.

I do this all the time when debugging/optimizing queries.

Answer 2

I don't know Sybase 10 that well, but try running that query for say 10-day period and then 10 times, for each day in a period respectively and compare times. If the time in the first case is much higher, you've probably hit the database cache limits.

The solution is than to simply run queries for shorter periods in a loop (in program, not SQL). It works especially well if table A is partitioned by date.

Answer 3

确保您在外键上有索引。

Answer 4

It sounds more like you have a memory leak or aren't closing database connections in your client code than that there's anything wrong with the query.

[edit]
Nevermind: you mean quering over a date range rather than the duration the server has been active. I'll leave this up to help others avoid the same confusion.

Also, it would help if you could post the sql query, even if you need to obfuscate it some first, and it's a good bet to check if there's an index on your date column and the number of records returned by the longer range.

Answer 5

You may want to look into using a PARTITION for the date ranges, if your DB supports it. I've heard this can help significantly.

Answer 6

拿起《专业SQL Server 2005性能调优》这本书非常棒。

Answer 7

You didn't mention your database. If it's not SQL Server, the specifics of how to get the data might be different, but the advice is fundamentally the same.

Look at indexing, for sure, but the first thing to do is to follow Blorgbeard's advice and scan for execution plans using Management Studio (again, if you are running SQL Server).

What I'm guessing you'll see is that for small date ranges, the optimizer picks a reasonable query plan, but that when the date range is large, it picks something completely different, likely involving either table scans or index scans, and possibly joins that lead to very large temporary recordsets. The execution plan analyzer will reveal all of this.

A scan means that the optimizer thinks that grinding over the whole table or the whole index is cheaper for what you are trying to do than seeking specific values.

What you eventually want to do is get indexes and the syntax of your query set up such that you keep index seeks in the query plan for your query regardless of the date range, or, failing that, that the scans you require are filtered as well as you can manage to minimize temporary recordset size and thereby avoid excessive reads and I/O.

Answer 8

SELECT

 a.id, a.date, a.time, a.signal, a.noise,a.site, b.active, a.table_key_id,
 b.signal_strength, b.base_id, b.firmware

FROM 

( SELECT * FROM adminuser.station 
      WHERE site = 1234 AND date >= '2009-03-20') AS a
JOIN 

    adminuser.base AS b
ON

    a.id = b.base_id

Kind of rewrote the query, so as to first filter the desired rows then perform a join rather than perform a join then filter the result.

Rather than pulling * from the sub-query you can just select the columns you want, which might be little helpful.

May be this will of little help, in speeding things.

While this is valid in MySql, I am not sure of the sysbase syntax though.

Need tips for optimizing SQL Query using a JOIN

Question

8 answers

solution1
2 2009-03-31 03:16:14

solution2
2 ACCPTED 2009-03-31 13:20:05

solution3
1 2009-03-31 03:45:37

solution4
0 2009-03-31 03:05:27

solution5
0 2009-03-31 03:11:44

solution6
0 2009-03-31 03:59:07

solution7
0 2009-03-31 12:40:47

solution8
0 2009-03-31 21:52:07

Need tips for optimizing SQL Query using a JOIN

Question

8 answers

solution1 2 2009-03-31 03:16:14

solution2 2 ACCPTED 2009-03-31 13:20:05

solution3 1 2009-03-31 03:45:37

solution4 0 2009-03-31 03:05:27

solution5 0 2009-03-31 03:11:44

solution6 0 2009-03-31 03:59:07

solution7 0 2009-03-31 12:40:47

solution8 0 2009-03-31 21:52:07

solution1
2 2009-03-31 03:16:14

solution2
2 ACCPTED 2009-03-31 13:20:05

solution3
1 2009-03-31 03:45:37

solution4
0 2009-03-31 03:05:27

solution5
0 2009-03-31 03:11:44

solution6
0 2009-03-31 03:59:07

solution7
0 2009-03-31 12:40:47

solution8
0 2009-03-31 21:52:07