Simple SQL query is taking 20 minutes to run?

Question

I have a query that outputs a list of percentages based on a total number. The only part I can't figure out is an efficient way to restrict the rows to those whose 'usid' matches a value in another table.
The query is not failing, but it is taking a very long time to complete.

    SELECT badge, count(usid)*100 / (SELECT COUNT(DISTINCT usid) from Table1)
    FROM Table1
    WHERE usid IN(
        SELECT usid
        FROM Table2
        WHERE msid = 1
        )
    GROUP BY badge

The output looks something like this

    -----------------------------
    badge        count
    -----------------------------
    1            65.1
    2            45.4
    3            22.7
    4            12.12

I want to count only the usid values that appear in Table2 with msid = 1. Even if this method works, it takes far too long. Any ideas for a workaround?

Answer 1

You should be able to use explicit JOIN notation instead of the IN clause:

    SELECT a.badge, COUNT(a.usid)*100 / (SELECT COUNT(DISTINCT usid) FROM Table1)
    FROM Table1 AS a
    JOIN (SELECT DISTINCT usid FROM Table2 WHERE msid = 1) AS b ON a.usid = b.usid
    GROUP BY a.badge

However, I'm not confident that will fix the performance problem. A half-way decent optimizer will realize that the sub-select in the select-list is constant, but you should verify that the optimizer is half-way decent (or better) by looking at the query plan.

I'm not convinced that the COUNT(a.usid) does anything different from COUNT(*) in this context. It would produce a different answer only if a.usid could contain nulls. See also COUNT(*) vs COUNT(1) vs COUNT(pk) — which is better?
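
To illustrate the NULL point, here is a minimal sketch using Python's built-in sqlite3 module with a made-up Table1 (the sample rows are assumptions, not data from the question). It only shows the general difference between the two counts. In the joined query above, rows with a NULL usid never match the join condition, so both forms should agree there.

```python
import sqlite3

# Hypothetical in-memory table; column names follow the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (badge INTEGER, usid INTEGER)")
conn.executemany(
    "INSERT INTO Table1 (badge, usid) VALUES (?, ?)",
    [(1, 10), (1, 11), (1, None), (2, 10)],  # one row has a NULL usid
)

# COUNT(*) counts every row; COUNT(usid) skips rows where usid is NULL.
rows = conn.execute(
    "SELECT badge, COUNT(*), COUNT(usid) "
    "FROM Table1 GROUP BY badge ORDER BY badge"
).fetchall()
print(rows)  # [(1, 3, 2), (2, 1, 1)]
```

If usid is declared NOT NULL in the real schema, the two counts cannot differ at all.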

Answer 2

This is not such a simple query. Depending on the database you are using, the IN might be quite inefficient, and the COUNT(DISTINCT) subquery might be recalculated for each output row. Try rewriting the query as:

    SELECT t1.badge, COUNT(t1.usid)*100 / x.cnt
    FROM Table1 t1 CROSS JOIN
         (SELECT COUNT(DISTINCT usid) AS cnt FROM Table1) x
    WHERE EXISTS (SELECT 1
                  FROM Table2 t2
                  WHERE t2.usid = t1.usid AND t2.msid = 1
                 )
    GROUP BY t1.badge, x.cnt;

This query will probably be faster, regardless of the database you are using.

By the way, it is suspicious that you are calculating count(usid) and then dividing by count(distinct usid). I would expect either both or neither to be count(distinct).
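
A small sketch of why the mismatch matters, assuming (hypothetically) that a user can appear more than once with the same badge. It uses Python's sqlite3 module and invented rows. With a plain count in the numerator and a distinct count in the denominator, a "percentage" can exceed 100.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (badge INTEGER, usid INTEGER)")
# Assumed sample data: user 10 holds badge 1 twice.
conn.executemany(
    "INSERT INTO Table1 (badge, usid) VALUES (?, ?)",
    [(1, 10), (1, 10), (1, 11), (2, 10)],
)

# Second column: row count over distinct users (the question's formula).
# Third column: distinct users over distinct users.
rows = conn.execute(
    """
    SELECT badge,
           COUNT(usid) * 100 / x.cnt,
           COUNT(DISTINCT usid) * 100 / x.cnt
    FROM Table1
    CROSS JOIN (SELECT COUNT(DISTINCT usid) AS cnt FROM Table1) AS x
    GROUP BY badge, x.cnt
    ORDER BY badge
    """
).fetchall()
print(rows)  # [(1, 150, 100), (2, 50, 50)]
```

If each (badge, usid) pair is unique in the real table, the two columns are identical and the concern does not apply.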

Answer 3

General rules of thumb on speeding up sql:

Only return the minimum fields needed.
Use paging: supply an offset and a limit, and fetch one page of data at a time.
Or cap the returned data at some reasonable cutoff (for example, show only the first 500 results of a search, after which the user needs to refine the search parameters). Otherwise, someone can run an open-ended query and put extreme load on the system.
Avoid IN statements.
Avoid nested queries.
Add indexes on joined/searched fields (in the order they are listed in the query).
Use numbers rather than strings if possible.
Avoid joins if not needed (you may also denormalize the database).
If possible, precompute information (like sums) and store it in another table or field. These values can be updated on insert/update events of related data.
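
Of these rules, the index suggestion is the easiest to check against the query in the question. A rough sketch, assuming SQLite through Python's sqlite3 module and a hypothetical index name idx_table2_msid_usid. Other databases have their own syntax for showing plans, so treat this only as an illustration of comparing the plan before and after adding an index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table2 (usid INTEGER, msid INTEGER)")


def plan(sql):
    """Return the detail text of SQLite's query plan for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable description is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


# The filter used inside the question's IN (...) subquery.
subquery = "SELECT usid FROM Table2 WHERE msid = 1"

plan_before = plan(subquery)
# Hypothetical index: the filtered column first, then the joined column.
conn.execute("CREATE INDEX idx_table2_msid_usid ON Table2 (msid, usid)")
plan_after = plan(subquery)

print("before:", plan_before)
print("after: ", plan_after)
```

The exact plan wording varies between SQLite versions, but the second plan should mention the index while the first falls back to a full scan of Table2.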

Answer 4

Can you try this:

    DECLARE @userIDcnt AS int
    SELECT @userIDcnt = COUNT(DISTINCT usid) FROM Table1

    SELECT badge, COUNT(t1.usid)*100 / @userIDcnt
    FROM Table1 t1
    INNER JOIN Table2 t2 ON t1.usid = t2.usid AND t2.msid = 1
    GROUP BY badge
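
One thing worth checking with a plain inner join: if Table2 can hold more than one row per usid with msid = 1 (an assumption; the question does not say), each match is counted again. The sketch below uses Python's sqlite3 module with invented rows. SQLite has no DECLARE, so a bound parameter stands in for @userIDcnt. It compares the plain join with a join against a de-duplicated subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Table1 (badge INTEGER, usid INTEGER);
    CREATE TABLE Table2 (usid INTEGER, msid INTEGER);
    INSERT INTO Table1 VALUES (1, 10), (1, 11), (2, 10), (2, 12);
    -- Assumed data: user 10 appears twice with msid = 1.
    INSERT INTO Table2 VALUES (10, 1), (10, 1), (11, 1), (12, 2);
    """
)

# Compute the denominator once, like the @userIDcnt variable.
total = conn.execute("SELECT COUNT(DISTINCT usid) FROM Table1").fetchone()[0]

# Plain inner join: duplicate Table2 rows inflate the counts.
rows_plain = conn.execute(
    """
    SELECT t1.badge, COUNT(t1.usid) * 100 / ?
    FROM Table1 t1
    INNER JOIN Table2 t2 ON t1.usid = t2.usid AND t2.msid = 1
    GROUP BY t1.badge
    ORDER BY t1.badge
    """,
    (total,),
).fetchall()

# Join against distinct usid values: each Table1 row matches at most once.
rows_dedup = conn.execute(
    """
    SELECT t1.badge, COUNT(t1.usid) * 100 / ?
    FROM Table1 t1
    INNER JOIN (SELECT DISTINCT usid FROM Table2 WHERE msid = 1) AS b
        ON t1.usid = b.usid
    GROUP BY t1.badge
    ORDER BY t1.badge
    """,
    (total,),
).fetchall()

print(rows_plain)  # [(1, 100), (2, 66)]
print(rows_dedup)  # [(1, 66), (2, 33)]
```

If (usid, msid) is unique in Table2, both queries return the same rows and the plain join is fine.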

4 answers

solution1
0 ACCEPTED 2014-08-21 14:55:04

solution2
0 2014-08-21 14:55:18

solution3
0 2014-08-21 14:57:46

solution4
0 2014-08-21 15:01:21
