Why doesn't COUNT(DISTINCT (*)) work?

Question

I am surprised that such a simple query does not work:

SELECT COUNT(DISTINCT *) FROM dbo.t_test

whereas

SELECT COUNT(DISTINCT col1) FROM dbo.t_test

and

SELECT DISTINCT * FROM dbo.t_test

both work.

What is the alternative?

EDIT:

DISTINCT * checks for uniqueness on the combined key (col1, col2, ...) and returns those rows. I expected COUNT(DISTINCT *) to just return the number of such rows. Am I missing anything here?

Answer 1

It doesn't work because COUNT(DISTINCT ...) accepts only a single expression, as per the documentation:

COUNT ( { [ [ ALL | DISTINCT ] expression ] | * } )

If you look carefully, you can see that the allowed grammar doesn't include COUNT(DISTINCT *): the * form cannot be combined with ALL or DISTINCT.

The alternative is this:

SELECT COUNT(*) FROM
(
    SELECT DISTINCT * FROM dbo.t_test 
) T1
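
To see the difference on concrete data, here is a minimal sketch. The temporary table #t_test and its rows are made up for illustration and stand in for dbo.t_test; the multi-row VALUES syntax assumes SQL Server 2008 or later.

```sql
-- Hypothetical sample data; #t_test stands in for dbo.t_test.
CREATE TABLE #t_test (col1 INT, col2 VARCHAR(10));

INSERT INTO #t_test (col1, col2)
VALUES (1, 'a'), (1, 'a'), (1, 'b'), (2, 'b');

-- All rows, duplicates included: 4
SELECT COUNT(*) AS total_rows FROM #t_test;

-- Distinct values of a single column: 2
SELECT COUNT(DISTINCT col1) AS distinct_col1 FROM #t_test;

-- Distinct rows, via the derived table: 3
SELECT COUNT(*) AS distinct_rows
FROM (SELECT DISTINCT * FROM #t_test) AS T1;

DROP TABLE #t_test;
```

The derived table needs an alias (T1 here); SQL Server rejects a subquery in FROM without one.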

Answer 2

The truth of the matter is that SQL (Server) or any other SQL implementation is not supposed to do everything under the sun.

There are reasons to limit the SQL syntax to certain elements, from the parsing layer to query optimization to predictability of results to just common sense.

The COUNT aggregate function is normally implemented as a streaming aggregate with a gate for a single item, be it * (record count, just use a static token), or colname (increment token only when not null) or distinct colname (a hash/bucket with one key).
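
The three forms can be compared side by side. This is only an illustration of the results, not of the engine internals described above; the table #count_demo and its values are hypothetical.

```sql
-- Hypothetical table with one nullable column.
CREATE TABLE #count_demo (colname INT NULL);

INSERT INTO #count_demo (colname)
VALUES (1), (1), (2), (NULL);

SELECT
    COUNT(*)                AS all_rows,          -- 4: every row counts
    COUNT(colname)          AS non_null_values,   -- 3: the NULL is skipped
    COUNT(DISTINCT colname) AS distinct_values    -- 2: the values 1 and 2
FROM #count_demo;

DROP TABLE #count_demo;
```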

When you ask for COUNT(DISTINCT *) or, for that matter, COUNT(DISTINCT a,b,c): yes, an RDBMS can do it for you if it sees fit to implement it (MySQL, for one, accepts COUNT(DISTINCT a,b,c)). SQL Server does not, and the feature is (1) uncommon enough, (2) adds work to the parser, and (3) adds complexity to the COUNT implementation.

Mark has the correct alternative.
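
For the multi-column case, the derived-table pattern works in SQL Server as well. A minimal sketch, assuming a hypothetical temporary table #abc with columns a, b and c:

```sql
-- Hypothetical table; a, b and c are illustrative column names.
CREATE TABLE #abc (a INT, b INT, c INT);

INSERT INTO #abc (a, b, c)
VALUES (1, 1, 1), (1, 1, 1), (1, 2, 1), (2, 2, 1);

-- Number of distinct (a, b, c) combinations: 3
SELECT COUNT(*) AS distinct_abc
FROM (SELECT DISTINCT a, b, c FROM #abc) AS d;

DROP TABLE #abc;
```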

Answer 3

In addition to what the others have said:

One thing to be aware of is that a count(distinct *) (if it were allowed) on a table that has a primary key would return the same result as select count(*).

This is because distinct * includes the PK column, and therefore every row is distinct from every other row.

And as every non-trivial table should have a primary key (there are only very few exceptions to that rule) count(distinct *) can be "replaced" with count(*) anyway.
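
A small sketch of that point, using a hypothetical temporary table #with_pk whose val column repeats while the key does not:

```sql
-- Hypothetical table: id is the primary key, val is the same in every row.
CREATE TABLE #with_pk (id INT PRIMARY KEY, val VARCHAR(10));

INSERT INTO #with_pk (id, val)
VALUES (1, 'x'), (2, 'x'), (3, 'x');

-- Plain row count: 3
SELECT COUNT(*) AS all_rows FROM #with_pk;

-- Distinct rows: also 3, because id makes every row unique
SELECT COUNT(*) AS distinct_rows
FROM (SELECT DISTINCT * FROM #with_pk) AS d;

-- Distinct values once the key is left out: 1
SELECT COUNT(*) AS distinct_vals
FROM (SELECT DISTINCT val FROM #with_pk) AS d;

DROP TABLE #with_pk;
```

The derived table only changes the answer when the key columns are excluded or the table has no key at all.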

Answer 4

As a simple example, let's say you have two columns, A and B.

There are three distinct A values, but only one distinct B value. It would be impossible for COUNT(DISTINCT *) to return a single, meaningful value. That is why that syntax cannot work.

Answer 5

I had the same problem and ended up with this solution.

Suppose you have something like this:

PID	Name
1	milk
1	cheese
1	tea
2	butter
2	cream
3	honey

If your table is named "food", you can write:

    -- Count the rows for each PID (schema name corrected from db.food to dbo.food).
    select count(dbo.food.PID) as [count], a.PID
    from dbo.food
    inner join (select distinct PID from dbo.food) a
        on dbo.food.PID = a.PID
    group by a.PID

This will show something like this:

count	PID
3	1
2	2
1	3
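
The same per-PID counts can probably be had without the self-join. A minimal sketch, using a hypothetical temporary table #food loaded with the sample rows above in place of dbo.food:

```sql
-- Hypothetical stand-in for dbo.food, filled with the sample rows.
CREATE TABLE #food (PID INT, Name VARCHAR(20));

INSERT INTO #food (PID, Name)
VALUES (1, 'milk'), (1, 'cheese'), (1, 'tea'),
       (2, 'butter'), (2, 'cream'),
       (3, 'honey');

-- One row per PID with its row count: (3, 1), (2, 2), (1, 3)
SELECT COUNT(*) AS [count], PID
FROM #food
GROUP BY PID
ORDER BY PID;

DROP TABLE #food;
```

Note that this counts rows per PID, which is a different question from counting distinct rows over the whole table.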

5 answers

solution1
21 2011-02-15 22:52:22

solution2
8 ACCEPTED 2011-02-15 23:13:36

solution3
5 2011-02-15 23:04:03

solution4
4 2011-02-15 22:54:40

solution5
-1 2022-05-03 03:03:36
