
Why doesn't COUNT(DISTINCT (*)) work?

I am surprised that such a simple query does not work:

SELECT COUNT(DISTINCT *) FROM dbo.t_test     

whereas

SELECT COUNT(DISTINCT col1) FROM dbo.t_test

and

SELECT DISTINCT * FROM dbo.t_test 

both work.

What is the alternative?

EDIT:

DISTINCT * checks for uniqueness on the combined key of (col1, col2, ...) and returns those rows. I expected COUNT(DISTINCT *) to just return the number of such rows. Am I missing anything here?

It doesn't work because you are only allowed to specify a single expression in COUNT(DISTINCT ...), as per the documentation:

COUNT ( { [ [ ALL | DISTINCT ] expression ] | * } )

If you look carefully you can see that the allowed grammar doesn't include COUNT(DISTINCT *).

The alternative is this:

SELECT COUNT(*) FROM
(
    SELECT DISTINCT * FROM dbo.t_test 
) T1

The truth of the matter is that SQL (Server) or any other SQL implementation is not supposed to do everything under the sun.

There are reasons to limit the SQL syntax to certain elements, from the parsing layer to query optimization to predictability of results to just common sense.

The COUNT aggregate function is normally implemented as a streaming aggregate with a gate for a single item, be it * (record count, just use a static token), or colname (increment token only when not null) or distinct colname (a hash/bucket with one key).
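The three forms described above can be compared side by side. This is only a minimal sketch: the temporary table #count_demo and its contents are made up for illustration and are not part of the original question.

```sql
-- Hypothetical temporary table, used only for illustration.
CREATE TABLE #count_demo (col1 INT NULL);
INSERT INTO #count_demo (col1) VALUES (1), (1), (2), (NULL);

SELECT
    COUNT(*)             AS all_rows,         -- 4: every row, NULL included
    COUNT(col1)          AS non_null_values,  -- 3: the NULL is skipped
    COUNT(DISTINCT col1) AS distinct_values   -- 2: the values 1 and 2
FROM #count_demo;

DROP TABLE #count_demo;
```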

When you ask for COUNT(DISTINCT *) or, for that matter, COUNT(DISTINCT a,b,c): yes, it can surely be done for you if some RDBMS sees fit to implement it one day; but it (1) is uncommon enough, (2) adds work to the parser, and (3) adds complexity to the COUNT implementation.
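For the COUNT(DISTINCT a,b,c) case, the same derived-table approach should work. A minimal sketch, assuming dbo.t_test has columns named a, b and c (hypothetical names taken from the sentence above):

```sql
-- Assumes hypothetical columns a, b, c on dbo.t_test.
-- Counts the distinct (a, b, c) combinations.
SELECT COUNT(*) AS distinct_combinations
FROM (
    SELECT DISTINCT a, b, c
    FROM dbo.t_test
) AS T1;
```

Note that SELECT DISTINCT treats NULLs as equal to each other, so combinations containing NULL are counted once rather than dropped.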

Mark has the correct alternative.

In addition to what the others have said:

One thing to be aware of is that doing a count(distinct *) (if it were allowed) on a table that has a primary key would be identical to a select count(*).

This is because distinct * includes the PK column, and therefore every row is distinct from every other row.

And as every non-trivial table should have a primary key (there are only very few exceptions to that rule) count(distinct *) can be "replaced" with count(*) anyway.
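A quick way to see this, sketched with a made-up temporary table (#pk_demo and its columns are assumptions for illustration only):

```sql
-- Hypothetical temporary table with a primary key.
CREATE TABLE #pk_demo (id INT PRIMARY KEY, val INT NOT NULL);
INSERT INTO #pk_demo (id, val) VALUES (1, 100), (2, 100), (3, 100);

-- Plain row count: 3
SELECT COUNT(*) AS all_rows FROM #pk_demo;

-- Distinct rows: also 3, because id makes every row unique
SELECT COUNT(*) AS distinct_rows
FROM (SELECT DISTINCT * FROM #pk_demo) AS d;

-- Without the key column the result differs: 1
SELECT COUNT(*) AS distinct_vals
FROM (SELECT DISTINCT val FROM #pk_demo) AS d;

DROP TABLE #pk_demo;
```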

As a simple example, let's say you have two columns, A and B.

A    B
1    100
2    100
3    100

There are three distinct A values, but only one distinct B value. It would be impossible for COUNT(DISTINCT *) to return a single, meaningful value. That is why that syntax cannot work.

I had the same problem and ended up with this solution.

Suppose you have something like this:

PID Name
1 milk
1 cheese
1 tea
2 butter
2 cream
3 honey

and your table is named "food". You would write:

    -- Count the rows for each PID
    select count(dbo.food.PID) as [count], a.PID
    from dbo.food
    inner join (select distinct PID from dbo.food) a
        on dbo.food.PID = a.PID
    group by a.PID

This returns something like:

count PID
3 1
2 2
1 3
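If the goal is just the number of rows per PID, a plain GROUP BY should give the same counts without the self-join. A minimal sketch against the same "food" table:

```sql
-- One output row per PID, with the number of rows sharing that PID.
SELECT COUNT(*) AS [count], PID
FROM dbo.food
GROUP BY PID;
```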
