简体   繁体   中英

SQL count(distinct) from both the table

I have 2 tables. Let's say Table A and Table B. Table A has a column called "name". Table B also has a column "name". I want to find out the count(distinct name). Name should take values from both the columns.

For ex-

Table A

name             
A                 
B                 
C              

Table B

name   
A   
B   
D

Output should be 4.

The best concept is, first combine the data in the way you want using a subquery, and then dedupe or do the 2nd step.

For example,

WITH COMBINED AS (
  SELECT 
    name 
  FROM 
    TableA 

  UNION ALL 

  SELECT 
    name 
  FROM 
    TableB
) 
SELECT 
  DISTINCT name 
FROM 
  COMBINED

In your situation, the 2nd step can be accomplished by changing UNION ALL to a UNION . This will dedupe the values automatically. You won't even need a subquery or a 2nd step. But I wanted to teach you the concept because it comes up often.

SELECT name FROM TableA

UNION

SELECT name FROM TableB

Then UNION in the CTE will reove all Duplicates

so a COUNT(*) will suffoce

WITH CTE AS (
    SELECT name FROM TableA
       UNION
     SELECT name FROM TableB
)
SELECT COUNT(*) FROM CTE

I hope this query should do it:

SELECT SUM(names) AS total_names FROM ( SELECT COUNT(DISTINCT(name)) as names FROM TableA UNION SELECT COUNT(DISTINCT(name)) as names FROM TableB ) t;

Note: Tested with sql server

Yet another option:

select hll_count.merge(hll_sketch) names 
from (
  select hll_count.init(name) hll_sketch from tableA
  union all
  select hll_count.init(name) from tableB
)            

HLL++ functions are approximate aggregate functions. Approximate aggregation typically requires less memory than exact aggregation functions, like COUNT(DISTINCT), but also introduces statistical error. This makes HLL++ functions appropriate for large data streams for which linear memory usage is impractical, as well as for data that is already approximate.

See more about benefits of using HyperLogLog++ functions

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM