SQL to count records that have no match in one-to-many relationship

I have two MySQL tables:

surveys (date, location, kilometers) primary key: date + location (one record per survey)

specimens (date, location, species) (zero or more records per survey date and location)

I want to find the count of surveys and the sum of kilometers surveyed where the specimens table contains no records for a particular species. In other words the number of surveys where a particular species was NOT found.

The total number of surveys is:

select count(date) as surveys, sum(kilometers) as KM_surveyed 
from surveys;

 +---------+-------------+
 | surveys | KM_surveyed |
 +---------+-------------+
 |   20141 |    40673.59 |
 +---------+-------------+

Finding the number of surveys where no specimens were found is easy:

select count(s.date) as surveys, sum(s.kilometers) as KM_surveyed
from surveys as s left join specimens as p
on (s.date=p.date and s.location=p.location)
where p.date is null;

 +---------+-------------+
 | surveys | KM_surveyed |
 +---------+-------------+
 |    8820 |    15848.26 |
 +---------+-------------+

The total number of records in specimens is:

select count(*) from specimens;

+-----------+
|  count(*) |
+-----------+
|     51566 |
+-----------+ 

The correct number of Harbor Seals (HASE) found on all surveys is:

select count(*) from specimens where species = 'HASE';

 +-----------+
 | count(*)  |
 +-----------+
 |       662 |
 +-----------+

Finding the number of surveys where Harbor Seals (HASE) were found is not easy.
Since the specimens table typically contains multiple records per survey, this query returns not the number of surveys but the number of HASE specimens found:

select count(s.date) as surveys, sum(s.kilometers) as KM_surveyed
from surveys as s
left join specimens as p on (s.date=p.date and s.location=p.location)
where p.species = 'HASE';

 +---------+-------------+
 | surveys | KM_surveyed |
 +---------+-------------+
 |     662 |     2030.70 |  WRONG! that is number of specimens not surveys 
 +---------+-------------+
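
To make the row multiplication concrete, here is a minimal sketch on made-up data. The tables `demo_surveys` and `demo_specimens` and every row in them are hypothetical stand-ins for the real tables, invented only for illustration.

```sql
-- Hypothetical toy tables; names and rows are assumptions, not the real data.
DROP TABLE IF EXISTS demo_specimens;
DROP TABLE IF EXISTS demo_surveys;

CREATE TABLE demo_surveys (
    date       DATE,
    location   VARCHAR(10),
    kilometers DECIMAL(8,2),
    PRIMARY KEY (date, location)
);
CREATE TABLE demo_specimens (
    date     DATE,
    location VARCHAR(10),
    species  VARCHAR(10)
);

INSERT INTO demo_surveys VALUES
    ('2020-01-01', 'A', 2.00),   -- two HASE and one other specimen
    ('2020-01-02', 'A', 3.00),   -- one non-HASE specimen
    ('2020-01-03', 'B', 5.00);   -- no specimens at all
INSERT INTO demo_specimens VALUES
    ('2020-01-01', 'A', 'HASE'),
    ('2020-01-01', 'A', 'HASE'),
    ('2020-01-01', 'A', 'CASL'),
    ('2020-01-02', 'A', 'CASL');

-- Same shape as the query above: the join emits one row per specimen,
-- so a survey with two HASE records contributes two rows to the aggregate.
SELECT COUNT(s.date) AS surveys, SUM(s.kilometers) AS KM_surveyed
FROM demo_surveys AS s
LEFT JOIN demo_specimens AS p
       ON s.date = p.date AND s.location = p.location
WHERE p.species = 'HASE';
```

On this toy data only one survey (2.00 km) has Harbor Seals, yet the query reports 2 surveys and 4.00 km, because that survey's row is repeated once per HASE specimen.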

Finding the number of surveys where no Harbor Seals (HASE) were found is not easy either. This query returns not the number of surveys but the number of specimens found that were not Harbor Seals:

select count(s.date) as surveys, sum(s.kilometers) as KM_surveyed
from surveys as s
left join specimens as p on (s.date=p.date and s.location=p.location)
where p.species <> 'HASE' or p.date is null;

 +---------+-------------+
 | surveys | KM_surveyed |
 +---------+-------------+
 |   50904 |   151310.49 | 
 +---------+-------------+

WRONG! 50904 = number of non HASE specimens

How do I construct queries to correctly count the number of surveys where Harbor Seals were found and the survey count when they were not found?

When you're doing a LEFT JOIN to find non-matching rows, you should put the criteria that shouldn't be matched into the ON clause, not the WHERE clause.

SELECT COUNT(*), SUM(s.kilometers)
FROM surveys AS s
LEFT JOIN specimens AS p ON s.date = p.date and s.location = p.location
    AND p.species = 'HASE'
WHERE p.date IS NULL
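
Why the ON clause matters is easier to see on a small example. The sketch below uses hypothetical tables `demo_surveys` and `demo_specimens` with invented rows (assumptions for illustration only) and runs the join in two steps.

```sql
-- Hypothetical toy tables; names and rows are assumptions, not the real data.
DROP TABLE IF EXISTS demo_specimens;
DROP TABLE IF EXISTS demo_surveys;

CREATE TABLE demo_surveys (
    date DATE, location VARCHAR(10), kilometers DECIMAL(8,2),
    PRIMARY KEY (date, location)
);
CREATE TABLE demo_specimens (
    date DATE, location VARCHAR(10), species VARCHAR(10)
);

INSERT INTO demo_surveys VALUES
    ('2020-01-01', 'A', 2.00),
    ('2020-01-02', 'A', 3.00),
    ('2020-01-03', 'B', 5.00);
INSERT INTO demo_specimens VALUES
    ('2020-01-01', 'A', 'HASE'),
    ('2020-01-01', 'A', 'HASE'),
    ('2020-01-01', 'A', 'CASL'),
    ('2020-01-02', 'A', 'CASL');

-- Step 1: with the species test in ON, only HASE specimens can match.
-- Surveys with no HASE match are kept with NULLs in the p columns.
SELECT s.date, s.location, s.kilometers, p.species
FROM demo_surveys AS s
LEFT JOIN demo_specimens AS p
       ON s.date = p.date AND s.location = p.location
      AND p.species = 'HASE'
ORDER BY s.date, s.location;

-- Step 2: keep only the NULL-extended rows, exactly one per survey
-- in which no HASE was recorded.
SELECT COUNT(*) AS surveys, SUM(s.kilometers) AS KM_surveyed
FROM demo_surveys AS s
LEFT JOIN demo_specimens AS p
       ON s.date = p.date AND s.location = p.location
      AND p.species = 'HASE'
WHERE p.date IS NULL;
```

Step 1 returns the 2020-01-01 survey twice (once per HASE specimen) and the other two surveys once each with a NULL species. Step 2 therefore counts 2 surveys and 8.00 km. Had the species test been in WHERE instead, the NULL-extended rows would have been filtered out before they could be counted.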

You can use an EXISTS / NOT EXISTS subquery in the WHERE clause.

Surveys where HASE is found in specimens table:

select count(*), sum(s.kilometers)
from surveys s
where exists (
    select *
    from specimens p
    where s.date=p.date
      and s.location=p.location
      and p.species = 'HASE'
)

Surveys where HASE is not found in specimens table:

select count(*), sum(s.kilometers)
from surveys s
where not exists (
    select *
    from specimens p
    where s.date=p.date
      and s.location=p.location
      and p.species = 'HASE'
)
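
If both figures are wanted at once, the two EXISTS queries can probably be folded into a single pass. The following is a sketch only, assuming the `surveys` and `specimens` tables described in the question; the column aliases and the derived-table name `x` are made up here.

```sql
-- Sketch: one scan of surveys, flagging each survey with 1 or 0 depending on
-- whether a HASE specimen exists for it (MySQL evaluates EXISTS to 1 or 0).
SELECT
    COUNT(*)                                           AS surveys_total,
    SUM(x.has_hase)                                    AS surveys_with_hase,
    SUM(CASE WHEN x.has_hase = 1 THEN x.kilometers
             ELSE 0 END)                               AS km_with_hase,
    SUM(1 - x.has_hase)                                AS surveys_without_hase,
    SUM(CASE WHEN x.has_hase = 0 THEN x.kilometers
             ELSE 0 END)                               AS km_without_hase
FROM (
    SELECT s.kilometers,
           EXISTS (
               SELECT 1
               FROM specimens p
               WHERE p.date = s.date
                 AND p.location = s.location
                 AND p.species = 'HASE'
           ) AS has_hase
    FROM surveys s
) AS x;
```

A quick sanity check on the result: `surveys_with_hase + surveys_without_hase` should equal `surveys_total`, which is the plain survey count from the question.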

An alternative to the first query could be:

select count(*), sum(s.kilometers)
from (
    select distinct date, location
    from specimens
    where species = 'HASE'
) p
join surveys s using (date, location)

Depending on the data (if 'HASE' is a rare "species") it might be faster.
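
Whether it is actually faster is best checked rather than guessed. A possible way to compare, assuming no suitable index exists yet (the index name below is hypothetical, and adding an index to a production table is a decision of its own):

```sql
-- Hypothetical index: lets MySQL find HASE rows and their (date, location)
-- pairs without scanning the whole specimens table.
CREATE INDEX idx_specimens_species_survey
    ON specimens (species, date, location);

-- Plan for the derived-table variant.
EXPLAIN
SELECT COUNT(*), SUM(s.kilometers)
FROM (
    SELECT DISTINCT date, location
    FROM specimens
    WHERE species = 'HASE'
) AS p
JOIN surveys AS s USING (date, location);

-- Plan for the EXISTS variant, for comparison.
EXPLAIN
SELECT COUNT(*), SUM(s.kilometers)
FROM surveys AS s
WHERE EXISTS (
    SELECT 1
    FROM specimens AS p
    WHERE p.date = s.date
      AND p.location = s.location
      AND p.species = 'HASE'
);
```

Comparing the estimated row counts and access types in the two plans, and then timing both queries on the real data, gives a better answer than any general rule.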

Probably the best alternative for the second query is the one Barmar already posted.

Why do people find joins so hard?

Finding the number of surveys where Harbor Seals (HASE) were found:

select count(distinct concat(s.location, s.date))
from surveys s 
Inner join specimens p 
on (s.date=p.date and s.location=p.location) 
where p.species = 'HASE';

Finding the number of surveys where no Harbor Seals (HASE) were found is simply the difference between the total number of surveys (which you already have) and the value from above. Since both queries return a single row, a Cartesian product of the two queries gives the result in a single output row, but just to be a bit different:

Select count(*), sum(kilometers)
From (
  Select s.kilometers
  From surveys s
  Left join specimens p
  on (s.date=p.date and s.location=p.location)
  and p.species = 'HASE'
  Where p.species is null
) As zero_surveys;

(There are several other ways to write the query above)
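
For completeness, the Cartesian-product approach mentioned above might look like the sketch below. It assumes the `surveys` and `specimens` tables from the question; the aliases `t`, `f` and `h` and the output column names are made up for this example.

```sql
-- t: totals over all surveys (one row).
-- f: totals over surveys with at least one HASE specimen (one row).
-- CROSS JOIN of two single-row results yields a single row.
SELECT t.surveys - f.surveys AS surveys_without_hase,
       t.km - f.km           AS km_without_hase
FROM (
    SELECT COUNT(*) AS surveys, SUM(kilometers) AS km
    FROM surveys
) AS t
CROSS JOIN (
    -- DISTINCT collapses multiple HASE specimens to one row per survey,
    -- so each survey's kilometers are summed once.
    SELECT COUNT(*) AS surveys, COALESCE(SUM(s.kilometers), 0) AS km
    FROM surveys AS s
    JOIN (
        SELECT DISTINCT date, location
        FROM specimens
        WHERE species = 'HASE'
    ) AS h USING (date, location)
) AS f;
```

The `COALESCE` guards against a NULL sum when the species was never found, which would otherwise make the kilometer difference NULL as well.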
