I have two MySQL tables:
surveys (date, location, kilometers) primary key: date + location (one record per survey)
specimens (date, location, species) (zero or more records per survey date and location)
I want to find the count of surveys and the sum of kilometers surveyed where the specimens table contains no records for a particular species. In other words, the number of surveys where a particular species was NOT found.
The total number of surveys is:
select count(date) as surveys, sum(kilometers) as KM_surveyed
from surveys;
+---------+-------------+
| surveys | KM_surveyed |
+---------+-------------+
| 20141 | 40673.59 |
+---------+-------------+
Finding the number of surveys where no specimens were found is easy:
select count(s.date) as surveys, sum(s.kilometers) as KM_surveyed
from surveys as s left join specimens as p
on (s.date=p.date and s.location=p.location)
where p.date is null;
+---------+-------------+
| surveys | KM_surveyed |
+---------+-------------+
| 8820 | 15848.26 |
+---------+-------------+
The total number of records in specimens is:
select count(*) from specimens;
+-----------+
| count(*) |
+-----------+
| 51566 |
+-----------+
The correct number of Harbor Seals (HASE) found on all surveys is:
select count(*) from specimens where species = 'HASE';
+-----------+
| count(*) |
+-----------+
| 662 |
+-----------+
Finding the number of surveys where Harbor Seals (HASE) were found is not easy.
Since the specimens table typically contains multiple records per survey, this query returns not the number of surveys but the number of HASE specimens found:
select count(s.date), sum(s.kilometers)
from surveys as s
left join specimens as p on (s.date=p.date and s.location=p.location)
where p.species = 'HASE';
+---------+-------------+
| surveys | KM_surveyed |
+---------+-------------+
| 662 | 2030.70 |
+---------+-------------+
WRONG! 662 is the number of HASE specimens, not the number of surveys.
Finding the number of surveys where no Harbor Seals (HASE) were found is not easy either. This query returns not the number of surveys but the number of specimens found that were not Harbor Seals:
select count(s.date), sum(s.kilometers)
from surveys as s
left join specimens as p on (s.date=p.date and s.location=p.location)
where p.species <> 'HASE' or p.date is null;
+---------+-------------+
| surveys | KM_surveyed |
+---------+-------------+
| 50904 | 151310.49 |
+---------+-------------+
WRONG! 50904 is the number of non-HASE specimens, not the number of surveys.
How do I construct queries to correctly count the number of surveys where Harbor Seals were found and the survey count when they were not found?
When you're doing a LEFT JOIN to find non-matching rows, you should put the criteria that shouldn't be matched into the ON clause, not the WHERE clause.
SELECT COUNT(*), SUM(s.kilometers)
FROM surveys AS s
LEFT JOIN specimens AS p ON s.date = p.date and s.location = p.location
AND p.species = 'HASE'
WHERE p.date IS NULL
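To see why the placement of the species test matters, here is a minimal sketch that runs both variants on a tiny made-up dataset. It uses Python's built-in sqlite3 module as a stand-in for MySQL (the join semantics shown are the same), and the rows, the locations 'A' and 'B', and the second species code 'CASL' are all invented for illustration.

```python
import sqlite3

# Toy data, invented for illustration only (not the asker's real tables).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE surveys (date TEXT, location TEXT, kilometers REAL,
                          PRIMARY KEY (date, location));
    CREATE TABLE specimens (date TEXT, location TEXT, species TEXT);
    INSERT INTO surveys VALUES
        ('2020-01-01', 'A', 1.0), ('2020-01-01', 'B', 2.0),
        ('2020-01-02', 'A', 3.0), ('2020-01-02', 'B', 4.0);
    INSERT INTO specimens VALUES
        ('2020-01-01', 'A', 'HASE'), ('2020-01-01', 'A', 'HASE'),
        ('2020-01-01', 'A', 'CASL'), ('2020-01-01', 'B', 'CASL'),
        ('2020-01-02', 'A', 'HASE');
""")

# Species test in WHERE: every non-HASE specimen row survives the filter,
# so survey (2020-01-01, A) is counted even though HASE was found there.
in_where = conn.execute("""
    SELECT COUNT(s.date), SUM(s.kilometers)
    FROM surveys AS s
    LEFT JOIN specimens AS p ON s.date = p.date AND s.location = p.location
    WHERE p.species <> 'HASE' OR p.date IS NULL
""").fetchone()

# Species test in ON: only HASE rows can match, so a NULL on the right side
# means "this survey has no HASE specimen at all".
in_on = conn.execute("""
    SELECT COUNT(*), SUM(s.kilometers)
    FROM surveys AS s
    LEFT JOIN specimens AS p ON s.date = p.date AND s.location = p.location
        AND p.species = 'HASE'
    WHERE p.date IS NULL
""").fetchone()

print(in_where)  # (3, 7.0): one row per surviving joined row, not per survey
print(in_on)     # (2, 6.0): the two surveys without HASE
```

In this toy data, only the surveys at location 'B' have no HASE specimen, which is what the ON-clause version reports.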
You can use an EXISTS / NOT EXISTS subquery in the WHERE clause.
Surveys where HASE is found in the specimens table:
select count(*), sum(s.kilometers)
from surveys s
where exists (
select *
from specimens p
where s.date=p.date
and s.location=p.location
and p.species = 'HASE'
)
Surveys where HASE is not found in the specimens table:
select count(*), sum(s.kilometers)
from surveys s
where not exists (
select *
from specimens p
where s.date=p.date
and s.location=p.location
and p.species = 'HASE'
)
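A quick sanity check for the two EXISTS queries above is that their results should add up to the totals for the whole surveys table, since every survey either has a HASE specimen or does not. The sketch below checks this on invented toy rows, again using Python's sqlite3 in place of MySQL. The helper name count_surveys and the species code 'CASL' are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE surveys (date TEXT, location TEXT, kilometers REAL)")
conn.execute("CREATE TABLE specimens (date TEXT, location TEXT, species TEXT)")
# Invented sample rows: four surveys, HASE recorded on two of them.
conn.executemany("INSERT INTO surveys VALUES (?, ?, ?)", [
    ("2020-01-01", "A", 1.0), ("2020-01-01", "B", 2.0),
    ("2020-01-02", "A", 3.0), ("2020-01-02", "B", 4.0),
])
conn.executemany("INSERT INTO specimens VALUES (?, ?, ?)", [
    ("2020-01-01", "A", "HASE"), ("2020-01-01", "A", "HASE"),
    ("2020-01-01", "A", "CASL"), ("2020-01-01", "B", "CASL"),
    ("2020-01-02", "A", "HASE"),
])

def count_surveys(species, found):
    """Hypothetical helper: (survey count, km) where species was / was not found."""
    test = "EXISTS" if found else "NOT EXISTS"
    sql = f"""
        SELECT COUNT(*), SUM(s.kilometers)
        FROM surveys s
        WHERE {test} (
            SELECT 1 FROM specimens p
            WHERE s.date = p.date AND s.location = p.location
              AND p.species = ?
        )
    """
    return conn.execute(sql, (species,)).fetchone()

found = count_surveys("HASE", found=True)       # (2, 4.0)
not_found = count_surveys("HASE", found=False)  # (2, 6.0)
total = conn.execute("SELECT COUNT(*), SUM(kilometers) FROM surveys").fetchone()

# Each survey is counted exactly once, on one side or the other.
print(found, not_found, total)
```

Because EXISTS only asks whether at least one matching row exists, the two duplicate HASE rows for the first survey do not inflate the count.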
An alternative to the first query could be:
select count(*), sum(s.kilometers)
from (
select distinct date, location
from specimens
where species = 'HASE'
) p
join surveys s using (date, location)
Depending on the data (for example, if 'HASE' is a rare species), it might be faster.
Probably the best alternative for the second query is the one already posted by Barmar.
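The DISTINCT in the derived table is what keeps the join from multiplying rows. As a rough illustration (toy rows invented for this sketch, with Python's sqlite3 standing in for MySQL), compare a plain inner join against the deduplicated version:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE surveys (date TEXT, location TEXT, kilometers REAL);
    CREATE TABLE specimens (date TEXT, location TEXT, species TEXT);
    -- Invented rows: survey (2020-01-01, A) has two HASE specimens.
    INSERT INTO surveys VALUES
        ('2020-01-01', 'A', 1.0), ('2020-01-01', 'B', 2.0),
        ('2020-01-02', 'A', 3.0);
    INSERT INTO specimens VALUES
        ('2020-01-01', 'A', 'HASE'), ('2020-01-01', 'A', 'HASE'),
        ('2020-01-02', 'A', 'HASE');
""")

# Plain join: one output row per HASE specimen, so the first survey
# (and its kilometers) is counted twice.
plain = conn.execute("""
    SELECT COUNT(*), SUM(s.kilometers)
    FROM specimens p
    JOIN surveys s USING (date, location)
    WHERE p.species = 'HASE'
""").fetchone()

# Deduplicated join: at most one (date, location) pair per survey.
dedup = conn.execute("""
    SELECT COUNT(*), SUM(s.kilometers)
    FROM (
        SELECT DISTINCT date, location
        FROM specimens
        WHERE species = 'HASE'
    ) p
    JOIN surveys s USING (date, location)
""").fetchone()

print(plain)  # (3, 5.0): specimens, not surveys
print(dedup)  # (2, 4.0): surveys where HASE was found
```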
Why do people find joins so hard?
Finding the number of surveys where Harbor Seals (HASE) were found:
select count(distinct concat(s.location, s.date))
from surveys s
Inner join specimens p
on (s.date=p.date and s.location=p.location)
where p.species = 'HASE';
The number of surveys where no Harbor Seals (HASE) were found is simply the difference between the total number of surveys (which you already have) and the value from above. Since both queries return a single row, a Cartesian product of the two queries will give the values in a single output row, but just to be a bit different:
Select count(*), sum(kilometers)
From (
Select s.kilometers
From surveys s
Left join specimens p
on (s.date=p.date and s.location=p.location)
and p.species = 'HASE'
Where p.species is null
) As zero_surveys;
(There are several other ways to write the query above)