I have an SQL table (in SQLite3) in which I am trying to aggregate information from several other tables, and records in one table may or may not have a corresponding record in another table. My query is supposed to include in the aggregate table both records with and without linked information. For example:
CREATE TABLE all_households AS
SELECT pop.uid AS pop_uid,
pop.surname,
pop.given,
pop.age,
pop.real_property,
farm.uid AS farm_uid,
farm.improved_acres,
farm.unimproved_acres,
farm.cash_value,
farm.corn,
farm.cotton
FROM pop, farm
WHERE pop.farm_id = farm.uid;
This is looking at data from census schedules. Everybody in the census will have the basic pop information (surname, given name, value of real property), but not everybody has a farm. Only certain individuals have a value in the farm_id column on pop, corresponding to the record of that person's farm on farm; otherwise farm_id is NULL.
But naturally, the above query will fetch only those individuals for whom pop.farm_id = farm.uid, that is, who have farms and have values for farm_id. The farmless individuals are excluded, and I want to include them, with empty (NULL) values for the relevant farm columns in all_households.
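To make the problem concrete, here is a minimal sketch using Python's built-in sqlite3 module. The tables are trimmed-down, hypothetical versions of pop and farm (only a few of the question's columns), and the sample rows are invented: Smith has a farm, Jones does not.

```python
import sqlite3

# Hypothetical, trimmed-down versions of the question's tables and data.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE farm (uid INTEGER PRIMARY KEY, improved_acres INTEGER);
    CREATE TABLE pop (uid INTEGER PRIMARY KEY, surname TEXT, farm_id INTEGER);
    INSERT INTO farm VALUES (10, 120), (11, 80);
    INSERT INTO pop VALUES (1, 'Smith', 10), (2, 'Jones', NULL);
""")

# Same shape as the first query: an implicit inner join written in the WHERE.
rows = con.execute("""
    SELECT pop.uid, pop.surname, farm.improved_acres
    FROM pop, farm
    WHERE pop.farm_id = farm.uid
""").fetchall()

# NULL = 10 evaluates to NULL (not true), so Jones never passes the WHERE.
print(rows)  # [(1, 'Smith', 120)]
```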
Now, I know I can solve this (and so far have) with a separate SELECT subquery for each linked column, like so:
CREATE TABLE all_households AS
SELECT uid AS pop_uid,
surname,
given,
age,
real_property,
(SELECT uid FROM farm WHERE pop.farm_id = farm.uid) AS farm_uid,
(SELECT improved_acres FROM farm WHERE pop.farm_id = farm.uid) AS improved_acres,
(SELECT unimproved_acres FROM farm WHERE pop.farm_id = farm.uid) AS unimproved_acres,
(SELECT cash_value FROM farm WHERE pop.farm_id = farm.uid) AS cash_value,
(SELECT corn FROM farm WHERE pop.farm_id = farm.uid) AS corn,
(SELECT cotton FROM farm WHERE pop.farm_id = farm.uid) AS cotton
FROM pop;
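A quick check, again on hypothetical sample data and with only two of the farm columns, suggests this workaround does keep the farmless rows: a scalar subquery that finds no matching farm yields NULL instead of removing the pop row. The cost is that the same farm lookup is written out once per column.

```python
import sqlite3

# Hypothetical, trimmed-down tables: Smith has a farm, Jones does not.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE farm (uid INTEGER PRIMARY KEY, improved_acres INTEGER);
    CREATE TABLE pop (uid INTEGER PRIMARY KEY, surname TEXT, farm_id INTEGER);
    INSERT INTO farm VALUES (10, 120), (11, 80);
    INSERT INTO pop VALUES (1, 'Smith', 10), (2, 'Jones', NULL);
""")

# One correlated subquery per farm column; inside each subquery, "uid"
# resolves to farm.uid, while the outer "uid" is pop.uid.
rows = con.execute("""
    SELECT uid AS pop_uid,
           surname,
           (SELECT uid FROM farm WHERE pop.farm_id = farm.uid) AS farm_uid,
           (SELECT improved_acres FROM farm WHERE pop.farm_id = farm.uid) AS improved_acres
    FROM pop
    ORDER BY pop_uid
""").fetchall()

for row in rows:
    print(row)
# (1, 'Smith', 10, 120)
# (2, 'Jones', None, None)
```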
But this seems terribly clunky and inelegant. So I wondered if there was a way to make the first query above also pick up entries from pop where farm_id was NULL:
WHERE pop.farm_id = farm.uid OR pop.farm_id IS NULL;
But then things went very haywire, and I'm not sure why. In my real, unsimplified query, I'm actually dealing with four tables, each with a column on pop that may be a value or may be NULL. Though the first query above as written took only seconds, the query with this WHERE hung. Forever. And when I came back, it had died with the error "database or disk is full." So whatever I did, I seem to have elicited some kind of endless loop. I also tried:
WHERE (CASE WHEN pop.farm_id IS NOT NULL THEN pop.farm_id = farm.uid ELSE 1 END);
But this had the same result as before. Can anybody shed any light on what I'm doing wrong, or what I might do better? Thanks.
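Your attempt to use farm_id IS NULL was slow because the database attempted to give you the combination of each farm record with each pop record that has a NULL farm_id. The condition pop.farm_id IS NULL is true no matter which farm row it is paired with, so every farmless person is repeated once per farm. With four such tables these combinations multiply, so this was not an endless loop but an enormous result that kept growing until the disk filled up. The CASE version behaves the same way, because its ELSE 1 branch is likewise true for every pairing. Furthermore, optimizing constraints with OR is not easy, and here it was done with a temporary table.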
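The row explosion can be reproduced on a tiny scale. In this sketch (hypothetical data: three farms, three people, only Smith owns a farm), both the OR condition and the CASE condition from the question return each farmless person once per farm row, so three people turn into seven result rows.

```python
import sqlite3

# Hypothetical sample data: three farms, but only Smith owns one.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE farm (uid INTEGER PRIMARY KEY, improved_acres INTEGER);
    CREATE TABLE pop (uid INTEGER PRIMARY KEY, surname TEXT, farm_id INTEGER);
    INSERT INTO farm VALUES (10, 120), (11, 80), (12, 45);
    INSERT INTO pop VALUES (1, 'Smith', 10), (2, 'Jones', NULL), (3, 'Brown', NULL);
""")

# "pop.farm_id IS NULL" is true whichever farm row it is paired with,
# so Jones and Brown are each paired with all three farms.
or_rows = con.execute("""
    SELECT pop.surname, farm.uid
    FROM pop, farm
    WHERE pop.farm_id = farm.uid OR pop.farm_id IS NULL
""").fetchall()

# The CASE variant is equivalent: ELSE 1 is true for every pairing.
case_rows = con.execute("""
    SELECT pop.surname, farm.uid
    FROM pop, farm
    WHERE (CASE WHEN pop.farm_id IS NOT NULL THEN pop.farm_id = farm.uid ELSE 1 END)
""").fetchall()

pop_count = con.execute("SELECT COUNT(*) FROM pop").fetchone()[0]
print(len(or_rows), len(case_rows), "rows for", pop_count, "people")  # 7 7 rows for 3 people
```

With several optional tables in the same FROM list, a person with NULL in every link column is paired with every combination of rows from all of them, which is why the full query ran out of disk.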
To get all matched/joined records, plus all records from the first table with no corresponding farm, combine two queries with UNION ALL:
SELECT pop. ..., farm. ...
FROM pop JOIN farm ON pop.farm_id = farm.uid
UNION ALL
SELECT pop. ..., NULL, NULL, ...
FROM pop
WHERE pop.farm_id IS NULL
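Here is one way the UNION ALL pattern could look with a few of the question's farm columns filled in. The tables and rows are hypothetical; the only requirement illustrated is that the second query supplies one NULL per farm column so that both halves have the same number of columns.

```python
import sqlite3

# Hypothetical, trimmed-down tables using some of the question's column names.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE farm (uid INTEGER PRIMARY KEY, improved_acres INTEGER, cotton INTEGER);
    CREATE TABLE pop (uid INTEGER PRIMARY KEY, surname TEXT, farm_id INTEGER);
    INSERT INTO farm VALUES (10, 120, 5), (11, 80, 0);
    INSERT INTO pop VALUES (1, 'Smith', 10), (2, 'Jones', NULL);
""")

# First half: people with a farm. Second half: people without one, padded
# with NULLs in place of the farm columns. ORDER BY 1 sorts the whole result.
rows = con.execute("""
    SELECT pop.uid, pop.surname, farm.uid, farm.improved_acres, farm.cotton
    FROM pop JOIN farm ON pop.farm_id = farm.uid
    UNION ALL
    SELECT pop.uid, pop.surname, NULL, NULL, NULL
    FROM pop
    WHERE pop.farm_id IS NULL
    ORDER BY 1
""").fetchall()

for row in rows:
    print(row)
# (1, 'Smith', 10, 120, 5)
# (2, 'Jones', None, None, None)
```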
The result of this combination is called an outer join, and most SQL databases support it directly. SQLite versions before 3.39.0 support only left outer joins, but a left join is exactly what you want here:
SELECT pop. ..., farm. ...
FROM pop LEFT OUTER JOIN farm ON pop.farm_id = farm.uid
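For the real query with four optional tables, the usual approach is one LEFT JOIN per table. The sketch below shows two of them. Note that mfg, mfg.capital and pop.mfg_id are hypothetical stand-ins for the question's other linked tables, and the data is invented. Each pop row appears exactly once only because farm.uid and mfg.uid are primary keys; duplicate keys on the right-hand side would repeat the pop row.

```python
import sqlite3

# Hypothetical schema: "farm" follows the question; "mfg" and pop.mfg_id are
# invented stand-ins for the other optional tables in the real query.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE farm (uid INTEGER PRIMARY KEY, improved_acres INTEGER);
    CREATE TABLE mfg (uid INTEGER PRIMARY KEY, capital INTEGER);
    CREATE TABLE pop (uid INTEGER PRIMARY KEY, surname TEXT,
                      farm_id INTEGER, mfg_id INTEGER);
    INSERT INTO farm VALUES (10, 120), (11, 80);
    INSERT INTO mfg VALUES (20, 5000);
    INSERT INTO pop VALUES (1, 'Smith', 10, NULL),
                           (2, 'Jones', NULL, 20),
                           (3, 'Brown', NULL, NULL);
""")

# One LEFT JOIN per optional table. Every pop row is kept, and the columns of
# a table with no matching row come back as NULL.
con.execute("""
    CREATE TABLE all_households AS
    SELECT pop.uid AS pop_uid, pop.surname,
           farm.uid AS farm_uid, farm.improved_acres,
           mfg.uid AS mfg_uid, mfg.capital
    FROM pop
    LEFT JOIN farm ON pop.farm_id = farm.uid
    LEFT JOIN mfg  ON pop.mfg_id  = mfg.uid
""")

rows = con.execute("SELECT * FROM all_households ORDER BY pop_uid").fetchall()
for row in rows:
    print(row)
# (1, 'Smith', 10, 120, None, None)
# (2, 'Jones', None, None, 20, 5000)
# (3, 'Brown', None, None, None, None)
```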
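Note that an outer join returns all unmatched records, so this will also return pop records with an invalid farm_id (a non-NULL value that matches no farm.uid), with NULL in the farm columns. The UNION ALL version, by contrast, drops such records entirely, because its second query only picks up rows where farm_id IS NULL.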
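The difference is easy to see with one dangling reference. In this hypothetical data, Davis has farm_id 99, which matches no farm. The LEFT JOIN keeps Davis with a NULL farm; the UNION ALL version loses him. The same LEFT JOIN, filtered on farm.uid IS NULL, can also be used to list such invalid references.

```python
import sqlite3

# Hypothetical data: Davis has farm_id 99, which matches no farm.uid.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE farm (uid INTEGER PRIMARY KEY, improved_acres INTEGER);
    CREATE TABLE pop (uid INTEGER PRIMARY KEY, surname TEXT, farm_id INTEGER);
    INSERT INTO farm VALUES (10, 120);
    INSERT INTO pop VALUES (1, 'Smith', 10), (2, 'Jones', NULL), (3, 'Davis', 99);
""")

# Outer join: keeps every pop row, including Davis.
left_join = con.execute("""
    SELECT pop.uid, pop.surname, farm.uid
    FROM pop LEFT OUTER JOIN farm ON pop.farm_id = farm.uid
    ORDER BY pop.uid
""").fetchall()

# UNION ALL: Davis fails the inner join and is not NULL, so he is dropped.
union_all = con.execute("""
    SELECT pop.uid, pop.surname, farm.uid
    FROM pop JOIN farm ON pop.farm_id = farm.uid
    UNION ALL
    SELECT pop.uid, pop.surname, NULL
    FROM pop
    WHERE pop.farm_id IS NULL
    ORDER BY 1
""").fetchall()

# Invalid references: a farm_id is set, but no farm row matched it.
orphans = con.execute("""
    SELECT pop.uid, pop.farm_id
    FROM pop LEFT JOIN farm ON pop.farm_id = farm.uid
    WHERE pop.farm_id IS NOT NULL AND farm.uid IS NULL
""").fetchall()

print(left_join)  # [(1, 'Smith', 10), (2, 'Jones', None), (3, 'Davis', None)]
print(union_all)  # [(1, 'Smith', 10), (2, 'Jones', None)]
print(orphans)    # [(3, 99)]
```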