
How to optimise a MySQL query containing a subquery?

I have two tables, House and Person. For any row in House, there can be zero, one, or many corresponding rows in Person. Of those people, at most one will have a status of "ACTIVE"; the others will all have a status of "CANCELLED".

For example:

SELECT * FROM House LEFT JOIN Person ON House.ID = Person.HouseID

House.ID | Person.ID | Person.Status
       1 |         1 |     CANCELLED
       1 |         2 |     CANCELLED
       1 |         3 |        ACTIVE
       2 |         1 |        ACTIVE
       3 |      NULL |          NULL
       4 |         4 |     CANCELLED

I want to filter out the cancelled rows, and get something like this:

House.ID | Person.ID | Person.Status
       1 |         3 |        ACTIVE
       2 |         1 |        ACTIVE
       3 |      NULL |          NULL
       4 |      NULL |          NULL

I've achieved this with the following sub select:

SELECT *
FROM House
LEFT JOIN 
(
    SELECT *
    FROM Person
    WHERE Person.Status != "CANCELLED"
) Person
ON House.ID = Person.HouseID

...which works, but the join on the subquery no longer uses any indexes. Is there a better solution that still uses them?

I'm using MySQL and all relevant columns are indexed. EXPLAIN lists nothing in possible_keys.

Thanks.

How about:

SELECT *
FROM House
LEFT JOIN Person
ON House.ID = Person.HouseID
AND Person.Status != 'CANCELLED'
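To see why the filter belongs in the ON clause rather than in a WHERE clause, here is a rough sketch using the sample data from the question. It assumes SQLite (through Python's built-in sqlite3 module) as a stand-in for MySQL, so it only checks the result rows and says nothing about which indexes MySQL picks. The helper names build_db and run are made up for this example.

```python
import sqlite3


def build_db():
    """Create an in-memory database holding the question's sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE House (ID INTEGER PRIMARY KEY);
        CREATE TABLE Person (ID INTEGER, HouseID INTEGER, Status TEXT);
        INSERT INTO House (ID) VALUES (1), (2), (3), (4);
        INSERT INTO Person (ID, HouseID, Status) VALUES
            (1, 1, 'CANCELLED'),
            (2, 1, 'CANCELLED'),
            (3, 1, 'ACTIVE'),
            (1, 2, 'ACTIVE'),
            (4, 4, 'CANCELLED');
    """)
    return conn


# Filter applied while joining: unmatched houses keep a NULL-padded row.
ON_FILTER = """
    SELECT House.ID, Person.ID, Person.Status
    FROM House
    LEFT JOIN Person
      ON House.ID = Person.HouseID
     AND Person.Status != 'CANCELLED'
    ORDER BY House.ID
"""

# Filter applied after joining: NULL-padded and cancelled rows are discarded.
WHERE_FILTER = """
    SELECT House.ID, Person.ID, Person.Status
    FROM House
    LEFT JOIN Person
      ON House.ID = Person.HouseID
    WHERE Person.Status != 'CANCELLED'
    ORDER BY House.ID
"""


def run(query):
    """Run one query against a fresh copy of the sample data."""
    conn = build_db()
    try:
        return conn.execute(query).fetchall()
    finally:
        conn.close()


on_rows = run(ON_FILTER)
where_rows = run(WHERE_FILTER)

print(on_rows)
print(where_rows)
```

With the condition in ON, houses 3 and 4 still appear with NULL person columns, which is the output the question asks for. With the same condition in WHERE, those two houses disappear, because the comparison against a NULL status is not true.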

Do you have control of the database structure? If so, I think you could better represent your data by removing the Status column from the Person table and instead adding an ActivePersonID column to the House table. This way you remove all the redundant CANCELLED values from Person and eliminate the application or stored-procedure code needed to ensure that only one person per household is active.

In addition, you could then represent your query as

 SELECT * FROM House LEFT JOIN Person ON House.ActivePersonID = Person.ID
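A small sketch of what that restructured schema might look like, again assuming SQLite via Python's sqlite3 module in place of MySQL. One assumption to flag: the join on Person.ID needs unique person IDs, but the question's sample data uses ID 1 twice, so the active person in house 2 is given the hypothetical ID 5 here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Person no longer carries a Status column.
    CREATE TABLE Person (ID INTEGER PRIMARY KEY, HouseID INTEGER);
    -- Each house points at its single active person, or NULL if none.
    CREATE TABLE House (
        ID INTEGER PRIMARY KEY,
        ActivePersonID INTEGER REFERENCES Person (ID)
    );
    -- ID 5 is a renumbered stand-in for the question's second person "1".
    INSERT INTO Person (ID, HouseID) VALUES (1, 1), (2, 1), (3, 1), (5, 2), (4, 4);
    INSERT INTO House (ID, ActivePersonID) VALUES (1, 3), (2, 5), (3, NULL), (4, NULL);
""")

# One row per house; Person.ID is NULL when the house has no active person.
active_rows = conn.execute("""
    SELECT House.ID, Person.ID
    FROM House
    LEFT JOIN Person ON House.ActivePersonID = Person.ID
    ORDER BY House.ID
""").fetchall()
conn.close()

print(active_rows)
```

Because ActivePersonID holds at most one value per house, the "only one active person" rule is enforced by the schema itself, and the join is a plain lookup on Person's primary key.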

Use:

   SELECT * 
     FROM HOUSE h 
LEFT JOIN PERSON p ON p.houseid = h.id
                  AND p.status = 'ACTIVE'

This is in SQL Server, but the logic seems to work, echoing Chris above:

declare @house table
(
    houseid int
)

declare @person table
(
    personid int,
    houseid int,
    personstatus varchar(20)
)

insert into @house (houseid) VALUES (1)
insert into @house (houseid) VALUES (2)
insert into @house (houseid) VALUES (3)
insert into @house (houseid) VALUES (4)

insert into @person (personid, houseid, personstatus) VALUES (1, 1, 'CANCELLED')
insert into @person (personid, houseid, personstatus) VALUES (2, 1, 'CANCELLED')
insert into @person (personid, houseid, personstatus) VALUES (3, 1, 'ACTIVE')
insert into @person (personid, houseid, personstatus) VALUES (1, 2, 'ACTIVE')
insert into @person (personid, houseid, personstatus) VALUES (4, 4, 'CANCELLED')

select * from @house
select * from @person

select *
from @house h LEFT OUTER JOIN @person p ON h.houseid = p.houseid 
    AND p.personstatus <> 'CANCELLED'


 