
Select rows from tableA based on age calculation from tableB

In table1 we have ID and DOB (date of birth, e.g. 01/01/1980). In table2 we have id and other columns.

How do I get all rows from table2 whose id belongs to someone under the age of 20?

I currently have:

SELECT *
FROM table2
WHERE id IN (
    SELECT id
    FROM table1
    WHERE TIMESTAMPDIFF(Year,DOB,curdate()) <= 20
)

Is my solution efficient?

You would be better off calculating the date 20 years ago once and asking whether DOB is after that date. That is one calculation in total, not one calculation for every row in the table. Whenever you wrap a column in a calculation, an ordinary index on that column cannot be used to find the matching rows. If DOB is indexed, this is a catastrophe for performance.

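Be careful with what a year-difference function actually counts. Some functions, such as SQL Server's DATEDIFF(year, ...), count the number of times the year rolls over 31 Dec between the two dates. They therefore report the difference between 31 Dec and 1 Jan as 1 year, when it is really only one day (or up to two, depending on the times). MySQL's TIMESTAMPDIFF(YEAR, ...) counts complete years instead and returns 0 for that example, but it still has to be evaluated for every row.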
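You can compare the two counting rules in a minimal Python sketch that uses the 31 Dec / 1 Jan example. `boundary_years` and `complete_years` are hypothetical helper names. The first counts 31 Dec rollovers. The second counts whole years elapsed, which is what MySQL's TIMESTAMPDIFF(YEAR, ...) returns for ordinary dates. Times of day are not modelled.

```python
from datetime import date


def boundary_years(start: date, end: date) -> int:
    """Number of 31 Dec -> 1 Jan rollovers between the dates (DATEDIFF(year, ...) style)."""
    return end.year - start.year


def complete_years(start: date, end: date) -> int:
    """Whole years elapsed: one less if the anniversary has not been reached in `end`'s year."""
    years = end.year - start.year
    if (end.month, end.day) < (start.month, start.day):
        years -= 1
    return years


if __name__ == "__main__":
    a, b = date(2019, 12, 31), date(2020, 1, 1)
    print(boundary_years(a, b))  # 1, although only one day has passed
    print(complete_years(a, b))  # 0
```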

SELECT id  
FROM table1 
where DOB > DATE_SUB(CURDATE(), INTERVAL 20 YEAR)
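To see the index effect directly, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL. SQLite has no TIMESTAMPDIFF or INTERVAL, so dates are stored as ISO-8601 text and the cutoff is computed once in Python. The sketch compares the query plan for a per-row calculation on dob with the plan for a comparison against a precomputed cutoff. The index name and the extra `name` column are hypothetical. The exact EXPLAIN QUERY PLAN wording varies between SQLite versions and differs from MySQL's EXPLAIN.

```python
import sqlite3
from datetime import date

# SQLite stands in for MySQL here: dates are ISO-8601 text, and there is no
# TIMESTAMPDIFF or INTERVAL syntax, so the cutoff is computed in Python.


def subtract_years(d: date, years: int) -> date:
    """Same calendar day `years` earlier; 29 Feb becomes 28 Feb in a non-leap year."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:
        return d.replace(year=d.year - years, day=28)


def plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> str:
    """Join the detail column (the last one) of every EXPLAIN QUERY PLAN row."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
# `name` and `idx_table1_dob` are hypothetical; only id and dob come from the question.
conn.execute("CREATE TABLE table1 (id INTEGER, dob TEXT, name TEXT)")
conn.execute("CREATE INDEX idx_table1_dob ON table1 (dob)")

today = date(2024, 6, 15)  # fixed "current date" so the example is reproducible
cutoff = subtract_years(today, 20).isoformat()  # one calculation in total

# Calculation on the column: evaluated per row, so the dob index cannot be searched.
per_row_plan = plan(conn, "SELECT * FROM table1 WHERE date(dob, '+20 years') > ?",
                    (today.isoformat(),))
# Bare column against a precomputed constant: the dob index can be range-searched.
cutoff_plan = plan(conn, "SELECT * FROM table1 WHERE dob > ?", (cutoff,))

print(per_row_plan)  # e.g. SCAN table1
print(cutoff_plan)   # e.g. SEARCH table1 USING INDEX idx_table1_dob (dob>?)
```

The same principle should carry over to MySQL: keep the indexed column bare on one side of the comparison and put the date arithmetic on the constant side.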

Personally I use a join rather than IN. Once you learn the pattern, it is easy to extend with LEFT joins to find rows that don't exist or don't match. In practice, though, most query optimizers rewrite IN and JOIN so that they execute the same way anyway. Some databases do perform poorly with IN, because they execute it differently from a join.
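A minimal sketch of both points, again using Python's sqlite3 as a stand-in for MySQL, with a hypothetical precomputed cutoff string and a hypothetical `other` column. The IN query and the INNER JOIN return the same rows here because table1.id is unique. Switching the same join skeleton to LEFT JOIN ... IS NULL finds table2 rows that have no table1 row at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, dob TEXT);
    CREATE TABLE table2 (id INTEGER, other TEXT);  -- 'other' stands in for "other columns"
    INSERT INTO table1 VALUES (1, '2010-03-01'), (2, '1980-01-01');
    INSERT INTO table2 VALUES (1, 'young'), (2, 'old'), (3, 'no table1 row');
""")
cutoff = "2004-06-15"  # hypothetical cutoff: 20 years before 2024-06-15

in_rows = conn.execute(
    "SELECT t2.id, t2.other FROM table2 t2 "
    "WHERE t2.id IN (SELECT t1.id FROM table1 t1 WHERE t1.dob > ?) ORDER BY t2.id",
    (cutoff,)).fetchall()

join_rows = conn.execute(
    "SELECT t2.id, t2.other FROM table1 t1 "
    "INNER JOIN table2 t2 ON t1.id = t2.id "
    "WHERE t1.dob > ? ORDER BY t2.id",
    (cutoff,)).fetchall()

# Same skeleton as a LEFT JOIN: keep only table2 rows with no matching table1 row.
orphan_rows = conn.execute(
    "SELECT t2.id, t2.other FROM table2 t2 "
    "LEFT JOIN table1 t1 ON t1.id = t2.id "
    "WHERE t1.id IS NULL ORDER BY t2.id").fetchall()

print(in_rows)      # [(1, 'young')]
print(join_rows)    # [(1, 'young')]
print(orphan_rows)  # [(3, 'no table1 row')]
```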

SELECT * 
FROM 
  table1 t1
  INNER JOIN table2 t2
  ON t1.id = t2.id
where t1.DOB > DATE_SUB(CURDATE(), INTERVAL 20 YEAR)

Mech is making the point that SELECT * should be avoided in production code. That's a relevant point for the most part: always select only the columns you need. Suppose the database has indexed a table and every column you need is in that index. SELECT * is then a performance hit, because the database has to use the index to find which rows match and then look up each of those rows in the table. If you specify only the columns you need, the database can decide whether to answer the query purely from the index for a speed boost. The only time I might consider using SELECT * is in a subquery, where the optimizer will rewrite it anyway.
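The index-only case can be observed in a minimal sqlite3 sketch, with SQLite standing in for MySQL. The index `idx_table1_dob_id` and the `name` column are hypothetical. When the query asks only for indexed columns, SQLite reports a covering index. SELECT * also needs `name`, so the index alone is not enough. Plan wording varies between SQLite versions.

```python
import sqlite3


def plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> str:
    """Join the detail column (the last one) of every EXPLAIN QUERY PLAN row."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER, dob TEXT, name TEXT)")  # name is hypothetical
conn.execute("CREATE INDEX idx_table1_dob_id ON table1 (dob, id)")    # hypothetical index

cutoff = "2004-06-15"  # hypothetical precomputed cutoff
narrow = plan(conn, "SELECT id, dob FROM table1 WHERE dob > ?", (cutoff,))
wide = plan(conn, "SELECT * FROM table1 WHERE dob > ?", (cutoff,))

print(narrow)  # e.g. SEARCH table1 USING COVERING INDEX idx_table1_dob_id (dob>?)
print(wide)    # e.g. SEARCH table1 USING INDEX idx_table1_dob_id (dob>?)
```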

Always alias your tables and use the aliases to qualify column names. This stops your query breaking if you later add a column to one table with the same name as a column in the other. Adding columns doesn't usually cause bugs or crashes. But suppose a query just says "select name from a join b ..." and only table a has a name column. Once a name column is added to b, the query will start failing with an ambiguous-column error. Writing a.name prevents this.
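A minimal sqlite3 sketch of that failure, using the `a`/`b`/`name` example from the paragraph above (SQLite stands in for MySQL, whose message is worded differently). The unqualified query works until `b` gains a `name` column. The qualified query keeps working.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, name TEXT);
    CREATE TABLE b (id INTEGER);
    INSERT INTO a VALUES (1, 'alice');
    INSERT INTO b VALUES (1);
""")

unqualified = "SELECT name FROM a JOIN b ON a.id = b.id"
qualified = "SELECT a.name FROM a JOIN b ON a.id = b.id"

before_rows = conn.execute(unqualified).fetchall()  # fine while only a has `name`

conn.execute("ALTER TABLE b ADD COLUMN name TEXT")  # a later, harmless-looking schema change

error_message = None
try:
    conn.execute(unqualified)
except sqlite3.OperationalError as exc:
    error_message = str(exc)  # SQLite reports an ambiguous column name

after_rows = conn.execute(qualified).fetchall()  # qualified column still resolves

print(before_rows)    # [('alice',)]
print(error_message)  # ambiguous column name: name
print(after_rows)     # [('alice',)]
```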

For MySQL

SELECT table2.*
FROM table1
JOIN table2 ON table1.id = table2.id
WHERE table1.dob > CURRENT_DATE - INTERVAL 20 YEAR

Historically, MySQL has implemented EXISTS more efficiently than IN. So, I would recommend:

SELECT t2.*
FROM table2 t2
WHERE EXISTS (SELECT 1
              FROM table1 t1
              WHERE t1.id = t2.id AND
                    TIMESTAMPDIFF(Year, t1.DOB, curdate()) <= 20
             );

For performance, you want an index on table1(id, DOB).
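A minimal sqlite3 sketch of the EXISTS query with a composite index on table1(id, dob), where SQLite stands in for MySQL. Because SQLite has no TIMESTAMPDIFF, the date test is written as `t1.dob > ?` with a precomputed cutoff. The index name and the extra columns are hypothetical. The plan should show the correlated lookup going through the composite index, although the exact plan text depends on the SQLite version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, dob TEXT, name TEXT);   -- name is hypothetical
    CREATE TABLE table2 (id INTEGER, other TEXT);             -- other is hypothetical
    CREATE INDEX idx_table1_id_dob ON table1 (id, dob);       -- hypothetical index name
""")

sql = """
    SELECT t2.*
    FROM table2 t2
    WHERE EXISTS (SELECT 1
                  FROM table1 t1
                  WHERE t1.id = t2.id AND t1.dob > ?)
"""
# Last column of each EXPLAIN QUERY PLAN row is the readable plan step.
details = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, ("2004-06-15",))]
for step in details:
    print(step)  # e.g. SEARCH t1 USING COVERING INDEX idx_table1_id_dob (id=? AND dob>?)
```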

You can also change the year comparison to:

t1.DOB > curdate() - interval 20 year

That is presumably the logic you want (strictly under 20). Because t1.DOB is no longer wrapped in a function, the index could take advantage of it.
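The cutoff form relies on "age under N" being exactly the same test as "DOB after the date N years ago". A minimal Python sketch checks this day by day, including leap-day edges. `subtract_years` mimics how MySQL adjusts a 29 Feb that lands in a non-leap year (it becomes 28 Feb). The other names are hypothetical. The check with N = 21 shows that the original `TIMESTAMPDIFF(...) <= 20` condition corresponds to a 21-year cutoff, not a 20-year one.

```python
from datetime import date, timedelta


def subtract_years(d: date, years: int) -> date:
    """Same calendar day `years` earlier; 29 Feb becomes 28 Feb in a non-leap year."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:
        return d.replace(year=d.year - years, day=28)


def age_in_years(dob: date, today: date) -> int:
    """Complete years elapsed: one less if this year's birthday has not been reached yet."""
    return today.year - dob.year - ((today.month, today.day) < (dob.month, dob.day))


def check(today: date, limit: int) -> bool:
    """True if `age < limit` and `dob > cutoff` agree for every dob near the cutoff."""
    cutoff = subtract_years(today, limit)
    dob = cutoff - timedelta(days=800)
    while dob <= cutoff + timedelta(days=800):
        if (age_in_years(dob, today) < limit) != (dob > cutoff):
            return False
        dob += timedelta(days=1)
    return True


if __name__ == "__main__":
    for today in (date(2024, 2, 29), date(2024, 3, 1), date(2023, 12, 31), date(2100, 2, 28)):
        print(today, check(today, 20), check(today, 21))
```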

I recommend this over a join because there is no risk of duplicate rows in the result set. Your question does not specify that id is unique in table1, so duplicates are a risk. Even if there are no duplicates, under many circumstances EXISTS also performs best.
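A minimal sqlite3 sketch of the duplicate-row risk, with SQLite standing in for MySQL and a hypothetical precomputed cutoff. Here id is deliberately repeated in table1. The JOIN returns one copy of the table2 row per matching table1 row, while EXISTS returns it once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, dob TEXT);      -- id deliberately NOT unique
    CREATE TABLE table2 (id INTEGER, other TEXT);    -- 'other' is hypothetical
    INSERT INTO table1 VALUES (1, '2010-03-01'), (1, '2011-07-09'), (2, '1980-01-01');
    INSERT INTO table2 VALUES (1, 'young'), (2, 'old');
""")
cutoff = "2004-06-15"  # hypothetical precomputed cutoff

join_rows = conn.execute(
    "SELECT t2.* FROM table1 t1 JOIN table2 t2 ON t1.id = t2.id "
    "WHERE t1.dob > ? ORDER BY t2.id", (cutoff,)).fetchall()

exists_rows = conn.execute(
    "SELECT t2.* FROM table2 t2 WHERE EXISTS ("
    "SELECT 1 FROM table1 t1 WHERE t1.id = t2.id AND t1.dob > ?) ORDER BY t2.id",
    (cutoff,)).fetchall()

print(join_rows)    # [(1, 'young'), (1, 'young')]: one copy per matching table1 row
print(exists_rows)  # [(1, 'young')]
```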
