
How to get all rows from a table for only the latest year, when many rows may share that year

Here is the simplified table:

id - company_id - report_year - code

1  - 123456     - 2013        - ASD  
2  - 123456     - 2013        - SDF  
3  - 123456     - 2012        - ASD  
4  - 123456     - 2012        - SDF 

I would like to get all codes for the highest report_year available for the specified company_id.

So I should get:

1 - 123456 - 2013 - ASD  
2 - 123456 - 2013 - SDF

But I cannot hard-code WHERE report_year = 2013, because for some companies the latest report year may be 2012 or 2009, for example. So I need to get the data based on the latest year available.

So far I have query like this:

SELECT id, company_id, report_year, code
FROM `my_table`
WHERE company_id = 123456

I have tried some combinations of GROUP BY and MAX(), but I couldn't get what I need. This is the first time I have faced this kind of request, and it's confusing.

Any ideas? I am using MySQL.

Use a correlated subquery to find the latest year for the company:

SELECT id, company_id, report_year, code
FROM `my_table` t1
WHERE company_id = 123456
  AND report_year = (select max(report_year)
                     from `my_table` t2
                     where t1.company_id = t2.company_id)
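To check the correlated subquery against the sample data from the question, here is a minimal sketch. It assumes Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for MySQL; the query is plain SQL that both engines accept. Two small changes are assumptions for the demo only: the company id is passed as a `?` placeholder instead of a literal, and ORDER BY id is added so the output order is deterministic.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL for this demo.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table ("
    " id INTEGER PRIMARY KEY, company_id INTEGER,"
    " report_year INTEGER, code TEXT)"
)
# Sample rows copied from the question.
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?)",
    [
        (1, 123456, 2013, "ASD"),
        (2, 123456, 2013, "SDF"),
        (3, 123456, 2012, "ASD"),
        (4, 123456, 2012, "SDF"),
    ],
)

# Correlated subquery: for each candidate row, compute the max year of the
# same company and keep the row only if its year equals that max.
LATEST_SQL = """
SELECT id, company_id, report_year, code
FROM my_table t1
WHERE company_id = ?
  AND report_year = (SELECT MAX(report_year)
                     FROM my_table t2
                     WHERE t1.company_id = t2.company_id)
ORDER BY id
"""

rows = conn.execute(LATEST_SQL, (123456,)).fetchall()
for row in rows:
    print(row)
```

Both 2013 rows come back, because every row whose year equals the maximum passes the comparison; ties are kept, not collapsed.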

You could also do this by joining the table to a derived table that returns the max year per company, like so:

select my_table.id, my_table.company_id, my_table.report_year, my_table.code
from my_table
inner join (
    select max(report_year) as maxYear, company_id
    from my_table
    group by company_id
) maxYear ON my_table.report_year = maxYear.maxYear
    and my_table.company_id = maxYear.company_id
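The point of the per-company derived table is that each company gets its own latest year, which is exactly the case the question worries about. The sketch below runs the join unchanged (apart from an added ORDER BY) on SQLite via Python's sqlite3 as a stand-in for MySQL. Company 654321 and its rows are hypothetical, added only to show a company whose latest year is 2012.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL for this demo.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table ("
    " id INTEGER PRIMARY KEY, company_id INTEGER,"
    " report_year INTEGER, code TEXT)"
)
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?)",
    [
        (1, 123456, 2013, "ASD"),
        (2, 123456, 2013, "SDF"),
        (3, 123456, 2012, "ASD"),
        (4, 123456, 2012, "SDF"),
        # Hypothetical second company whose latest year is 2012.
        (5, 654321, 2012, "QWE"),
        (6, 654321, 2009, "QWE"),
    ],
)

# The derived table "maxYear" holds one (maxYear, company_id) pair per
# company; joining on both columns keeps only each company's latest rows.
ALL_LATEST_SQL = """
SELECT my_table.id, my_table.company_id, my_table.report_year, my_table.code
FROM my_table
INNER JOIN (
    SELECT MAX(report_year) AS maxYear, company_id
    FROM my_table
    GROUP BY company_id
) maxYear ON my_table.report_year = maxYear.maxYear
    AND my_table.company_id = maxYear.company_id
ORDER BY my_table.id
"""

rows = conn.execute(ALL_LATEST_SQL).fetchall()
for row in rows:
    print(row)
```

Company 123456 contributes its two 2013 rows and the hypothetical company contributes only its 2012 row, with no year hard-coded anywhere.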

To limit this to a specific company, just add your WHERE clause back inside the derived table:

select my_table.id, my_table.company_id, my_table.report_year, my_table.code
from my_table
inner join (
    select max(report_year) as maxYear, company_id
    from my_table
    where my_table.company_id = 123456
    group by company_id
) maxYear ON my_table.report_year = maxYear.maxYear
    and my_table.company_id = maxYear.company_id
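Since the question asks for "the specified company_id", it can help to bind the id as a query parameter rather than editing the literal each time. This is a minimal sketch assuming Python's sqlite3 as a stand-in for MySQL (MySQL client libraries use their own placeholder style, often `%s`); the helper `latest_for` and company 654321 are hypothetical names for illustration.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL for this demo.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table ("
    " id INTEGER PRIMARY KEY, company_id INTEGER,"
    " report_year INTEGER, code TEXT)"
)
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?)",
    [
        (1, 123456, 2013, "ASD"),
        (2, 123456, 2013, "SDF"),
        (3, 123456, 2012, "ASD"),
        (4, 123456, 2012, "SDF"),
        (5, 654321, 2012, "QWE"),  # hypothetical company
        (6, 654321, 2009, "QWE"),
    ],
)

# Same query as above; the company filter sits inside the derived table,
# so MAX() is computed only over that company's rows.
ONE_COMPANY_SQL = """
SELECT my_table.id, my_table.company_id, my_table.report_year, my_table.code
FROM my_table
INNER JOIN (
    SELECT MAX(report_year) AS maxYear, company_id
    FROM my_table
    WHERE my_table.company_id = ?
    GROUP BY company_id
) maxYear ON my_table.report_year = maxYear.maxYear
    AND my_table.company_id = maxYear.company_id
ORDER BY my_table.id
"""


def latest_for(company_id):
    """Hypothetical helper: all rows of the company's latest report year."""
    return conn.execute(ONE_COMPANY_SQL, (company_id,)).fetchall()


first = latest_for(123456)
second = latest_for(654321)
print(first)
print(second)
```

Because the outer query only joins to the single (maxYear, company_id) row for the chosen company, no outer WHERE is needed.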

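An anti-join (a LEFT JOIN that keeps only the rows for which no later year exists) often performs better than subqueries, depending on the optimizer and the available indexes: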

SELECT t1.id, t1.company_id, t1.report_year, t1.code
FROM `my_table` t1
LEFT JOIN `my_table` t2
ON t2.company_id = t1.company_id AND t2.report_year > t1.report_year
WHERE t1.company_id = 123456 AND t2.report_year IS NULL
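The anti-join works because a row from the latest year has no partner row with a greater year, so its LEFT JOIN match is all NULLs. This sketch runs the query on SQLite through Python's sqlite3 as a stand-in for MySQL, with the company id bound as a `?` parameter and an ORDER BY added for stable output; company 654321 is hypothetical.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL for this demo.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table ("
    " id INTEGER PRIMARY KEY, company_id INTEGER,"
    " report_year INTEGER, code TEXT)"
)
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?)",
    [
        (1, 123456, 2013, "ASD"),
        (2, 123456, 2013, "SDF"),
        (3, 123456, 2012, "ASD"),
        (4, 123456, 2012, "SDF"),
        (5, 654321, 2012, "QWE"),  # hypothetical company
        (6, 654321, 2009, "QWE"),
    ],
)

# For each t1 row, look for a t2 row of the same company with a LATER year.
# If none exists (t2 columns are NULL), t1 belongs to the latest year.
ANTI_JOIN_SQL = """
SELECT t1.id, t1.company_id, t1.report_year, t1.code
FROM my_table t1
LEFT JOIN my_table t2
  ON t2.company_id = t1.company_id AND t2.report_year > t1.report_year
WHERE t1.company_id = ? AND t2.report_year IS NULL
ORDER BY t1.id
"""

first = conn.execute(ANTI_JOIN_SQL, (123456,)).fetchall()
second = conn.execute(ANTI_JOIN_SQL, (654321,)).fetchall()
print(first)
print(second)
```

Rows 1 and 2 tie on 2013, and since the join condition uses a strict `>`, neither excludes the other, so both are returned.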

For best performance, ensure you have a multi-column index on (company_id, report_year).
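The index itself is one statement (`CREATE INDEX idx_company_year ON my_table (company_id, report_year)` is valid in both MySQL and SQLite; the index name is hypothetical). This sketch creates it in SQLite via Python's sqlite3 and asks the planner how it would run the anti-join. SQLite's EXPLAIN QUERY PLAN plays the role of MySQL's EXPLAIN here, and its output format is different, so treat this only as an illustration that the composite index is picked up.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL for this demo.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table ("
    " id INTEGER PRIMARY KEY, company_id INTEGER,"
    " report_year INTEGER, code TEXT)"
)
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?)",
    [
        (1, 123456, 2013, "ASD"),
        (2, 123456, 2013, "SDF"),
        (3, 123456, 2012, "ASD"),
        (4, 123456, 2012, "SDF"),
    ],
)

# Composite index: company_id first (equality lookups), then report_year
# (the range comparison t2.report_year > t1.report_year).
conn.execute("CREATE INDEX idx_company_year ON my_table (company_id, report_year)")

ANTI_JOIN_SQL = """
SELECT t1.id, t1.company_id, t1.report_year, t1.code
FROM my_table t1
LEFT JOIN my_table t2
  ON t2.company_id = t1.company_id AND t2.report_year > t1.report_year
WHERE t1.company_id = ? AND t2.report_year IS NULL
"""

# Each plan row is (id, parent, notused, detail); only "detail" is printed.
plan = conn.execute("EXPLAIN QUERY PLAN " + ANTI_JOIN_SQL, (123456,)).fetchall()
details = [row[3] for row in plan]
for line in details:
    print(line)
```

Putting company_id first matters: both the WHERE filter and the join use equality on it, and report_year second lets the "later year" lookup become a range scan within one company.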

You can read more about this technique in the book SQL Antipatterns, which is where I learned it.

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please credit the site URL or the original address. For any questions, contact: yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM