
SQL Update with rolling self-join

I would like to add annual growth rates to a table of annual industry sales data created as follows (essential fields only):

CREATE TABLE IF NOT EXISTS MarketSizes (
  marketSizeID INT PRIMARY KEY AUTO_INCREMENT,
  industry INT NOT NULL,
  year INT NOT NULL,
  countryID INT NOT NULL REFERENCES Countries (countryID),
  annualSales DEC(20,2) NULL,
  growthRate DEC(5,2) NULL);

What is the most efficient way to populate/update the growthRate column, given annual data for some 25 years, 100+ countries, and 5000+ industries? Is an index on (industry, year, countryID) the most effective choice? Thanks for your time!

DISCLAIMER: This is untested and came out of curiosity and some playing around. Please judge for yourself whether you want to use it instead of going a "safer" route. Comments are welcome; if anyone wants to play around a bit more, here's a sqlfiddle I used. The rest was written from memory, and it's late at night, so please no downvotes for any mistakes.

Okay, out of curiosity I found a (hacky) way to speed up the update, I think. I haven't tested it apart from this little test:

    create table foo(id int, newid int);
    insert into foo (id) values (1), (2), (3);

    update foo, (select @prev:=0) vars
    set foo.newid = @prev,
    foo.id = if(@prev := id, id, id);

    select * from foo;

    | ID | NEWID |
    --------------
    |  1 |     0 |
    |  2 |     1 |
    |  3 |     2 |

But I've had very good experiences with user variables in SELECT statements where you need information from the previous row: they let you avoid a self-join. Also, MySQL doesn't let you select from the table you're updating in a subquery, so a self-join approach would need a dummy (temporary) table. Those are the reasons I came up with this answer. So here it is:
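Outside of SQL, the idea is a single pass over the rows sorted by country, industry and year. The pass carries the previous row in a few variables and starts over whenever the series changes or a year is missing. Below is a minimal Python model of that pass, for illustration only. The row layout and the name `growth_rates` are hypothetical and are not part of the MySQL answer.

```python
from typing import List, Optional, Tuple

# Hypothetical row layout for illustration: (countryID, industry, year, annualSales)
Row = Tuple[int, int, int, float]

def growth_rates(rows: List[Row]) -> List[Optional[float]]:
    """Single ordered pass mimicking the user variables of the UPDATE:
    prev_* play the role of @prevCountry/@prevIndustry/@prevYear/@prev."""
    prev_country = prev_industry = prev_year = None
    prev_sales: Optional[float] = None
    rates: List[Optional[float]] = []
    # Same order the UPDATE needs: series by series, year by year.
    for country, industry, year, sales in sorted(rows):
        same_series = (country == prev_country and industry == prev_industry
                       and prev_year is not None and year == prev_year + 1)
        if same_series and prev_sales:   # a previous value of 0 or None gives no rate
            rates.append((sales - prev_sales) / prev_sales)
        else:
            rates.append(None)           # first year of a series, or a missing year
        # Remember the current row for the next iteration.
        prev_country, prev_industry, prev_year, prev_sales = country, industry, year, sales
    return rates
```

For example, `growth_rates([(1, 10, 2000, 100.0), (1, 10, 2001, 110.0)])` returns `[None, 0.1]`: the first year of a series has no rate.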

Your update statement would be

SET @prev = NULL;         /*annualSales of the previous row*/
SET @prevCountry = NULL;  /*countryID of the previous row*/
SET @prevIndustry = NULL; /*industry of the previous row*/
SET @prevYear = NULL;     /*year of the previous row; NULL means "no previous row yet"*/

/*Also, it's important to initialize the variables beforehand, not on the fly as in the example above. That example is a multi-table UPDATE, and MySQL rejects ORDER BY in a multi-table UPDATE with a syntax error. The ORDER BY is essential in this statement!*/

UPDATE MarketSizes

/*Single-table UPDATE assignments are generally evaluated left to right, so at this point the variables still describe the previous row. Only compute a rate if that row is the previous year of the same country and industry. Otherwise (first year of a series, or a missing year) store NULL. NULLIF() avoids a division by zero.*/
SET growthRate = IF(countryID = @prevCountry AND industry = @prevIndustry AND `year` = @prevYear + 1,
                    (annualSales - @prev) / NULLIF(@prev, 0),
                    NULL),

/*Now remember the current row for the next one.*/
marketSizeID = IF(@prev := annualSales, marketSizeID, marketSizeID),

/*Why the update on marketSizeID? And the IF(this, then, else)? That's the trick. Every other way I tried to assign a new value to a variable inside the UPDATE resulted in a syntax error. Both branches return marketSizeID, so the column keeps its value and only the assignment in the condition matters. I just chose the primary key because it's there. Actually it doesn't matter which column is used here. Choosing a column with no index on it (the primary key, of course, has one) might give another performance boost.*/

marketSizeID = IF(@prevCountry := countryID, marketSizeID, marketSizeID),
marketSizeID = IF(@prevIndustry := industry, marketSizeID, marketSizeID),
marketSizeID = IF(@prevYear := `year`, marketSizeID, marketSizeID)

/*Process the rows series by series, year by year.*/
ORDER BY countryID, industry, `year`;
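On MySQL 8.0 or later, the LAG() window function gives each row the previous row's values within a series directly. That removes the need for the user-variable trick, which relies on evaluation order, and MySQL 8.0 also deprecates setting user variables inside expressions. The sketch below is a swapped-in alternative, not the answer's method. It uses Python's built-in `sqlite3` module as a stand-in for MySQL so it can run anywhere, and it assumes SQLite 3.25 or later for window functions. `SAMPLE_ROWS`, `make_demo_db` and `update_growth_rates` are hypothetical names, and REAL replaces DEC. SQLite has no UPDATE ... JOIN, so the rate is written back through a correlated subquery on a CTE. In MySQL 8.0 you would more likely join a derived table holding the LAG() columns in a multi-table UPDATE (untested here).

```python
import sqlite3

# Hypothetical sample rows: (marketSizeID, industry, year, countryID, annualSales)
SAMPLE_ROWS = [
    (1, 10, 2000, 1, 100.0),
    (2, 10, 2001, 1, 110.0),
    (3, 20, 2000, 1, 50.0),
    (4, 20, 2002, 1, 60.0),   # 2001 is missing for this series
    (5, 20, 2003, 1, 90.0),
]

def make_demo_db() -> sqlite3.Connection:
    """In-memory stand-in for the MarketSizes table (REAL instead of DEC)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE MarketSizes (
            marketSizeID INTEGER PRIMARY KEY,
            industry     INTEGER NOT NULL,
            year         INTEGER NOT NULL,
            countryID    INTEGER NOT NULL,
            annualSales  REAL,
            growthRate   REAL
        )""")
    conn.executemany(
        "INSERT INTO MarketSizes (marketSizeID, industry, year, countryID, annualSales) "
        "VALUES (?, ?, ?, ?, ?)", SAMPLE_ROWS)
    return conn

def update_growth_rates(conn: sqlite3.Connection) -> None:
    """Fill growthRate from the previous year of the same (countryID, industry)."""
    conn.execute("""
        WITH ordered AS (
            SELECT marketSizeID, year, annualSales,
                   LAG(year)        OVER (PARTITION BY countryID, industry ORDER BY year) AS prevYear,
                   LAG(annualSales) OVER (PARTITION BY countryID, industry ORDER BY year) AS prevSales
            FROM MarketSizes
        )
        UPDATE MarketSizes
        SET growthRate = (
            -- NULL for the first year of a series or after a missing year
            SELECT CASE WHEN o.prevYear = o.year - 1
                        THEN (o.annualSales - o.prevSales) / NULLIF(o.prevSales, 0)
                   END
            FROM ordered AS o
            WHERE o.marketSizeID = MarketSizes.marketSizeID
        )""")
    conn.commit()

if __name__ == "__main__":
    db = make_demo_db()
    update_growth_rates(db)
    for row in db.execute("SELECT marketSizeID, year, growthRate FROM MarketSizes"):
        print(row)
```

This shows the logic, not a tuned plan. For roughly 25 × 100 × 5000 rows, check the query plan on your own data before relying on it.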

Consider computing growthRate only in a VIEW instead of storing it:

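CREATE VIEW growthRate AS
SELECT
/*columns listed explicitly: m1.* would clash with a stored growthRate column*/
m1.marketSizeID, m1.industry, m1.year, m1.countryID, m1.annualSales,
(m1.annualSales - m2.annualSales) / NULLIF(m2.annualSales, 0) AS growthRate
FROM
MarketSizes m1
LEFT JOIN MarketSizes m2 ON m1.industry = m2.industry
                         AND m1.countryID = m2.countryID
                         AND m2.year = m1.year - 1;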

Create a composite index on (industry, countryID, year) and it should be performant enough.
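To check that the view handles both the first year of a series and missing years (the LEFT JOIN simply finds no row for year - 1), here is a small runnable sketch. It uses Python's built-in `sqlite3` module as a stand-in for MySQL. `SAMPLE_ROWS`, `make_view_db` and the index name `idx_marketsizes_series` are hypothetical, and REAL replaces DEC. The columns are listed explicitly rather than using m1.*, because a view cannot contain two columns named growthRate if MarketSizes still has that column.

```python
import sqlite3

# Hypothetical sample rows: (marketSizeID, industry, year, countryID, annualSales)
SAMPLE_ROWS = [
    (1, 10, 2000, 1, 100.0),
    (2, 10, 2001, 1, 110.0),
    (3, 20, 2000, 1, 50.0),
    (4, 20, 2002, 1, 60.0),   # 2001 is missing for this series
    (5, 20, 2003, 1, 90.0),
]

def make_view_db() -> sqlite3.Connection:
    """Table without a stored growthRate, one composite index, and the view."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE MarketSizes (
            marketSizeID INTEGER PRIMARY KEY,
            industry     INTEGER NOT NULL,
            year         INTEGER NOT NULL,
            countryID    INTEGER NOT NULL,
            annualSales  REAL
        );
        -- one composite index covering all three join columns
        CREATE INDEX idx_marketsizes_series ON MarketSizes (industry, countryID, year);
        CREATE VIEW growthRate AS
        SELECT m1.marketSizeID, m1.industry, m1.year, m1.countryID, m1.annualSales,
               (m1.annualSales - m2.annualSales) / NULLIF(m2.annualSales, 0) AS growthRate
        FROM MarketSizes m1
        LEFT JOIN MarketSizes m2 ON m2.industry  = m1.industry
                                AND m2.countryID = m1.countryID
                                AND m2.year      = m1.year - 1;
    """)
    conn.executemany("INSERT INTO MarketSizes VALUES (?, ?, ?, ?, ?)", SAMPLE_ROWS)
    conn.commit()
    return conn

if __name__ == "__main__":
    db = make_view_db()
    for row in db.execute("SELECT marketSizeID, year, growthRate FROM growthRate"):
        print(row)
    # Shows whether the self-join actually uses the composite index.
    for row in db.execute("EXPLAIN QUERY PLAN SELECT * FROM growthRate"):
        print(row)
```

`EXPLAIN QUERY PLAN` (SQLite) or `EXPLAIN` (MySQL) shows whether the join uses the index. The LEFT JOIN implicitly assumes at most one row per industry, country and year. Declaring the index UNIQUE would enforce that.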
