SQL Update with rolling self-join
I want to add year-over-year growth rates to a table of annual industry sales data, created as follows (essential fields only):
CREATE TABLE IF NOT EXISTS MarketSizes (
marketSizeID INT PRIMARY KEY AUTO_INCREMENT ,
industry INT NOT NULL,
year INT NOT NULL,
countryID INT NOT NULL REFERENCES Countries (countryID),
annualSales DEC(20,2) NULL,
growthRate DEC(5,2) NULL)
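Given 25 years of annual data for 100+ countries and 5000+ industries, what is the most efficient way to populate/update the growthRate column? And is an index on (industry, year, countryID) the most efficient way to index the table? Thanks for your time!
Disclaimer: this is untested and was written out of curiosity and for a bit of playing around. Use your own judgment if you want to use it rather than going the "safer" route. Comments are welcome, and if anyone wants to play with it more, here is the sqlfiddle I used. The rest is off the top of my head, and it is already late, so please don't downvote for any mistakes.
Okay, out of curiosity I found a (hacky) way to speed up the update. I haven't tested it beyond this small test: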
create table foo(id int, newid int);
insert into foo (id) values (1), (2), (3);
update foo, (select @prev:=0) vars
set foo.newid = @prev,
foo.id = if(@prev := id, id, id);
select * from foo
| ID | NEWID |
--------------
| 1 | 0 |
| 2 | 1 |
| 3 | 2 |
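I have, however, had good experiences with this technique in SELECT statements where you need information from the previous row. By using user variables you don't have to self-join the table (in a SELECT). Since you can't update a table while reading from it at the same time, you would otherwise need a temporary table. That is just to explain why I'm proposing this answer. So here it is:
Your update statement would be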
SET @prev = 1; /* sentinel used for a row that has no previous year (also applied when countryID or industry changes) */
SET @prevCountry = (SELECT countryID FROM MarketSizes ORDER BY countryID, industry, `year`, marketSizeID LIMIT 1);
SET @prevIndustry = (SELECT industry FROM MarketSizes ORDER BY countryID, industry, `year`, marketSizeID LIMIT 1);
/* It's important to initialize the variables beforehand, not on the fly as in the example above. Initializing them inline requires a multi-table UPDATE, and MySQL then complains about a syntax error because it doesn't support an ORDER BY clause in a multi-table UPDATE. ORDER BY is essential in this statement! */
UPDATE MarketSizes
SET
/* This is your "where" clause, and it has to run before growthRate is computed: if country or industry changed, reset @prev to 1. The inner IF() only carries the assignment; both of its branches return marketSizeID. */
marketSizeID = IF(@prevCountry != countryID OR @prevIndustry != industry, IF(@prev := 1, marketSizeID, marketSizeID), marketSizeID),
growthRate = (annualSales - @prev) / @prev, /* here @prev holds annualSales of the previous row, or 1 right after a reset */
marketSizeID = IF(@prev := annualSales, marketSizeID, marketSizeID), /* here the value of the current row gets assigned to @prev */
/* Why the update on marketSizeID? And the IF(this, then, else)? That's the trick. Every other way to assign a new value to the variable @prev results in a syntax error. I just chose the primary key because it's there. It doesn't actually matter which column is used here, and it might be another performance boost to choose a column that has no index on it (the primary key of course has one). */
marketSizeID = IF(@prevCountry := countryID, marketSizeID, marketSizeID),
marketSizeID = IF(@prevIndustry := industry, marketSizeID, marketSizeID)
ORDER BY countryID, industry, `year`, marketSizeID; /* rows of one country/industry series must be adjacent and in year order, so `year` sorts after the grouping columns */
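For comparison, the "safer" route mentioned in the disclaimer is presumably a plain self-join update, which is what the question title describes. The following is only a minimal sketch and has not been benchmarked against the variable trick; the aliases cur and prev are illustrative names, and the NULLIF guard is an added assumption about how zero sales should be treated. MySQL accepts the target table a second time in a multi-table UPDATE join; the "can't specify target table" restriction applies to subqueries on the table being updated.

```sql
-- Join every row to the same industry/country row of the previous year.
-- LEFT JOIN keeps rows without a previous year; their growthRate becomes NULL.
UPDATE MarketSizes AS cur
LEFT JOIN MarketSizes AS prev
       ON prev.industry  = cur.industry
      AND prev.countryID = cur.countryID
      AND prev.`year`    = cur.`year` - 1
SET cur.growthRate =
    -- NULLIF turns a previous value of 0 into NULL so the division
    -- yields NULL instead of a division-by-zero warning or error.
    (cur.annualSales - prev.annualSales) / NULLIF(prev.annualSales, 0);
```

Unlike the variable-based statement, this version compares against year - 1 explicitly, so a gap in the years produces NULL rather than a rate computed across the gap. Note also that growthRate is DEC(5,2), so very large ratios may not fit the column.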
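Consider computing growthRate in a VIEW instead of storing it. Note that this assumes the MarketSizes table itself has no growthRate column; otherwise m1.* and the computed alias would produce a duplicate column name in the view.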
CREATE VIEW growthRate AS
SELECT
m1.*,
(m1.annualSales - m2.annualSales) / m2.annualSales AS growthRate
FROM
MarketSizes m1
LEFT JOIN MarketSizes m2 ON m1.industry = m2.industry
AND m1.countryID = m2.countryID
AND m2.year = m1.year - 1
Create indexes on (industry, countryID) and on year, and it should perform well enough.
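The indexing advice could look roughly like the following. This is a sketch, not a measured recommendation: the index name idx_marketsizes_series is made up, a single composite index over all three join columns is just one reading of the advice, and the literal values in the sample query are placeholders.

```sql
-- Composite index over the columns used in the view's join condition.
-- The name idx_marketsizes_series is hypothetical.
CREATE INDEX idx_marketsizes_series
    ON MarketSizes (industry, countryID, `year`);

-- Example read through the view for one industry/country series.
-- The values 1 and 1 are placeholders, not real IDs.
SELECT `year`, annualSales, growthRate
FROM growthRate
WHERE industry = 1
  AND countryID = 1
ORDER BY `year`;
```

With equality conditions on industry and countryID followed by year, the join can in principle look up the previous year's row directly from the index. Running EXPLAIN on the query shows whether the optimizer actually uses it.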