
MySQL: how to speed up a working query with window functions

EDIT: Actually, the main problem is the last update, which sets A.last_52w_high_age_brutto_days (see below). Is it possible to optimize this update, or to integrate it somehow into the CURSOR update? Thank you.


I have a table with daily stock market prices.

Table structure:

CREATE TABLE `cind_stocks_daily_rates` (
  `symbol` varchar(12) NOT NULL,
  `p_date` int unsigned NOT NULL,
  `open_price` decimal(9,4) NOT NULL,
  `high_price` decimal(9,4) NOT NULL,
  `low_price` decimal(9,4) NOT NULL,
  `close_price` decimal(9,4) NOT NULL,
  `price_52w_low` decimal(9,4) DEFAULT NULL,
  `price_52w_high` decimal(9,4) DEFAULT NULL,
  `last_52w_high_age_brutto_days` int unsigned DEFAULT NULL,
  PRIMARY KEY (`symbol`,`p_date`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

INSERT statement:

INSERT INTO cind_stocks_daily_rates (symbol, p_date, open_price, high_price, low_price, close_price, price_52w_low, price_52w_high, last_52w_high_age_brutto_days)
VALUES
('DELL', 20210616, 150, 152, 149, 151, 149, 152, 0),
('INTZ', 20210616, 250, 252, 249, 251, 249, 252, 0),
('MSFT', 20210616, 350, 352, 349, 351, 349, 352, 0),
('NTNX', 20210616, 452, 452, 449, 451, 449, 452, 0),
('DELL', 20210617, 148, 151, 147, 150, 147, 152, 1),
('INTZ', 20210617, 251, 254, 250, 252, 249, 254, 0),
('MSFT', 20210617, 346, 349, 345, 347, 345, 352, 1),
('NTNX', 20210617, 450, 454, 450, 453, 449, 454, 0),
('DELL', 20210618, 146, 147, 144, 145, NULL, NULL, NULL),
('INTZ', 20210618, 254, 256, 253, 255, NULL, NULL, NULL),
('MSFT', 20210618, 349, 351, 349, 350, NULL, NULL, NULL),
('NTNX', 20210618, 453, 456, 452, 454, NULL, NULL, NULL);

Desired result:

symbol  p_date  open_price  high_price  low_price   close_price price_52w_low   price_52w_high  last_52w_high_age_brutto_days
DELL    20210616    150 152 149 151 149 152 0
INTZ    20210616    250 252 249 251 249 252 0
MSFT    20210616    350 352 349 351 349 352 0
NTNX    20210616    452 452 449 451 449 452 0
DELL    20210617    148 151 147 150 147 152 1
INTZ    20210617    251 254 250 252 249 254 0
MSFT    20210617    346 349 345 347 345 352 1
NTNX    20210617    450 454 450 453 449 454 0
DELL    20210618    146 147 144 145 144 152 2
INTZ    20210618    254 256 253 255 249 256 0
MSFT    20210618    349 351 349 350 345 352 2
NTNX    20210618    453 456 452 454 449 456 0

After I have filled the table with prices for a given day, I want to calculate the 52-week high and low prices, as well as the number of days between the date of the 52-week high and the value of the date column.

I create a cursor:

SELECT distinct symbol FROM cind_stocks_daily_rates
WHERE  price_52w_low IS NULL OR price_52w_high IS NULL;

And loop over the symbols with the cursor (curr_symbol):

UPDATE cind_stocks_daily_rates AS A
CROSS JOIN
(SELECT AA.p_date, AA.symbol,
Min(AA.low_price) OVER (ORDER BY AA.p_date ROWS BETWEEN 260 PRECEDING AND CURRENT ROW) AS low_price_52w,
Max(AA.high_price) OVER (ORDER BY AA.p_date ROWS BETWEEN 260 PRECEDING AND 
CURRENT ROW) AS high_price_52w
FROM cind_stocks_daily_rates AA
WHERE AA.symbol = curr_symbol
ORDER BY AA.p_date) as B ON B.symbol = A.symbol AND B.p_date = A.p_date
SET A.price_52w_low = B.low_price_52w,
A.price_52w_high = B.high_price_52w
WHERE  A.price_52w_low IS NULL OR A.price_52w_high IS NULL;

After the cursor loop I run another update to determine how many days old the current 52-week high is:

EDIT: Actually, the following update is my main (and only) problem. Duration: approx. 10 minutes. END_OF_EDIT

UPDATE cind_stocks_daily_rates AS A
CROSS JOIN
 (SELECT AA.p_date, AA.symbol,
DATEDIFF(STR_TO_DATE(AA.p_date, "%Y %m %d"),STR_TO_DATE(
    (SELECT CCC.p_date
    FROM cind_stocks_daily_rates CCC
    WHERE CCC.symbol = AA.symbol AND CCC.p_date <= AA.p_date AND CCC.high_price = AA.price_52w_high ORDER BY CCC.p_date DESC LIMIT 1)
    , "%Y %m %d")) AS last_high_Date
    FROM cind_stocks_daily_rates AA) as B ON B.symbol = A.symbol AND B.p_date = A.p_date
        SET A.last_52w_high_age_brutto_days = B.last_high_Date
WHERE A.last_52w_high_age_brutto_days IS NULL;

Everything works as intended without errors; it just takes too much time. Is it possible to speed it up? Is it possible to set the field "last_52w_high_age_brutto_days" within the cursor loop (without a second update after the loop)? Any ideas to make the queries faster?

Let's try to avoid doing everything every day.

A big task is finding the 52-week highest price. Let's see if there is a shortcut.

Maintain a separate table, one row per ticker, holding the 52-week high and the date on which it occurred.
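A minimal sketch of such a summary table and its one-time fill, written for MySQL 8. The table name cind_stocks_52w_high, its column names, and the @today / @cutoff variables are assumptions made for illustration; they do not appear in the question. The sketch also uses a calendar window of 52 weeks, whereas the question's cursor update uses a window of 261 rows, so the two definitions may differ slightly.

```sql
-- Hypothetical summary table: one row per ticker (names are assumptions).
CREATE TABLE cind_stocks_52w_high (
  symbol     varchar(12)  NOT NULL,
  high_price decimal(9,4) NOT NULL,
  high_date  int unsigned NOT NULL,  -- same YYYYMMDD encoding as p_date
  PRIMARY KEY (symbol)
) ENGINE=InnoDB;

-- Latest loaded trading day and the date 52 calendar weeks before it.
SET @today  := 20210618;
SET @cutoff := CAST(DATE_FORMAT(STR_TO_DATE(@today, '%Y%m%d') - INTERVAL 52 WEEK,
                                '%Y%m%d') AS UNSIGNED);

-- One-time fill: per ticker, the highest high_price inside the window
-- and the most recent date on which that price occurred.
INSERT INTO cind_stocks_52w_high (symbol, high_price, high_date)
SELECT symbol, high_price, p_date
FROM (
  SELECT symbol, high_price, p_date,
         ROW_NUMBER() OVER (PARTITION BY symbol
                            ORDER BY high_price DESC, p_date DESC) AS rn
  FROM cind_stocks_daily_rates
  WHERE p_date BETWEEN @cutoff AND @today
) AS ranked
WHERE rn = 1;
```

The tie-break on p_date DESC mirrors the question's own subquery, which picks the most recent date on which the high was reached.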

If the date (for a given ticker) of the 52-week high is less than 52 weeks ago, then the price and date do not need to be recomputed.
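The daily maintenance could look roughly like the following sketch. It assumes a hypothetical summary table cind_stocks_52w_high(symbol, high_price, high_date) with high_date stored as YYYYMMDD like p_date; the table, the stale_symbols temporary table, and the variables are illustrative names. The first statement (today's high matching or beating the stored high) is not spelled out in the answer and is added here as an assumption about how the table would be kept current.

```sql
SET @today  := 20210618;
SET @cutoff := CAST(DATE_FORMAT(STR_TO_DATE(@today, '%Y%m%d') - INTERVAL 52 WEEK,
                                '%Y%m%d') AS UNSIGNED);

-- Step 1 (assumption): today's high matches or beats the stored high,
-- so the stored row is simply overwritten. No history scan is needed.
UPDATE cind_stocks_52w_high AS H
JOIN cind_stocks_daily_rates AS D
  ON D.symbol = H.symbol AND D.p_date = @today
SET H.high_price = D.high_price,
    H.high_date  = D.p_date
WHERE D.high_price >= H.high_price;

-- Step 2: tickers whose stored high has dropped out of the 52-week window.
-- Only these need their high recomputed from the daily rows.
CREATE TEMPORARY TABLE stale_symbols (symbol varchar(12) PRIMARY KEY)
SELECT symbol FROM cind_stocks_52w_high WHERE high_date < @cutoff;

UPDATE cind_stocks_52w_high AS H
JOIN (
  SELECT D.symbol, D.high_price, D.p_date,
         ROW_NUMBER() OVER (PARTITION BY D.symbol
                            ORDER BY D.high_price DESC, D.p_date DESC) AS rn
  FROM cind_stocks_daily_rates AS D
  JOIN stale_symbols AS S ON S.symbol = D.symbol
  WHERE D.p_date BETWEEN @cutoff AND @today
) AS R ON R.symbol = H.symbol AND R.rn = 1
SET H.high_price = R.high_price,
    H.high_date  = R.p_date;

DROP TEMPORARY TABLE stale_symbols;
```

On most days the stale list should be short or empty, so the expensive window scan touches only a few tickers.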

Then it is easy and fast to discover which tickers, if any, need their highs recomputed. Update this table. Then use this table to populate the table you are asking about; this UPDATE...JOIN... will be efficient.
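As a sketch of that final UPDATE...JOIN, assuming a hypothetical summary table cind_stocks_52w_high(symbol, high_price, high_date) that is already up to date for the day being loaded, with high_date stored as YYYYMMDD like p_date. The table name and the @today variable are assumptions, and the statement only fills rows for that one day, because the summary table holds the high as of the latest date and not for earlier dates.

```sql
SET @today := 20210618;

-- One primary-key lookup per ticker replaces the correlated subquery
-- that searched the daily rows for the date of the high.
UPDATE cind_stocks_daily_rates AS A
JOIN cind_stocks_52w_high AS H
  ON H.symbol = A.symbol
SET A.last_52w_high_age_brutto_days =
      DATEDIFF(STR_TO_DATE(A.p_date,    '%Y%m%d'),
               STR_TO_DATE(H.high_date, '%Y%m%d'))
WHERE A.p_date = @today
  AND A.last_52w_high_age_brutto_days IS NULL;
```

With the sample rows, if the summary table holds DELL's high of 152 on 20210616, the DELL row for 20210618 would get an age of 2 days, in line with the desired result.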

Some potential problems:

  • Non-trading days: I don't think this messes up the queries at all; ignore it.
  • Occasionally, most of the tickers will have their high exactly 52 weeks ago: this will be rare and, alas, slow.
