Good afternoon, all. I am coming to you in the hope that you can provide some direction with a MySQL optimization problem I am having. First, some background.
I have two tables:
“Delta_Shares” contains stock trade data, with two columns of note: “Ticker” is Varchar(45) and “Date_Filed” is Date. This table has about 3 million rows (all unique). It has an index “DeltaSharesTickerDateFiled” on (Ticker, Date_Filed).
“Stock_Data” also has two columns of note: “Ticker” is Varchar(45) and “Value_Date” is Date. This table has about 19 million rows (all unique). It has an index “StockDataIndex” on (Ticker, Value_Date).
I am attempting to update the “Delta_Shares” table by looking up information from the Stock_Data table. The following query takes more than 4 hours to run.
UPDATE delta_shares A, stock_data B
SET A.price_at_file = B.stock_close
WHERE A.ticker = B.ticker
  AND A.date_filed = B.value_date;
Is the excessive runtime the natural result of the large number of rows, poor indexing, an underpowered machine, badly written SQL, or all of the above? Please let me know if any additional information would be useful (I am not overly familiar with MySQL, though this issue has moved me significantly down the path of optimization). I greatly appreciate any thoughts or suggestions.
UPDATED with "EXPLAIN SELECT":
id: 1 | select_type: SIMPLE | table: A | type: ALL | possible_keys: DeltaSharesTickerDateFiled | ... | rows: 3038011
id: 1 | select_type: SIMPLE | table: B | type: ref | possible_keys: StockDataIndex | key: StockDataIndex | key_len: 52 | ref: 13ffeb2013.A.ticker, 13ffeb2013.A.date_filed | rows: 1 | Extra: Using where
UPDATED with table describes.

Stock_Data table:
Field         Type           Null  Key  Extra
idstock_data  int(11)        NO    PRI  auto_increment
ticker        varchar(45)    YES   MUL
value_date    date           YES
stock_close   decimal(10,2)  YES

Delta_Shares table:
Field                      Type           Null  Key  Extra
iddelta_shares             int(11)        NO    PRI  auto_increment
cik                        int(11)        YES   MUL
ticker                     varchar(45)    YES   MUL
date_filed_identify        int(11)        YES
Price_At_File              decimal(10,2)  YES
delta_shares               int(11)        YES
date_filed                 date           YES
marketcomparable           varchar(45)    YES
market_comparable_price    decimal(10,2)  YES
industrycomparable         varchar(45)    YES
industry_comparable_price  decimal(10,2)  YES
Index from Delta_Shares:
Non_unique  Key_name                    Seq  Column_name          Collation  Cardinality  Null  Index_type
0           PRIMARY                     1    iddelta_shares       A          3095057            BTREE
1           DeltaIndex                  1    cik                  A          18           YES   BTREE
1           DeltaIndex                  2    date_filed_identify  A          20633        YES   BTREE
1           DeltaSharesAllIndex         1    cik                  A          18           YES   BTREE
1           DeltaSharesAllIndex         2    ticker               A          619011       YES   BTREE
1           DeltaSharesAllIndex         3    date_filed_identify  A          3095057      YES   BTREE
1           DeltaSharesTickerDateFiled  1    ticker               A          11813        YES   BTREE
1           DeltaSharesTickerDateFiled  2    date_filed           A          3095057      YES   BTREE

Index from Stock_Data:
Non_unique  Key_name        Seq  Column_name   Collation  Cardinality  Null  Index_type
0           PRIMARY         1    idstock_data  A          18683114           BTREE
1           StockDataIndex  1    ticker        A          14676        YES   BTREE
1           StockDataIndex  2    value_date    A          18683114     YES   BTREE
There are a few benchmarks you could run to see where the bottleneck is. For example, try updating the field to a constant value and see how long that takes (obviously, you'll want to run this against a copy of the database). Then try a query that doesn't update anything, but just selects the values to be updated and the values they will be updated to.
Benchmarks like these will usually tell you whether you're wasting your time trying to optimize or whether there is much room for improvement.
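A sketch of those two benchmarks, using the table and column names from the question (run both against a throwaway copy of the database):

```sql
-- 1) Isolate the write path: update every row to a constant.
--    If this alone takes hours, the cost is in writing 3M rows,
--    not in the join lookup.
UPDATE delta_shares
SET price_at_file = 1.00;

-- 2) Isolate the read path: the same join, but read-only.
SELECT A.iddelta_shares, B.stock_close
FROM delta_shares A
JOIN stock_data B
  ON A.ticker = B.ticker
 AND A.date_filed = B.value_date;
```

Comparing the two timings tells you whether to focus on the join plan (indexes, optimizer) or on write throughput (transaction size, buffer pool, disk).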
As for the memory, here's a rough idea of what you're looking at:
A varchar field costs a short length prefix (1 byte for varchar(45)) plus the actual data, and a DATE field is 3 bytes. So let's make an extremely liberal guess that the ticker values in the Stock_Data table average around 47 bytes; with the date field that adds up to 50 bytes per row.
50 bytes × 20 million rows = 1 GB (about 0.93 GiB)
So if this process is the only thing going on on your machine, I don't see memory as an issue, since all the data from both tables that the query touches can easily fit in memory at one time. But if other things are running, it might be a factor.
Try ANALYZE TABLE on both tables, and use STRAIGHT_JOIN instead of the implicit join. Just a guess, but it sounds like a confused optimizer.
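Concretely, that might look like the following (MySQL syntax; STRAIGHT_JOIN forces the tables to be joined in the order written, which assumes that scanning delta_shares and doing one index lookup per row into StockDataIndex is the better plan here):

```sql
-- Refresh index statistics so the optimizer has accurate cardinalities.
ANALYZE TABLE delta_shares;
ANALYZE TABLE stock_data;

-- Multi-table UPDATE with an explicit join order: read delta_shares
-- first, then look up each (ticker, date_filed) pair in stock_data.
UPDATE delta_shares A
STRAIGHT_JOIN stock_data B
  ON A.ticker = B.ticker
 AND A.date_filed = B.value_date
SET A.price_at_file = B.stock_close;
```

If the stale-statistics guess is right, ANALYZE TABLE alone may be enough and the STRAIGHT_JOIN hint becomes unnecessary.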