
MySQL Query Takes 4 Hours

Good afternoon all. I am coming to you in the hope that you can provide some direction with a MySQL optimization problem that I am having. First, a few system specifications.

  • MySQL version: 5.2.47 CE
  • WampServer v 2.2

Computer:

  • Samsung QX410 (laptop)
  • Windows 7
  • Intel i5 (2.67 GHz)
  • 4GB RAM

I have two tables:

  1. “Delta_Shares” contains stock trade data, and contains two columns of note. “Ticker” is Varchar(45), “Date_Filed” is Date. This table has about 3 million rows (all unique). I have an index on this table “DeltaSharesTickerDateFiled” on (Ticker, Date_Filed).

  2. “Stock_Data” contains two columns of note. “Ticker” is Varchar(45), “Value_Date” is Date. This table has about 19 million rows (all unique). I have an index on this table “StockDataIndex” on (Ticker, Value_Date); rough definitions for both indexes are sketched below.
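
For reference, the two indexes described above correspond roughly to the following definitions (a sketch; the statements actually used to create them may have differed):

create index DeltaSharesTickerDateFiled on delta_shares (ticker, date_filed);
create index StockDataIndex on stock_data (ticker, value_date);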

I am attempting to update the “Delta_Shares” table by looking up information from the Stock_Data table. The following query takes more than 4 hours to run.

update delta_shares A, stock_data B
set A.price_at_file = B.stock_close
where A.ticker = B.ticker
    and A.date_filed = B.value_Date;

Is the excessive runtime the natural result of the large number of rows, poor indexing, an underpowered machine, badly written SQL, or all of the above? Please let me know if any additional information would be useful (I am not overly familiar with MySQL, though this issue has moved me significantly down the path of optimization). I greatly appreciate any thoughts or suggestions.


UPDATED with "EXPLAIN SELECT"

id  select_type  table  type  possible_keys               key             key_len  ref                                           rows     Extra
1   SIMPLE       A      ALL   DeltaSharesTickerDateFiled  ...                                                                    3038011
1   SIMPLE       B      ref   StockDataIndex              StockDataIndex  52       13ffeb2013.A.ticker,13ffeb2013.A.date_filed   1        Using where
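
Older MySQL versions (before 5.6) cannot run EXPLAIN on an UPDATE directly, so the plan above was presumably produced by rewriting the update as an equivalent SELECT, roughly along these lines (a sketch; the exact statement used isn't shown here):

explain
select A.price_at_file, B.stock_close
from delta_shares A, stock_data B
where A.ticker = B.ticker
    and A.date_filed = B.value_date;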

UPDATED with DESCRIBE output for both tables. Stock_Data table:

Field         Type           Null  Key  Extra
idstock_data  int(11)        NO    PRI  auto_increment
ticker        varchar(45)    YES   MUL
value_date    date           YES
stock_close   decimal(10,2)  YES

Delta_Shares Table:

Field                      Type           Null  Key  Extra
iddelta_shares             int(11)        NO    PRI  auto_increment
cik                        int(11)        YES   MUL
ticker                     varchar(45)    YES   MUL
date_filed_identify        int(11)        YES
Price_At_File              decimal(10,2)  YES
delta_shares               int(11)        YES
date_filed                 date           YES
marketcomparable           varchar(45)    YES
market_comparable_price    decimal(10,2)  YES
industrycomparable         varchar(45)    YES
industry_comparable_price  decimal(10,2)  YES

Index from Delta_Shares:

Table         Non_unique  Key_name                    Seq_in_index  Column_name          Collation  Cardinality  Null  Index_type
delta_shares  0           PRIMARY                     1             iddelta_shares       A          3095057            BTREE
delta_shares  1           DeltaIndex                  1             cik                  A          18           YES   BTREE
delta_shares  1           DeltaIndex                  2             date_filed_identify  A          20633        YES   BTREE
delta_shares  1           DeltaSharesAllIndex         1             cik                  A          18           YES   BTREE
delta_shares  1           DeltaSharesAllIndex         2             ticker               A          619011       YES   BTREE
delta_shares  1           DeltaSharesAllIndex         3             date_filed_identify  A          3095057      YES   BTREE
delta_shares  1           DeltaSharesTickerDateFiled  1             ticker               A          11813        YES   BTREE
delta_shares  1           DeltaSharesTickerDateFiled  2             date_filed           A          3095057      YES   BTREE

Index from Stock_Data:

Table       Non_unique  Key_name        Seq_in_index  Column_name   Collation  Cardinality  Null  Index_type
stock_data  0           PRIMARY         1             idstock_data  A          18683114           BTREE
stock_data  1           StockDataIndex  1             ticker        A          14676        YES   BTREE
stock_data  1           StockDataIndex  2             value_date    A          18683114     YES   BTREE

There are a few benchmarks you could make to see where the bottleneck is. For example, try updating the field to a constant value and see how long it takes (obviously, you'll want to make a copy of the database to do this on). Then try a select query that doesn't update, but just selects the values to be updated and the values they will be updated to.
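
In concrete terms, the two benchmarks could look roughly like this (a sketch; the constant 0 is just a placeholder value):

-- Benchmark 1: pure write cost, no join lookups (run this against a copy of the data)
update delta_shares
set price_at_file = 0;

-- Benchmark 2: join/lookup cost only, no writes
select A.price_at_file, B.stock_close
from delta_shares A, stock_data B
where A.ticker = B.ticker
    and A.date_filed = B.value_date;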

Benchmarks like these will usually tell you whether you're wasting your time trying to optimize or whether there is much room for improvement.

As for the memory, here's a rough idea of what you're looking at:

A varchar column is stored as a short length prefix (1 byte for a varchar(45)) plus the actual data, a DATE takes 3 bytes, and a DECIMAL(10,2) takes 5 bytes. So let's make an extremely liberal guess that the varchar fields in the Stock_Data table average around 42 bytes; with the date and decimal columns, that adds up to about 50 bytes per row.

50 bytes × 20 million rows = 1 GB (about 0.93 GiB)

So if this process is the only thing going on in your machine then I don't see memory as being an issue since you can easily fit all the data from both tables that the query is working with in memory at one time. But if there are other things going on then it might be a factor.

Try ANALYZE TABLE on both tables and use STRAIGHT_JOIN instead of the implicit join. Just a guess, but it sounds like a confused optimiser.
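
Concretely, that would be something along these lines (a sketch; STRAIGHT_JOIN forces MySQL to read delta_shares before stock_data rather than letting the optimiser choose the order):

analyze table delta_shares, stock_data;

update delta_shares A
straight_join stock_data B
    on A.ticker = B.ticker
    and A.date_filed = B.value_date
set A.price_at_file = B.stock_close;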
