
MySQL Optimization For Large Tables

I can add more details if needed, but basically I am running into issues querying large tables (100 million+ rows). My queries take minutes to complete. Most of the data is historical (e.g. last year's sales data) and will not change. For other reports I have built, I was able to "roll up" this data nightly into a new table grouped by month, year, etc. However, the report I am building now has many dynamic elements, such as custom date/time pickers, that make such a roll-up hard.

I guess my question is, does anyone have much experience with large tables and dynamic queries?

I have done what research I can, and I have also made sure my database server is well equipped: currently 16 GB of RAM and a 12 GB InnoDB buffer pool. (I am not an expert here, so let me know if there are other things to look for.)

Thanks in advance for any assistance, and again, please let me know if you would like specific info on my use case.

SELECT   mainaccounts.account_id AS 'ACCOUNTID', 
     ( 
            SELECT name 
            FROM   activitysettings 
            WHERE  org_id = '5a1da86ed6ea7c6000e45e82' 
            AND    id = '5a1da86ed6ea7c6000e45e8e' ) AS 'ACTIVITYNAME', 
     ( 
              SELECT   Count(DISTINCT a.id) 
              FROM     activity a 
              WHERE    a.org_id = '5a1da86ed6ea7c6000e45e82' 
              AND      ( 
                                a.started_at BETWEEN '2018-01-01' AND      '2018-02-01') 
              AND      a.status = true 
              AND      a.account_id = mainaccounts.account_id 
              GROUP BY a.account_id ) AS 'ACTIVITYTHIS', 
     ( 
              SELECT   Count(DISTINCT b.id) 
              FROM     activity b 
              WHERE    b.org_id = '5a1da86ed6ea7c6000e45e82' 
              AND      ( 
                                b.started_at BETWEEN '2017-01-01' AND      '2017-02-01') 
              AND      b.status = true 
              AND      b.account_id = mainaccounts.account_id 
              AND      b.activity_id = '5a1da86ed6ea7c6000e45e8e' 
              GROUP BY b.account_id ) AS 'ACTIVITYLAST', 
     ifnull( 
     ( 
              SELECT   Sum(s1.volumece) 
              FROM     sales s1 
              WHERE    s1.org_id = '5a1da86ed6ea7c6000e45e82' 
              AND      ( 
                                s1.invoice_date BETWEEN '2018-01-01' AND      '2018-02-01') 
              AND      s1.status = true 
              AND      s1.account_id = mainaccounts.account_id 
              GROUP BY s1.account_id ),
                       0) AS 'SALESTHIS', ifnull( 
     ( 
              SELECT   sum(s2.volumece) 
              FROM     sales s2 
              WHERE    s2.org_id = '5a1da86ed6ea7c6000e45e82' 
              AND      ( 
                                s2.invoice_date BETWEEN '2017-01-01' AND      '2017-02-01') 
              AND      s2.status = TRUE 
              AND      s2.account_id = mainaccounts.account_id 
              GROUP BY s2.account_id ),
                       0) AS 'SALESLAST', @podthis := ifnull( 
     ( 
              SELECT   sum(s1.units) 
              FROM     sales s1 
              WHERE    s1.org_id = '5a1da86ed6ea7c6000e45e82' 
              AND      ( 
                                s1.invoice_date BETWEEN '2018-01-01' AND      '2018-02-01') 
              AND      s1.status = TRUE 
              AND      s1.account_id = mainaccounts.account_id 
              GROUP BY s1.account_id ),
                       0) AS 'UNITSTHIS', @podlast :=ifnull( 
     ( 
              SELECT   sum(s2.units) 
              FROM     sales s2 
              WHERE    s2.org_id = '5a1da86ed6ea7c6000e45e82' 
              AND      ( 
                                s2.invoice_date BETWEEN '2017-01-01' AND      '2017-02-01') 
              AND      s2.status = TRUE 
              AND      s2.account_id = mainaccounts.account_id 
              GROUP BY s2.account_id ),0) AS 'UNITSLAST', 
     CASE 
              WHEN ( 
                                @podthis IS NULL 
                       OR       @podthis <= 0) THEN 0 
              ELSE 1 
     end AS 'ISPODTHIS', 
     CASE 
              WHEN ( 
                                @podlast IS NULL 
                       OR       @podlast <= 0) THEN 0 
              ELSE 1 
     end AS 'ISPODLAST' 
FROM     activity mainaccounts 
WHERE    mainaccounts.org_id = '5a1da86ed6ea7c6000e45e82' 
AND      mainaccounts.started_at BETWEEN '2018-12-01' AND      '2018-12-31' 
AND      mainaccounts.status = TRUE 
AND      mainaccounts.activity_id = '5a1da86ed6ea7c6000e45e8e' 
GROUP BY account_id

I have quite a few indexes, so please ask if there is a specific one you think is needed or would help.

Answer

The rollups should be by day. Any date range can then be served by summing the daily rows of the rollup table.
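As a rough sketch of what a daily rollup for sales could look like: the table name sales_daily, the column sale_day, and every column type below are assumptions, since the question does not show the schema. Only org_id, account_id, invoice_date, status, volumece and units come from the query in the question.

```sql
-- Hypothetical daily summary table; names and types are assumptions.
CREATE TABLE sales_daily (
    org_id      CHAR(24)      NOT NULL,
    account_id  CHAR(24)      NOT NULL,
    sale_day    DATE          NOT NULL,
    volumece    DECIMAL(20,4),
    units       DECIMAL(20,4),
    PRIMARY KEY (org_id, account_id, sale_day)
) ENGINE=InnoDB;

-- Nightly job: fold yesterday's raw rows into one row per account per day.
-- Re-running it for the same day overwrites that day's totals.
INSERT INTO sales_daily (org_id, account_id, sale_day, volumece, units)
SELECT   org_id, account_id, DATE(invoice_date), SUM(volumece), SUM(units)
FROM     sales
WHERE    status = TRUE
AND      invoice_date >= CURDATE() - INTERVAL 1 DAY
AND      invoice_date <  CURDATE()
GROUP BY org_id, account_id, DATE(invoice_date)
ON DUPLICATE KEY UPDATE
    volumece = VALUES(volumece),
    units    = VALUES(units);

-- Report query: any date range the picker produces is a sum of daily rows.
SELECT   account_id,
         SUM(volumece) AS SALESTHIS,
         SUM(units)    AS UNITSTHIS
FROM     sales_daily
WHERE    org_id = '5a1da86ed6ea7c6000e45e82'
AND      sale_day >= '2018-01-01'
AND      sale_day <  '2018-02-01'
GROUP BY account_id;
```

The report query then reads at most one row per account per day instead of every raw sale, which is where the speedup would come from.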

As for other "dynamic" things, you need to have already built Summary tables that include the possible dynamic columns, and to provide 'adequate' indexes on the Summary tables. Then add some smarts to the UI to pick the appropriate Summary table.

In my experience (several projects doing what you have described), it has always been reasonably easy to pick what columns need to be in Summary tables and even tailor the UI page(s) to direct users at the available choices. Once in a while, a new request comes in; then I whip out new code to summarize the raw data into a new Summary table (or augment an existing one), whip out a UI, and the job is done.

More discussion

Side issues...

  • What is index2 and what datatypes are involved? I worry about func in the Explain.

  • Ranges

    started_at BETWEEN '2017-01-01' AND '2017-02-01'

If the column is a DATE, that range covers 32 days, because BETWEEN includes both endpoints. If it is a DATETIME, it covers 31 days plus one second (an extra midnight, at the start of February 1). I recommend this pattern; it works for all date types, and avoids leap-year (etc.) hassles:

    started_at >= '2017-01-01'
    AND started_at  < '2017-01-01' + INTERVAL 1 MONTH
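
To make the endpoint arithmetic concrete, here is a small sketch. The first statement just counts the days an inclusive range spans; the second applies the half-open pattern to the last-year activity count from the question, written as a standalone grouped query rather than a correlated subquery (that restructuring is my assumption, not something the question does).

```sql
-- BETWEEN is inclusive at both ends, so February 1 is counted as well.
SELECT DATEDIFF('2017-02-01', '2017-01-01') + 1 AS days_covered;  -- 32

-- Half-open range: all of January 2017 and nothing from February,
-- whether started_at is a DATE, DATETIME or TIMESTAMP.
SELECT   b.account_id,
         COUNT(DISTINCT b.id) AS ACTIVITYLAST
FROM     activity b
WHERE    b.org_id = '5a1da86ed6ea7c6000e45e82'
AND      b.status = TRUE
AND      b.activity_id = '5a1da86ed6ea7c6000e45e8e'
AND      b.started_at >= '2017-01-01'
AND      b.started_at <  '2017-01-01' + INTERVAL 1 MONTH
GROUP BY b.account_id;
```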

  • Indexes

Does activity (aliased as mainaccounts in the outer query) have

    INDEX(org_id, status, activity_id,  -- in any order
          started_at)   -- after the others

That is, put the columns tested with = first, then the 'range' column.

Activity (for the correlated subqueries) needs

    INDEX(org_id, status, account_id, activity_id,  -- in any order, then
          started_at)
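
A minimal sketch of adding those two indexes and checking that the optimizer uses them. The index names are hypothetical, and the EXPLAIN query is a simplified version of the outer query with the date filter rewritten in the half-open style suggested above.

```sql
-- Index names are hypothetical; equality columns first, range column last.
ALTER TABLE activity
    ADD INDEX idx_org_status_act_started
        (org_id, status, activity_id, started_at),
    ADD INDEX idx_org_status_acct_act_started
        (org_id, status, account_id, activity_id, started_at);

-- Check which index the optimizer picks for the outer query's filter.
EXPLAIN
SELECT   account_id
FROM     activity
WHERE    org_id = '5a1da86ed6ea7c6000e45e82'
AND      status = TRUE
AND      activity_id = '5a1da86ed6ea7c6000e45e8e'
AND      started_at >= '2018-12-01'
AND      started_at <  '2018-12-01' + INTERVAL 1 MONTH
GROUP BY account_id;
```

Building an index on a table of this size can take a while, so it is probably worth trying on a copy of the table first.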
