I can add more details if needed, but basically I am running into issues querying large tables (100 million+ rows). My queries are taking minutes to complete. Most of the data is historical (i.e., last year's sales data) and will not change. I have used this data in other reports and was able to “roll up” the data nightly into a new table grouped by month, year, etc. However, with the report I am building now, there are many dynamic elements, such as custom time/date pickers, that make such a rollup hard to do.
I guess my question is, does anyone have much experience with large tables and dynamic queries?
I have done what research I can, and I have made sure my database is well equipped: currently 16 GB RAM and a 12 GB InnoDB buffer pool. (I am not an expert here, so let me know if there are other things to look for.)
Thanks in advance for any assistance, and again, please let me know if you would like specific info on my use case.
SELECT mainaccounts.account_id AS 'ACCOUNTID',
(
SELECT name
FROM activitysettings
WHERE org_id = '5a1da86ed6ea7c6000e45e82'
AND id = '5a1da86ed6ea7c6000e45e8e' ) AS 'ACTIVITYNAME',
(
SELECT Count(DISTINCT a.id)
FROM activity a
WHERE a.org_id = '5a1da86ed6ea7c6000e45e82'
AND (
a.started_at BETWEEN '2018-01-01' AND '2018-02-01')
AND a.status = true
AND a.account_id = mainaccounts.account_id
GROUP BY a.account_id ) AS 'ACTIVITYTHIS',
(
SELECT Count(DISTINCT b.id)
FROM activity b
WHERE b.org_id = '5a1da86ed6ea7c6000e45e82'
AND (
b.started_at BETWEEN '2017-01-01' AND '2017-02-01')
AND b.status = true
AND b.account_id = mainaccounts.account_id
AND b.activity_id = '5a1da86ed6ea7c6000e45e8e'
GROUP BY b.account_id ) AS 'ACTIVITYLAST',
ifnull(
(
SELECT Sum(s1.volumece)
FROM sales s1
WHERE s1.org_id = '5a1da86ed6ea7c6000e45e82'
AND (
s1.invoice_date BETWEEN '2018-01-01' AND '2018-02-01')
AND s1.status = true
AND s1.account_id = mainaccounts.account_id group BY s1.account_id ),
0) AS 'SALESTHIS', ifnull(
(
SELECT sum(s2.volumece)
FROM sales s2
WHERE s2.org_id = '5a1da86ed6ea7c6000e45e82'
AND (
s2.invoice_date BETWEEN '2017-01-01' AND '2017-02-01')
AND s2.status = TRUE
AND s2.account_id = mainaccounts.account_id GROUP BY s2.account_id ),
0) AS 'SALESLAST', @podthis := ifnull(
(
SELECT sum(s1.units)
FROM sales s1
WHERE s1.org_id = '5a1da86ed6ea7c6000e45e82'
AND (
s1.invoice_date BETWEEN '2018-01-01' AND '2018-02-01')
AND s1.status = TRUE
AND s1.account_id = mainaccounts.account_id GROUP BY s1.account_id ),
0) AS 'UNITSTHIS', @podlast :=ifnull(
(
SELECT sum(s2.units)
FROM sales s2
WHERE s2.org_id = '5a1da86ed6ea7c6000e45e82'
AND (
s2.invoice_date BETWEEN '2017-01-01' AND '2017-02-01')
AND s2.status = TRUE
AND s2.account_id = mainaccounts.account_id
GROUP BY s2.account_id ),0) AS 'UNITSLAST',
CASE
WHEN (
@podthis IS NULL
OR @podthis <= 0) THEN 0
ELSE 1
end AS 'ISPODTHIS',
CASE
WHEN (
@podlast IS NULL
OR @podlast <= 0) THEN 0
ELSE 1
end AS 'ISPODLAST' FROM activity mainaccounts WHERE
mainaccounts.org_id = '5a1da86ed6ea7c6000e45e82'
AND mainaccounts.started_at BETWEEN '2018-12-01' AND
'2018-12-31'
AND mainaccounts.status = TRUE
AND mainaccounts.activity_id = '5a1da86ed6ea7c6000e45e8e'
GROUP BY account_id
I have quite a few indexes, so please ask if there is a specific one you think is needed or would help.
The rollups should be to the day. Any date range can then be served by summing the daily rows of the Rollup table.
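A minimal sketch of a day-level rollup for the sales table. The table name sales_daily, the column names dy, volumece_sum and units_sum, and all datatypes are assumptions made for illustration; adjust them to the real schema.

```sql
-- Hypothetical daily summary of `sales`; names and datatypes are assumptions.
CREATE TABLE sales_daily (
    org_id       CHAR(24)      NOT NULL,
    account_id   CHAR(24)      NOT NULL,
    dy           DATE          NOT NULL,   -- the day being summarized
    volumece_sum DECIMAL(20,4) NOT NULL,
    units_sum    DECIMAL(20,4) NOT NULL,
    PRIMARY KEY (org_id, account_id, dy)
) ENGINE=InnoDB;

-- Nightly job: summarize yesterday's rows (re-running it overwrites the same day).
INSERT INTO sales_daily (org_id, account_id, dy, volumece_sum, units_sum)
SELECT org_id,
       account_id,
       DATE(invoice_date),
       IFNULL(SUM(volumece), 0),
       IFNULL(SUM(units), 0)
FROM sales
WHERE status = TRUE
  AND invoice_date >= CURDATE() - INTERVAL 1 DAY
  AND invoice_date <  CURDATE()
GROUP BY org_id, account_id, DATE(invoice_date)
ON DUPLICATE KEY UPDATE
    volumece_sum = VALUES(volumece_sum),
    units_sum    = VALUES(units_sum);

-- Any picker range is then a sum over daily rows instead of raw rows.
SELECT account_id,
       SUM(volumece_sum) AS SALESTHIS,
       SUM(units_sum)    AS UNITSTHIS
FROM sales_daily
WHERE org_id = '5a1da86ed6ea7c6000e45e82'
  AND dy >= '2018-01-01'
  AND dy <  '2018-01-01' + INTERVAL 1 MONTH
GROUP BY account_id;
```

This works because a SUM of daily sums equals the SUM over the raw rows. A COUNT(DISTINCT ...) only rolls up the same way when the counted ids are unique per row, in which case a per-day COUNT(*) column can be stored and summed. Note that VALUES() in ON DUPLICATE KEY UPDATE still works but is deprecated in recent MySQL 8.0 releases.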
As for other "dynamic" things, you need to have already built Summary tables that include the possible dynamic columns, with 'adequate' indexes on them. Then add some smarts to the UI to pick the appropriate Summary table.
In my experience (several projects doing what you have described), it has always been reasonably easy to pick what columns need to be in Summary tables and even tailor the UI page(s) to direct users at the available choices. Once in a while, a new request comes in; then I whip out new code to summarize the raw data into a new Summary table (or augment an existing one), whip out a UI, and the job is done.
More discussion
Side issues...
What is index2, and what datatypes are involved? I worry about func in the EXPLAIN.
Ranges
started_at BETWEEN '2017-01-01' AND '2017-02-01'
If the target is a DATE, you have 32 days. If it is a DATETIME, you have 31 days plus one second (an extra midnight). I recommend this pattern; it works for all date types, and avoids leap-year (etc) hassles:
started_at >= '2017-01-01'
AND started_at < '2017-01-01' + INTERVAL 1 MONTH
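For a custom date picker, the same half-open pattern can be parameterized. A small sketch, assuming the UI supplies an inclusive first and last day; the variable names @from_day and @to_day are made up for illustration, and bound parameters from the application would serve the same purpose:

```sql
-- Hypothetical values coming from the date picker (both days inclusive).
SET @from_day = '2018-01-01';
SET @to_day   = '2018-01-31';

SELECT COUNT(*)
FROM activity
WHERE org_id = '5a1da86ed6ea7c6000e45e82'
  AND status = TRUE
  AND started_at >= @from_day
  AND started_at <  @to_day + INTERVAL 1 DAY;  -- includes all of the last day
```

Adding one day to the end and using a strict < keeps the range correct whether started_at is a DATE or a DATETIME.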
Does mainaccounts have this index?
INDEX(org_id, status, activity_id, -- in any order
started_at) -- after the others
That is, have the '=' columns first, then the 'range' column.
Activity needs
INDEX(org_id, status, account_id, activity_id, -- in any order, then
started_at)
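Spelled out as DDL, the two suggestions might look like the following. The index names are made up, and the sales index is an extrapolation of the same "'=' columns first, range last" rule to the sales subqueries, not something stated above:

```sql
-- Index names are hypothetical; the column lists follow the suggestions above.
ALTER TABLE activity
    ADD INDEX idx_act_outer (org_id, status, activity_id, started_at),
    ADD INDEX idx_act_sub   (org_id, status, account_id, activity_id, started_at);

-- Assumption: the same rule applied to the sales subqueries.
ALTER TABLE sales
    ADD INDEX idx_sales_sub (org_id, status, account_id, invoice_date);

-- Check which index the optimizer picks for the outer query's filter.
EXPLAIN
SELECT account_id
FROM activity
WHERE org_id = '5a1da86ed6ea7c6000e45e82'
  AND status = TRUE
  AND activity_id = '5a1da86ed6ea7c6000e45e8e'
  AND started_at >= '2018-12-01'
  AND started_at <  '2018-12-01' + INTERVAL 1 MONTH
GROUP BY account_id;
```

Building indexes on tables of this size takes a while, so it is worth running EXPLAIN before and after to confirm the new index is actually used.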