
MySQL Analytics Query - Improve Performance

I have a MySQL table that holds about 8 million records, and I need to run some analytics on it to compute averages, as shown in the table definition and query below. The result contains hourly analytics (the average of a parameter value) for the last year of data.

MySQL Server version: 8.0.15

Table:

CREATE TABLE `temp_data` (
  `dateLogged`   datetime NOT NULL,
  `paramName`    varchar(30) NOT NULL,
  `paramValue`   float NOT NULL,
  `sensorId`     varchar(20) NOT NULL,
  `locationCode` varchar(30) NOT NULL,
  PRIMARY KEY (`sensorId`,`paramName`,`dateLogged`),
  KEY `summary` (`locationCode`,`paramName`,`dateLogged`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ROW_FORMAT=COMPRESSED;

Query: The query below transposes row-based parameters into columns and, while doing so, computes the average of the parameter values:

SELECT  dateLogged,
        ROUND(AVG( ROUND(IF(paramName = 'temp1', paramValue, NULL),2) ),2) AS T1,
        ROUND(AVG( ROUND(IF(paramName = 'temp2', paramValue, NULL),2) ),2) AS T2,
        ROUND(AVG( ROUND(IF(paramName = 'temp3', paramValue, NULL),2) ),2) AS T3,
        ROUND(AVG( ROUND(IF(paramName = 'temp4', paramValue, NULL),2) ),2) AS T4
FROM temp_data
WHERE locationCode = 'A123' AND paramName IN ('temp1','temp2','temp3','temp4')
GROUP BY dateLogged
ORDER BY dateLogged;

Result:

+---------------------+--------+---------+-------+-------+
| dateLogged          | T1     | T2      | T3    | T4    |
+---------------------+--------+---------+-------+-------+
| 2018-12-01 00:00:00 |  95.46 |   99.12 | 96.44 | 95.86 |
| 2018-12-01 01:00:00 | 100.38 |  101.09 | 99.56 | 99.70 |
| 2018-12-01 02:00:00 | 101.41 |  102.08 | 99.47 | 99.88 |
| 2018-12-01 03:00:00 |  98.79 |  100.47 | 98.59 | 99.75 |
| 2018-12-01 04:00:00 |  98.23 |  100.58 | 98.38 | 98.93 |
| 2018-12-01 05:00:00 | 101.03 |  101.80 | 99.37 | 99.88 |
| ...                 |    ... |     ... |   ... |   ... |
+---------------------+--------+---------+-------+-------+

Problem:

There are now over 8 million records in the table, and the query takes approximately 35 to 40 seconds to execute.

I'm looking for suggestions on how to improve the query's performance and, hopefully, bring it down to under 10 seconds.

Note:

The table holds up to one year of data; anything older is archived and deleted.

Result of EXPLAIN:

+----+-------------+-----------+------------+------+-----------------+---------+---------+-------+---------+----------+--------------------------------------------------------+
| id | select_type | table     | partitions | type | possible_keys   | key     | key_len | ref   | rows    | filtered | Extra                                                  |
+----+-------------+-----------+------------+------+-----------------+---------+---------+-------+---------+----------+--------------------------------------------------------+
|  1 | SIMPLE      | temp_data | NULL       | ref  | PRIMARY,summary | summary | 53      | const | 3524800 |    50.00 | Using index condition; Using temporary; Using filesort |
+----+-------------+-----------+------------+------+-----------------+---------+---------+-------+---------+----------+--------------------------------------------------------+

Since temp1 through temp4 are fixed, we can use a generated column and index it:

ALTER TABLE temp_data
  ADD COLUMN p1234 BOOL GENERATED ALWAYS AS (paramName IN ('temp1','temp2','temp3','temp4')) NOT NULL,
  ADD KEY s1234 (locationCode, p1234, paramName, paramValue, dateLogged);
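With the index in place, it is worth confirming that the optimizer actually uses it and that the query is covered by the index alone. A quick check (assuming the column and index names from the ALTER above):

EXPLAIN
SELECT dateLogged, paramName,
       ROUND(AVG(ROUND(paramValue, 2)), 2)
FROM temp_data
WHERE locationCode = 'A123' AND p1234
GROUP BY dateLogged, paramName;

The Extra column should show "Using index" (a covering index), meaning the average can be computed entirely from the s1234 index without touching the clustered index rows.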

Then change the query to:

SELECT  dateLogged, paramName,
        ROUND(AVG( ROUND(paramValue,2) ),2) AS avgValue
FROM temp_data
WHERE locationCode = 'A123' AND p1234
GROUP BY dateLogged, paramName
ORDER BY dateLogged, paramName;

Handle the T1 through T4 paramName formatting (pivoting the rows back into columns) in the application code.
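If the pivot must stay in SQL rather than the application, the grouped result can be re-pivoted in a derived table. A sketch (column aliases chosen to match the original query's T1-T4 output):

SELECT dateLogged,
       MAX(IF(paramName = 'temp1', avgValue, NULL)) AS T1,
       MAX(IF(paramName = 'temp2', avgValue, NULL)) AS T2,
       MAX(IF(paramName = 'temp3', avgValue, NULL)) AS T3,
       MAX(IF(paramName = 'temp4', avgValue, NULL)) AS T4
FROM (
    SELECT dateLogged, paramName,
           ROUND(AVG(ROUND(paramValue, 2)), 2) AS avgValue
    FROM temp_data
    WHERE locationCode = 'A123' AND p1234
    GROUP BY dateLogged, paramName
) AS g
GROUP BY dateLogged
ORDER BY dateLogged;

The inner query still benefits from the covering index; the outer pivot only touches the small aggregated result (one row per hour per parameter), so its cost is negligible.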
