MySQL JSON_EXTRACT performance

We have a logging table which grows as new events happen. At the moment we have around 120,000 rows of log events stored.

The events table looks like this:

CREATE TABLE `EVENTS` (
  `ID` int(11) NOT NULL AUTO_INCREMENT,
  `EVENT` varchar(255) NOT NULL,
  `ORIGIN` varchar(255) NOT NULL,
  `TIME_STAMP` TIMESTAMP NOT NULL,
  `ADDITIONAL_REMARKS` json DEFAULT NULL,
  PRIMARY KEY (`ID`)
) ENGINE=InnoDB AUTO_INCREMENT=137007 DEFAULT CHARSET=utf8

ADDITIONAL_REMARKS is a JSON field because different applications log into this table and can add more information about the event which happened. I did not want to impose any fixed data structure here, because this information can differ between applications. For example, one project management application can log:

ID, "new task created", "app", NOW(), {"project": {"id": 1}, "creator": {"id": 1}}

While other applications do not have projects or creators, but maybe cats and owners they want to store in the ADDITIONAL_REMARKS field.
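
A minimal sketch of what two such inserts could look like (the second application's event name and JSON keys are made-up values, just to illustrate that the structures can differ):

INSERT INTO EVENTS (EVENT, ORIGIN, TIME_STAMP, ADDITIONAL_REMARKS)
VALUES ('new task created', 'app', NOW(), '{"project": {"id": 1}, "creator": {"id": 1}}');

-- a different application stores a completely different structure
INSERT INTO EVENTS (EVENT, ORIGIN, TIME_STAMP, ADDITIONAL_REMARKS)
VALUES ('cat adopted', 'cat_app', NOW(), '{"cat": {"id": 7}, "owner": {"id": 3}}');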

Queries can use the ADDITIONAL_REMARKS field to filter information for one specific application, like:

SELECT
DISTINCT(ADDITIONAL_REMARKS->"$.project.id") as 'project',
COUNT(CASE WHEN EVENT = 'new task created' THEN 1 END) AS 'new_task'
FROM EVENTS
WHERE DATE(TIME_STAMP) >= DATE(NOW()) - INTERVAL 30 DAY
AND ORIGIN = "app"
GROUP BY project
ORDER BY new_task DESC
LIMIT 10;

Output EXPLAIN query:

id: 1, select_type: SIMPLE, table: EVENTS, partitions: NULL, type: ALL, possible_keys: NULL, key: NULL, key_len: NULL, ref: NULL, rows: 136459, filtered: 100.00, Extra: Using where; Using temporary; Using filesort

With this query I get the top 10 projects with the most created tasks over the last 30 days. It works fine, but the query gets slower and slower as our table grows. With 120,000 rows it already needs over 30 seconds.

Do you know any way to improve the speed? The newest information in the table, with the highest ID, is more important than older entries. Often I look only for entries which happened in the last X days. It would be useful to stop the query once the first entry older than X days is reached, as all further entries are even older.
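
Something like the following rough sketch is what I have in mind (purely illustrative; it assumes IDs grow monotonically with TIME_STAMP, and @first_id is just a session variable made up for the example):

-- find the lowest ID inside the 30-day window once
SELECT MIN(ID) INTO @first_id
FROM EVENTS
WHERE TIME_STAMP >= DATE(NOW()) - INTERVAL 30 DAY;

-- then let the PRIMARY KEY bound the main scan so older rows are never visited
SELECT ID, EVENT, ORIGIN, TIME_STAMP, ADDITIONAL_REMARKS
FROM EVENTS
WHERE ID >= @first_id
  AND ORIGIN = "app";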

If TIME_STAMP is indexed, the DATE() function will still prevent the index from being used, because the optimizer cannot use an index on a column once the column is wrapped in a function.

WHERE DATE(TIME_STAMP) >= DATE(NOW()) - INTERVAL 30 DAY

can be rewritten as:

WHERE TIME_STAMP >= DATE(NOW()) - INTERVAL 30 DAY

Do you know any way to improve the speed?

The only way I can see to speed up the query further is a multi-column index on TIME_STAMP and ORIGIN, like so:

ALTER TABLE EVENTS ADD KEY timestamp_origin (TIME_STAMP, ORIGIN);

combined with my query adjustment above.
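
One way to check that the new index is actually picked up (a quick sketch; the exact numbers will depend on your data) is to run EXPLAIN against the rewritten WHERE clause and see whether the key column now shows timestamp_origin and the type column changes from ALL to range:

EXPLAIN SELECT *
FROM EVENTS
WHERE TIME_STAMP >= DATE(NOW()) - INTERVAL 30 DAY
  AND ORIGIN = "app";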

EDIT

And a derived table may improve query speed, because it will use the new index.

SELECT
  ADDITIONAL_REMARKS->"$.project.id" AS 'project',
  COUNT(CASE WHEN EVENT = 'new task created' THEN 1 END) AS 'new_task'
FROM (
  SELECT *
  FROM EVENTS
  WHERE TIME_STAMP >= DATE(NOW()) - INTERVAL 30 DAY
    AND ORIGIN = "app"
) AS events_within_30_days
GROUP BY project
ORDER BY new_task DESC
LIMIT 10;

An inner select where I already reduce the number of rows could cut the query time from 30 seconds to 0.05 seconds.

It looks like:

SELECT
  ADDITIONAL_REMARKS->"$.project.id" AS 'project',
  COUNT(CASE WHEN EVENT = 'new task created' THEN 1 END) AS 'new_task'
FROM (
  SELECT *
  FROM EVENTS
  WHERE EVENT = 'new task created'
    AND TIME_STAMP >= DATE(NOW()) - INTERVAL 30 DAY
    AND ORIGIN = "app"
) AS events_within_30_days
GROUP BY project
ORDER BY new_task DESC
LIMIT 10;
