I have been playing with the JSON support in MySQL 5.7, and I have a few questions about using generated columns for indexing, as described at https://dev.mysql.com/doc/refman/5.7/en/create-table.html#create-table-secondary-indexes-virtual-columns
Specifically, refer to this line:
JSON columns cannot be indexed. You can work around this restriction by creating an index on a generated column that extracts a scalar value from the JSON column.
This seems like a big limitation to me. Everywhere I look, people suggest using generated columns, but that workaround only covers a very limited set of use cases. Or perhaps I am misunderstanding something.
Let me explain my use case. Suppose you have a table called standards. It has the following structure:
CREATE TABLE `standards` (
`id` int(11) NOT NULL,
`name` varchar(100) NOT NULL,
`sections` json DEFAULT NULL,
`subjects` json DEFAULT NULL,
`created_at` datetime NOT NULL,
`updated_at` datetime NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
The sections column contains an array of JSON objects:
[
{
"id": 90491,
"name": "A"
},
{
"id": 90494,
"name": "B"
}
]
The subjects column contains a nested JSON object:
{
"576845": {
"id": 576845,
"name": "Computer Education"
},
"576848": {
"id": 576848,
"name": "English Language"
},
"576854": {
"id": 576854,
"name": "Environmental Science"
},
"576860": {
"id": 576860,
"name": "Mathematics"
}
}
To find a Standard record which has a section ID of 90494, the query would be:
SELECT * from standards WHERE JSON_CONTAINS( sections->>'$[*].id', '90494' );
To find a Standard record which has the subject ID of 576854, the query would be:
SELECT * from standards WHERE JSON_CONTAINS_PATH( subjects, 'one', '$."576854"');
OR
SELECT * from standards WHERE JSON_CONTAINS( subjects->>'$.*.id', '576854' );
Now, all the above works. The problem is that the queries perform a full table scan.
Considering Query 1 from above, how can I generate a virtual column with scalar data which contains ALL section IDs?
Each Standard record has multiple sections, with multiple IDs. So, I can't just create an integer virtual column to store a single value. It has to be an array of section IDs, through which we need to search.
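For contrast, the scalar workaround from the manual would look roughly like the sketch below. It indexes only the first section ID, which is why it does not help here. The names first_section_id and idx_first_section_id are made up for illustration.

```sql
-- Hypothetical names: first_section_id, idx_first_section_id.
-- Extracts a single scalar (the id of the first array element) and indexes it.
ALTER TABLE standards
  ADD first_section_id INT
    GENERATED ALWAYS AS (sections->'$[0].id') VIRTUAL,
  ADD INDEX idx_first_section_id (first_section_id);

-- This lookup can use the index, but a search for 90494 would miss
-- any row where that section is not the first element of the array.
SELECT * FROM standards WHERE first_section_id = 90491;
```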
So, my generated column would be like below:
ALTER TABLE standards
ADD section_ids json GENERATED ALWAYS AS (sections->>'$[*].id') VIRTUAL NOT NULL;
The generated column will now store just the array of section IDs. But I cannot add an index on the generated column, because it is again a JSON column.
So, the question comes down to this - for my queries shown above, how do I avoid full table scans?
Any suggestions would be appreciated.
This is possible in MySQL 5.7, but only with clunky workarounds and limitations. I will not go into a how-to for that version, because it is much more difficult and the limitations will, in many cases, be reached if a large number of items can be added to the array.
However, it is possible as of MySQL 8.0.17, which supports multi-valued indexes.
ALTER TABLE standards
ADD INDEX section_ids ( (CAST(sections->'$[*].id' AS UNSIGNED ARRAY)) ),
ADD INDEX subject_ids ( (CAST(subjects->'$.*.id' AS UNSIGNED ARRAY)) );
Note that $.* takes all object properties and returns the queried value (.id) of each, formatted as an array.
EXPLAIN SELECT * from standards WHERE JSON_CONTAINS( sections->'$[*].id', '90494' );
EXPLAIN SELECT * from standards WHERE JSON_CONTAINS( subjects->'$.*.id', '576854' );
You will see that the indexes are used for those queries.
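Besides JSON_CONTAINS, MySQL 8.0.17 also lets MEMBER OF and JSON_OVERLAPS use a multi-valued index. A minimal sketch, assuming the two indexes from the ALTER TABLE statement above exist:

```sql
-- Requires MySQL 8.0.17 or later and the multi-valued indexes created above.
-- MEMBER OF tests a single value against the indexed array.
EXPLAIN SELECT * FROM standards
WHERE 90494 MEMBER OF(sections->'$[*].id');

-- JSON_OVERLAPS matches rows whose array shares at least one value
-- with the given JSON array.
EXPLAIN SELECT * FROM standards
WHERE JSON_OVERLAPS(subjects->'$.*.id', CAST('[576854, 576860]' AS JSON));
```

The expression in the WHERE clause has to match the indexed expression (here sections->'$[*].id' and subjects->'$.*.id') for the optimizer to consider the index.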
I would solve this in older versions by manually creating a separate index table, and using triggers to keep it up to date.
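A rough sketch of that approach for the sections column follows. The table, procedure, and trigger names are all hypothetical, and it assumes that sections is either NULL or an array of objects with an integer id. JSON_TABLE is not available in 5.7, so the array is walked with a loop. DELIMITER is a mysql client command, not server SQL.

```sql
-- Hypothetical lookup table: one row per (standard, section id) pair.
CREATE TABLE standard_section_ids (
  standard_id INT NOT NULL,
  section_id  INT UNSIGNED NOT NULL,
  PRIMARY KEY (standard_id, section_id),
  KEY idx_section_id (section_id)
) ENGINE=InnoDB;

DELIMITER //

-- Rebuilds the lookup rows for one standard from its sections array.
CREATE PROCEDURE sync_standard_section_ids(IN p_standard_id INT, IN p_sections JSON)
BEGIN
  DECLARE i INT DEFAULT 0;
  DECLARE n INT DEFAULT COALESCE(JSON_LENGTH(p_sections), 0);
  DECLARE v_id INT UNSIGNED;

  DELETE FROM standard_section_ids WHERE standard_id = p_standard_id;

  WHILE i < n DO
    -- Pull the id of the i-th array element; NULL if the key is missing.
    SET v_id = CAST(JSON_EXTRACT(p_sections, CONCAT('$[', i, '].id')) AS UNSIGNED);
    IF v_id IS NOT NULL THEN
      -- IGNORE skips duplicate ids within the same array.
      INSERT IGNORE INTO standard_section_ids (standard_id, section_id)
      VALUES (p_standard_id, v_id);
    END IF;
    SET i = i + 1;
  END WHILE;
END//

CREATE TRIGGER standards_after_insert AFTER INSERT ON standards
FOR EACH ROW CALL sync_standard_section_ids(NEW.id, NEW.sections)//

CREATE TRIGGER standards_after_update AFTER UPDATE ON standards
FOR EACH ROW CALL sync_standard_section_ids(NEW.id, NEW.sections)//

CREATE TRIGGER standards_after_delete AFTER DELETE ON standards
FOR EACH ROW DELETE FROM standard_section_ids WHERE standard_id = OLD.id//

DELIMITER ;

-- The lookup then goes through the indexed table instead of scanning JSON.
SELECT s.*
FROM standards s
JOIN standard_section_ids x ON x.standard_id = s.id
WHERE x.section_id = 90494;
```

Rows that already exist in standards would need a one-time backfill, and the subjects column would need a similar table of its own.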