简体   繁体   中英

MySQL SELECT return wrong results

I'm working with MySQL 5.7. I created a table with a virtual column (not stored) of type DATETIME with an index on it. While I was working on it, I noticed that order by was not returning all the data (some data I was expecting at the top was missing). Also the results from MAX and MIN were wrong. After I run

ANALYZE TABLE 
CHECK TABLE
OPTIMIZE TABLE

then the results were correct. I guess there was an issue with the index data, so I have few questions:

  1. When and why this could happen?
  2. Is there a way to prevent this?
  3. among the 3 command I run, which is the correct one to use?

I'm worried that this could happen in the future but I'll not notice.

EDIT :

as requested in the comments I added the table definition:

CREATE TABLE `items` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `user_id` bigint(20) unsigned DEFAULT NULL,
  `image` json DEFAULT NULL,
  `status` json DEFAULT NULL,
  `status_expired` tinyint(1) GENERATED ALWAYS AS (ifnull(json_contains(`status`,'true','$.expired'),false)) VIRTUAL COMMENT 'used for index: it checks if status contains expired=true',
  `lifetime` tinyint(4) NOT NULL,
  `expiration` datetime GENERATED ALWAYS AS ((`create_date` + interval `lifetime` day)) VIRTUAL,
  `last_update` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `create_date` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  KEY `user_id` (`user_id`),
  KEY `expiration` (`status_expired`,`expiration`) USING BTREE,
  CONSTRAINT `ts_competition_item_ibfk_2` FOREIGN KEY (`user_id`) REFERENCES `ts_user_core` (`user_id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=1312459 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci ROW_FORMAT=COMPRESSED

Queries that were returning the wrong results:

SELECT * FROM items ORDER BY expiration DESC;
SELECT max(expiration),min(expiration) FROM items;

Thanks

TLDR;

The trouble is that your data comes from virtual columns materialized via indexes. The check, optimize, analyze operations you are doing forces the indexes to be synced and fixes any errors. That gives you the correct results henceforth. At least until the index gets out of sync again.

Why it may happen

Much of the problems are caused by issues with your table design. Let's start with.

`status_expired` tinyint(1) GENERATED ALWAYS AS (ifnull(json_contains(`status`,'true','$.expired'),false)) VIRTUAL

No doubt this is created to overcome the fact that you cannot directly index a JSON column in mysql. You have created a virtual column and indexed that instead. It's all very well, but this column can hold only one of two values; true or false . Which means it has very poor cadinality. As a result, mysql is unlikely to use this index for anything.

But we can see that you have combined the status_expired column with the expired column when creating the index. Perhaps with the idea of overcoming this poor cardinality mentioned above. But wait...

`expiration` datetime GENERATED ALWAYS AS ((`create_date` + interval `lifetime` day)) VIRTUAL,

Expiration is another virtual column. This has some repercussions.

When a secondary index is created on a generated virtual column, generated column values are materialized in the records of the index. If the index is a covering index (one that includes all the columns retrieved by a query), generated column values are retrieved from materialized values in the index structure instead of computed “on the fly”.

Ref: https://dev.mysql.com/doc/refman/5.7/en/create-table-secondary-indexes.html#json-column-indirect-index

This is contrary to

VIRTUAL: Column values are not stored, but are evaluated when rows are read, immediately after any BEFORE triggers. A virtual column takes no storage.

Ref: https://dev.mysql.com/doc/refman/5.7/en/create-table-generated-columns.html

We create virtual columns based on the sound principal that values generated by simple operations on columns shouldn't be stored to avoid redundancy, but by creating an index on it, we reintroduce redundancy.

Proposed fixes

based on the information provided, you don't really seem to need the status_expired column or even the expired column. An item that's past it's expiry date is expired!

CREATE TABLE `items` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `user_id` bigint(20) unsigned DEFAULT NULL,
  `image` json DEFAULT NULL,
  `status` json DEFAULT NULL,
  `expire_date` datetime GENERATED ALWAYS AS ((`create_date` + interval `lifetime` day)) VIRTUAL,
  `last_update` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `create_date` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  KEY `user_id` (`user_id`),
  KEY `expiration` (`expired_date`) USING BTREE,
  CONSTRAINT `ts_competition_item_ibfk_2` FOREIGN KEY (`user_id`) REFERENCES `ts_user_core` (`user_id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=1312459 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci ROW_FORMAT=COMPRESSED

Simply compare the current date with the expired_date column in the above table when you need to find out which items have expired. The difference here is instead of expired being a calculated item in every query, you calculate the expiry_date once, when you create the record.

This makes your table a lot neater and queries possibly faster

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM