Slow simple mysql query

Question

I have a problem with query speed. The query is simple, but with a lot of records (currently > 1,000,000) it is really slow. The question is similar to this one, but I can't find a solution there. EXPLAIN reports: Using index; Using temporary; Using filesort. Does anyone have suggestions to speed it up? Slow query:

select 
    `books`.`id`
from `books`
join `books_data` on `books_data`.`book_id` = `books`.`id`
where 
    `books`.`is_status` = 'active'
order by `books_data`.`date_add` DESC
limit 0, 10

Result:

10 rows (0.525 s)

Explain:

id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra
1 | SIMPLE | books | NULL | ref | PRIMARY,is_status,is_status_read_num,is_status_reviews_count,is_status_year_read_num,is_status_poster_name | is_status | 1 | const | 112342 | 100.00 | Using index; Using temporary; Using filesort
1 | SIMPLE | books_data | NULL | ref | book_id,book_id_date_add | book_id | 4 | mon.books.id | 1 | 100.00 | NULL

My tables:

CREATE TABLE `books` (
  `id` int NOT NULL AUTO_INCREMENT,
  `name` varchar(255) NOT NULL,
  `name_original` varchar(255) DEFAULT NULL,
  `annotation` text CHARACTER SET utf8 COLLATE utf8_general_ci,
  `year` int DEFAULT NULL,
  `year_original` int DEFAULT NULL,
  `year_publishing` int DEFAULT NULL,
  `poster` varchar(255) DEFAULT NULL,
  `isbn` varchar(255) DEFAULT NULL,
  `read_num` int NOT NULL DEFAULT '0',
  `reviews_count` int NOT NULL DEFAULT '0',
  `views_count` int NOT NULL DEFAULT '0',
  `rating` decimal(4,1) NOT NULL DEFAULT '0.0',
  `is_status` enum('new','active','duplicate','deleted') NOT NULL DEFAULT 'new',
  `deleted_reason` varchar(255) DEFAULT NULL,
  `duplicate_id` int DEFAULT NULL,
  `new_id` int DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `is_status` (`is_status`),
  KEY `read_num` (`read_num`),
  KEY `reviews_count` (`reviews_count`),
  KEY `views_count` (`views_count`),
  KEY `is_status_read_num` (`is_status`,`read_num`),
  KEY `is_status_reviews_count` (`is_status`,`reviews_count`),
  KEY `year_original` (`year_original`),
  KEY `year` (`year`),
  KEY `is_status_year_read_num` (`is_status`,`year`,`read_num`),
  KEY `year_publishing` (`year_publishing`),
  KEY `new_id` (`new_id`),
  KEY `is_status_poster_name` (`is_status`,`poster`,`name`),
  FULLTEXT KEY `name` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3;
CREATE TABLE `books_data` (
  `book_id` int NOT NULL,
  `date_add` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00' ON UPDATE CURRENT_TIMESTAMP,
  `file_id` int DEFAULT NULL,
  KEY `book_id` (`book_id`),
  KEY `file_id` (`file_id`),
  KEY `date_add` (`date_add`),
  KEY `book_id_date_add` (`book_id`,`date_add`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3;

Answer 1

The problem is this combination:

Filter on one table (see WHERE )
Sort by another table (see ORDER BY )
LIMIT

Formulation 1

The query will be executed this way:

A scan of the index BTree for books.is_status to do the filtering. Note: The cardinality of is_status probably does not warrant using an index.
For each of those rows, find the matching row in books_data .
Sort the results to achieve ORDER BY...
Deliver the first LIMIT 10 rows.

Cost:

Partial scan of the index books.is_status
Reach into the index books_data.book_id that many times.
Reach into the table books_data that many times.
Sort
Deliver 10.

Formulation 2

You should drop KEY book_id (book_id), since it gets in the way of using the better ("covering") index book_id_date_add. That eliminates the third cost item above (the reach into the table books_data).
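
As a sketch of that change, using the table and index names from the question's schema (it assumes nothing else, such as a foreign key, depends on the single-column index):

```sql
-- Drop the single-column index that is redundant with book_id_date_add (book_id, date_add).
ALTER TABLE books_data DROP INDEX book_id;

-- Confirm which indexes remain on the table.
SHOW INDEX FROM books_data;
```

Re-running EXPLAIN on the slow query afterwards should show whether the optimizer now picks book_id_date_add for the books_data lookup.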

Formulation 3

If you could eliminate the test on status (step 1), then INDEX(date_add) could be used, and the execution would be:

Using INDEX(date_add) , find the last 10 items.
Reach (10 times only) into books for books.id .

Cost:

10 rows from books_data.date_add
10 rows from books

The second bullet could be circumvented by delivering books_data.book_id instead of books.id .

However, the second bullet has a side effect that may not be needed: it verifies that there is an entry in books for the desired books_data.book_id.
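
For illustration, the cheapest variant described here might look like the following. This is only a sketch: it assumes the is_status test really can be dropped and that returning books_data.book_id without checking books is acceptable.

```sql
-- Walk INDEX(date_add) backwards and stop after 10 entries.
-- No filter on books.is_status and no check that the book exists.
SELECT bd.book_id
FROM books_data AS bd
ORDER BY bd.date_add DESC
LIMIT 10;
```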

Formulation 4

Consider this rewrite:

SELECT bd.book_id
    FROM books_data AS bd
    WHERE EXISTS ( SELECT 1 FROM books
                     WHERE id = bd.book_id
                       AND is_status = 'active' )
    ORDER BY bd.date_add DESC
    LIMIT 10;

That eliminates the caveat mentioned in Formulation 3.

The Cost may be:

Scan at least 10 rows of INDEX(book_id, date_add)
For each, do a single probe into books to verify that is_status = 'active'. (10+ more rows)

This formulation 4

honors the apparent intent
will stop reasonably fast if not too many rows are not 'active'.
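
Whether the optimizer actually walks books_data in date_add order for a query of this shape is worth checking with EXPLAIN, since newer MySQL versions may turn the EXISTS into a semijoin and choose a different join order. A sketch, assuming the status test plus the ORDER BY and LIMIT from the original question belong in the rewrite:

```sql
-- Look at the Extra column: the goal is to no longer see
-- "Using temporary; Using filesort" on the first table.
EXPLAIN
SELECT bd.book_id
FROM books_data AS bd
WHERE EXISTS (
    SELECT 1
    FROM books AS b
    WHERE b.id = bd.book_id
      AND b.is_status = 'active'
)
ORDER BY bd.date_add DESC
LIMIT 10;
```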

Notes

I have been assuming a 1:many relationship (one row in books, many rows in books_data).
If the tables are 1:1, combine them; all this discussion goes away.
Every table should have a PRIMARY KEY. If there is no 'natural' PK, use an AUTO_INCREMENT. If the combination (book_id, file_id) is unique, then it would be a likely PK.
Whenever you have both INDEX(a) and INDEX(a,b) , drop the former; its existence can lead to a less optimal execution plan.
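
The last two notes could be applied to the question's schema roughly as follows. This is a sketch only: the column name id on books_data is hypothetical, and a surrogate key is shown because file_id is nullable in the posted schema, so (book_id, file_id) cannot serve as a PK as is.

```sql
-- Give books_data an explicit PRIMARY KEY via a surrogate AUTO_INCREMENT column.
-- The column name `id` is an assumption; this rebuilds the table, so expect it
-- to take a while on a large table.
ALTER TABLE books_data
    ADD COLUMN id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY FIRST;

-- books has INDEX(is_status) alongside several composite indexes that start
-- with is_status, for example is_status_read_num (is_status, read_num),
-- so the single-column index is redundant by the rule above.
ALTER TABLE books DROP INDEX is_status;
```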
