MySQL: Selecting id, then * on id is much faster than selecting *. Why?

I have a MySQL database table (roughly 100K rows):

id BIGINT(indexed), external_barcode VARCHAR(indexed), other simple columns, and a LongText column.

The LongText column is a JSON data dump. I save the large JSON objects because I will need to extract more of the data in the future.

When I run this query it takes 29+ seconds:

SELECT * FROM scraper_data WHERE external_barcode = '032429257284'

EXPLAIN

#id  select_type table          partitions type  possible_keys key  key_len ref  rows     filtered Extra
'1' 'SIMPLE'     'scraper_data' NULL       'ALL' NULL          NULL NULL    NULL '119902' '0.00'   'Using where'

This more complex query takes 0.00 seconds:

SELECT * FROM scraper_data WHERE id = (
    SELECT id FROM scraper_data WHERE external_barcode = '032429257284'
)

EXPLAIN

# id  select_type  table           partitions  type     possible_keys        key        key_len  ref      rows      filtered  Extra
'1'   'PRIMARY'    'scraper_data'  NULL        'const'  'PRIMARY,id_UNIQUE'  'PRIMARY'  '8'      'const'  '1'       '100.00'  NULL
'2'   'SUBQUERY'   'scraper_data'  NULL        'ALL'    NULL                 NULL       NULL     NULL     '119902'  '0.00'    'Using where'

Fewer than 6 rows are returned by these queries. Why is the LONGTEXT column slowing down the first query, given that it's not referenced in the WHERE clause?

CREATE TABLE

CREATE TABLE `scraper_data` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `bzic` varchar(10) NOT NULL,
  `pzic` varchar(10) DEFAULT NULL,
  `internal_barcode` varchar(20) DEFAULT NULL,
  `external_barcode_type` enum('upc','isbn','ean','gtin') DEFAULT NULL,
  `external_barcode` varchar(15) DEFAULT NULL,
  `url` varchar(255) NOT NULL,
  `title` varchar(255) DEFAULT NULL,
  `category` varchar(3) DEFAULT NULL,
  `description` text,
  `logo_image_url` varchar(255) DEFAULT NULL,
  `variant_image_urls` text,
  `parent_brand` varchar(10) DEFAULT NULL,
  `parent_brand_name` varchar(255) DEFAULT NULL,
  `manufacturer` varchar(10) DEFAULT NULL,
  `manufacturer_name` varchar(255) DEFAULT NULL,
  `manufacturer_part_number` varchar(255) DEFAULT NULL,
  `manufacturer_model_number` varchar(255) DEFAULT NULL,
  `contributors` text,
  `content_info` text,
  `content_rating` text,
  `release_date` timestamp NULL DEFAULT NULL,
  `reviews` int(11) DEFAULT NULL,
  `ratings` int(11) DEFAULT NULL,
  `internal_path` varchar(255) DEFAULT NULL,
  `price` int(11) DEFAULT NULL,
  `adult_product` tinyint(4) DEFAULT NULL,
  `height` varchar(255) DEFAULT NULL,
  `length` varchar(255) DEFAULT NULL,
  `width` varchar(255) DEFAULT NULL,
  `weight` varchar(255) DEFAULT NULL,
  `scraped` tinyint(4) NOT NULL DEFAULT '0',
  `scraped_timestamp` timestamp NULL DEFAULT NULL,
  `scrape_attempt_timestamp` timestamp NULL DEFAULT NULL,
  `processed` tinyint(4) NOT NULL DEFAULT '0',
  `processed_timestamp` timestamp NULL DEFAULT NULL,
  `modified` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `scrape_dump` longtext,
  PRIMARY KEY (`id`),
  UNIQUE KEY `id_UNIQUE` (`id`),
  UNIQUE KEY `url_UNIQUE` (`url`),
  UNIQUE KEY `internal_barcode_UNIQUE` (`internal_barcode`),
  KEY `bzic` (`bzic`),
  KEY `pzic` (`pzic`),
  KEY `internal_barcode` (`internal_barcode`),
  KEY `external_barcode` (`external_barcode`,`external_barcode_type`) /*!80000 INVISIBLE */,
  KEY `scrape_attempt` (`bzic`,`scraped`,`scrape_attempt_timestamp`)
) ENGINE=InnoDB AUTO_INCREMENT=121674 DEFAULT CHARSET=latin1;

The second query may have benefited from a cache already populated by the first query.

In addition, the subquery in the second query reads only two columns (id and external_barcode). If both columns are in an index, its result can be obtained from an index scan alone, whereas the first query has to scan all of the table's rows to retrieve the full data.

To avoid the long run time of the first query, add a proper index on the external_barcode column:

CREATE INDEX my_idx ON scraper_data (external_barcode, id);
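
To check whether the new index is picked up, one option is to re-run EXPLAIN on both forms of the lookup. This is only a sketch: it assumes the my_idx index from the statement above has been created, and the plan the optimizer actually chooses can differ with data and server version.

```sql
-- Assumes my_idx (external_barcode, id) exists, as created above.

-- The id-only lookup can be answered from the index alone;
-- a covering lookup is typically reported as "Using index" in Extra.
EXPLAIN SELECT id FROM scraper_data WHERE external_barcode = '032429257284';

-- The full-row query still has to read the matching rows (including scrape_dump),
-- but only those rows; check that type is no longer ALL.
EXPLAIN SELECT * FROM scraper_data WHERE external_barcode = '032429257284';
```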

Your queries are not equivalent, and your second query will throw an error if you have more than one row with that barcode:

Error Code: 1242. Subquery returns more than 1 row

This is probably what happens here: you do not actually get a result, just an error. Since MySQL can stop the full table scan as soon as it finds a second row, you can get this error faster than a correct result, even in "0.00s" if those rows are among the first rows scanned (for example, ids 1 and 2).
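
A quick way to test this explanation is to count how many rows share the barcode. A minimal sketch, using only the table and column from the question:

```sql
-- If this returns more than 1, the "=" subquery form fails with error 1242.
SELECT COUNT(*) AS matching_rows
FROM scraper_data
WHERE external_barcode = '032429257284';

-- List every barcode that occurs more than once.
-- Without a usable index this is itself a full table scan.
SELECT external_barcode, COUNT(*) AS n
FROM scraper_data
WHERE external_barcode IS NOT NULL
GROUP BY external_barcode
HAVING COUNT(*) > 1;
```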

From the execution plans, you can see that both queries do a full table scan (which, up to current versions, includes reading the blob field) and should therefore perform similarly (the first entry in your second EXPLAIN plan is negligible for only a few rows).

So with a barcode that doesn't throw an error, both of your queries, as well as the corrected second query (which uses IN instead of =),

SELECT * FROM scraper_data WHERE id IN (   -- IN instead of = !!
    SELECT id FROM scraper_data WHERE external_barcode = '032429257284'
)   

as well as running your subquery

SELECT id FROM scraper_data WHERE external_barcode = '032429257284'

separately (which, if your assumption is correct, would have to be even faster than your second query), will have a similarly long execution time.

As scaisEdge mentioned in his answer, an index on external_barcode will improve performance significantly, because you no longer need a full table scan and no longer need to read the blob field of every row. You actually have such an index, but you disabled it (INVISIBLE). You can simply re-enable it with:

ALTER TABLE scraper_data ALTER INDEX `external_barcode` VISIBLE;
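
To confirm the index state before and after the ALTER, the visibility flag can be inspected. A sketch, assuming MySQL 8.0 (where invisible indexes exist) and that the table lives in the currently selected schema:

```sql
-- IS_VISIBLE is 'NO' for an invisible index and 'YES' once it has been re-enabled.
SELECT INDEX_NAME, SEQ_IN_INDEX, COLUMN_NAME, IS_VISIBLE
FROM information_schema.STATISTICS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'scraper_data'
  AND INDEX_NAME = 'external_barcode';

-- After making the index visible, re-check the plan of the original query:
-- the index should now appear as a candidate in possible_keys.
EXPLAIN SELECT * FROM scraper_data WHERE external_barcode = '032429257284';
```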
