Faster way to match a string in MySQL using replace

I have an interesting problem trying to select rows from a table where there are multiple possibilities for a VARCHAR column in my where clause.

Here's my table (which has around 7 million rows):

CREATE TABLE `search_upload_detailed_results` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `surId` bigint(20) DEFAULT NULL,
  `company` varchar(100) DEFAULT NULL,
  `country` varchar(45) DEFAULT NULL,
  `clei` varchar(100) DEFAULT NULL,
  `partNumber` varchar(100) DEFAULT NULL,
  `mfg` varchar(100) DEFAULT NULL,
  `cond` varchar(45) DEFAULT NULL,
  `price` float DEFAULT NULL,
  `qty` int(11) DEFAULT NULL,
  `age` int(11) DEFAULT NULL,
  `description` varchar(500) DEFAULT NULL,
  `status` varchar(45) DEFAULT NULL,
  `fileId` bigint(20) DEFAULT NULL,
  `nmId` bigint(20) DEFAULT NULL,
  `quoteRequested` tinyint(1) DEFAULT '0',
  PRIMARY KEY (`id`),
  KEY `sudr.surId` (`surId`),
  KEY `surd.clei` (`clei`),
  KEY `surd.pn` (`partNumber`),
  KEY `surd.fileId` (`fileId`),
  KEY `surd.price` (`price`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

I'm trying to match on the partNumber column. The problem is that part numbers are stored in different formats, and they can also be entered in the search form in multiple formats.

Example: Part Number '300-1231-932' could be:

  • 300-1231-932
  • 3001231932
  • 300 1231 932

A simple select like this takes 0.0008 seconds.

select avg(price) as price from search_upload_detailed_results where 
partNumber LIKE '3001231932%' and price > 0;

But it doesn't give me all of the matches that I need. So I wrote this query.

select avg(price) as price from search_upload_detailed_results 
where REPLACE(REPLACE(partNumber,'-',''),' ','') LIKE REPLACE(REPLACE('3001231932%','-',''),' ','') and price > 0;

This gives me all of the correct matches, but it's super slow at 3.3 seconds.

I played around with some things, trying to reduce the number of rows I'm doing the replace on, and came up with this.

select avg(price) as price from search_upload_detailed_results 
where price > 0 AND 
partNumber LIKE('300%') AND 
REPLACE(REPLACE(partNumber,'-',''),' ','') LIKE REPLACE(REPLACE('3001231932%','-',''),' ','');

It takes 0.4 seconds to execute. Pretty fast, but could still be a bit time consuming in a multi-part search.

I would like to get it a little faster, but this is as far as I could get. Are there any other ways to optimize this query?

UPDATE to show explain for the 3rd query:

# id, select_type, table, type, possible_keys, key, key_len, ref, rows, Extra
1, SIMPLE, search_upload_detailed_results, range, surd.pn,surd.price, surd.pn, 103, , 89670, Using where

The obvious solution is to just store the part number with no extra characters in the table. Then remove these characters from the user input, and just do a simple WHERE partnumber = @input query.

If that's not possible, you can add the stripped value as an additional column and index it. In MySQL 5.7 and later you can use a generated column; in earlier versions you can use a trigger that fills in this column.
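The idea of an extra, stripped column can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module so that it runs standalone, and the table name `results`, the column name `partNumberNorm`, and the helper names are hypothetical. In MySQL the same effect would come from a generated column (5.7+) or from BEFORE INSERT / BEFORE UPDATE triggers that set NEW.partNumberNorm; the SQLite trigger syntax below is not MySQL syntax.

```python
import sqlite3


def normalize(part_number: str) -> str:
    """Strip the separators that vary between formats (hyphens and spaces)."""
    return part_number.replace("-", "").replace(" ", "")


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE results (
    id INTEGER PRIMARY KEY,
    partNumber TEXT,
    partNumberNorm TEXT,   -- hypothetical shadow column with separators removed
    price REAL
);
CREATE INDEX idx_results_norm ON results (partNumberNorm);

-- SQLite-only sketch: fill the shadow column after each insert.
-- A real setup would also need the equivalent for updates.
CREATE TRIGGER results_norm_ai AFTER INSERT ON results
BEGIN
    UPDATE results
    SET partNumberNorm = REPLACE(REPLACE(NEW.partNumber, '-', ''), ' ', '')
    WHERE id = NEW.id;
END;
""")

conn.executemany(
    "INSERT INTO results (partNumber, price) VALUES (?, ?)",
    [
        ("300-1231-932", 10.0),
        ("3001231932", 20.0),
        ("300 1231 932", 30.0),
        ("999-0000-111", 99.0),
    ],
)


def average_price(connection, user_input):
    """Normalize the search term, then do a plain equality lookup."""
    row = connection.execute(
        "SELECT AVG(price) FROM results WHERE partNumberNorm = ? AND price > 0",
        (normalize(user_input),),
    ).fetchone()
    return row[0]


avg = average_price(conn, "300 1231-932")
```

Because the search term is normalized the same way as the stored value, the WHERE clause compares the indexed column directly and no function is applied to it per row.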

"I would like to get it a little faster, but this is as far as I could get. Are there any other ways to optimize this query?"

As Barmar has said, the best solution if you really need speed (is 3.3s slow?) is to add a column that holds the part number in one standardised form. That lets you query it without listing every format a part number might take.

"Example: Part Number '300-1231-932' could be: 300-1231-932 || 3001231932 || 300 1231 932"

I think you should worry about how your data is stored. Having all those different formats will make matching difficult. Can you convert everything to one standard format before it reaches the DB?

"Here's my table (which has around 7 million rows):"

With that many rows, don't forget your index!

As mentioned elsewhere, the problem is the table format. If changing it is not an option, there is another alternative.

If there are only a few formats and they are well known (e.g. the three you've shown), then the query can be made to run faster by precalculating every variant of the search term and searching for any of them.

select avg(price) as price from search_upload_detailed_results where 
partNumber IN ('300-1231-932', '3001231932', '300 1231 932') and price > 0;

This compares the indexed partNumber column (key surd.pn) directly against constants, so it can take full advantage of that index. Note that IN matches whole values only, unlike the prefix match of LIKE '...%' in the question.
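Precalculating the variants is something the application can do before it runs the query. Here is a minimal sketch, assuming the part number always splits into groups of 3, 4 and 3 characters as in the example, and that the database driver uses `%s` placeholders (as mysql-connector and PyMySQL do). The function names and the grouping are illustrative assumptions, not something stated in the question.

```python
def format_variants(part_number, groups=(3, 4, 3), separators=("-", "", " ")):
    """Return the part number written with each known separator.

    The grouping (3, 4, 3) is an assumption taken from the single example
    '300-1231-932'; other part-number layouts would need their own grouping.
    """
    digits = part_number.replace("-", "").replace(" ", "")
    if len(digits) != sum(groups):
        # Unknown layout: fall back to searching for the input as typed.
        return [part_number]
    chunks = []
    pos = 0
    for size in groups:
        chunks.append(digits[pos:pos + size])
        pos += size
    return [sep.join(chunks) for sep in separators]


def build_in_query(variants):
    """Build a parameterized IN query with one placeholder per variant."""
    placeholders = ", ".join(["%s"] * len(variants))
    return (
        "SELECT AVG(price) AS price FROM search_upload_detailed_results "
        f"WHERE partNumber IN ({placeholders}) AND price > 0"
    )


variants = format_variants("300 1231 932")
sql = build_in_query(variants)
# With a real connection this would be run as: cursor.execute(sql, variants)
```

Passing the variants as bound parameters rather than pasting them into the SQL string keeps user input out of the query text.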

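A regular expression can also cover all of the formats in a single condition. Note that MySQL cannot use the index on partNumber to evaluate REGEXP itself, so keep an indexable prefix condition such as partNumber LIKE '300%' next to it to limit the number of rows that have to be checked.

select avg(price) as price from search_upload_detailed_results where price > 0 and partNumber LIKE '300%' and partNumber REGEXP '^300[- ]?1231[- ]?932';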
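If the positions of the separators are not known in advance, the pattern can be generated from the search term by allowing an optional hyphen or space between every pair of characters. The sketch below is an assumption-laden illustration: the function name is hypothetical, input is restricted to letters and digits so nothing needs escaping, and Python's `re` module is used only to sanity-check the pattern. The `[- ]?` construct is basic enough that MySQL's REGEXP should accept it as well, but that is worth verifying on your server version.

```python
import re

OPTIONAL_SEPARATOR = "[- ]?"


def separator_tolerant_pattern(part_number):
    """Build an anchored prefix pattern that ignores '-' and ' ' separators."""
    chars = [ch for ch in part_number if ch not in "- "]
    if not chars or not all(ch.isalnum() for ch in chars):
        raise ValueError("expected letters and digits plus '-' or ' ' only")
    # Allow an optional separator between each pair of characters.
    return "^" + OPTIONAL_SEPARATOR.join(chars)


pattern = separator_tolerant_pattern("300-1231-932")
candidates = ("300-1231-932", "3001231932", "300 1231 932", "3011231932")
matches = [bool(re.match(pattern, value)) for value in candidates]
```

The resulting pattern would be passed to the query as a bound parameter for the REGEXP condition, ideally together with an indexable LIKE prefix so that only a small range of rows is tested against it.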
