MySQL complex string matching

I have a MySQL table where I store rows with a PartNumber field for inventory data from various companies. Companies have different ways of writing the same PartNumber.

For example, say we have the PartNumber ROF-137-7516 . This same part may appear under any of the following variations of that PartNumber:

ROF1377516
ROF1377516/R2
ROF 137 7516-2
ROF 137 7516/1
ROF 137 7516/1 R3D
ROF137 7516/2
ROF1377516/1
ROF-137-7516/2

I want a query that gets ALL of those parts when the user enters a search term of "ROF-137-7516". This is currently my query...

select * from parts where PartNumber like 'ROF-137-7516%';

But that only returns the last row. Is it possible to write a query that returns all of the parts?

If you want to handle this in SQL, here is one way with REPLACE() :

SELECT *
FROM Parts
WHERE REPLACE(REPLACE(PartNumber,'-',''),' ','') LIKE REPLACE('ROF-137-7516%','-','')

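This assumes the search term is always entered either with dashes or with no separators at all. Only dashes are stripped from the search term, so a term typed with spaces would not match.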
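To see what the normalization does to the example values, here is a minimal Python sketch of the same idea. It is not MySQL: normalize and matches_prefix are hypothetical helper names, and unlike the query above it also strips spaces from the search term.

```python
# Sample values copied from the question.
PART_NUMBERS = [
    "ROF1377516",
    "ROF1377516/R2",
    "ROF 137 7516-2",
    "ROF 137 7516/1",
    "ROF 137 7516/1 R3D",
    "ROF137 7516/2",
    "ROF1377516/1",
    "ROF-137-7516/2",
]


def normalize(value: str) -> str:
    """Drop dashes and spaces, like REPLACE(REPLACE(value, '-', ''), ' ', '')."""
    return value.replace("-", "").replace(" ", "")


def matches_prefix(part_number: str, search_term: str) -> bool:
    """Emulate `normalized value LIKE 'normalized term%'`, i.e. a prefix match."""
    return normalize(part_number).startswith(normalize(search_term))


matched = [p for p in PART_NUMBERS if matches_prefix(p, "ROF-137-7516")]
print(matched)
```

Every sample value normalizes to something starting with ROF1377516, so all eight are matched. Because the suffix separator is stripped as well, 7516-2 becomes 75162, so a different part whose last group merely begins with 7516 would also match.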

There are a few ways you might do this, depending on the data in your column and the performance you need from the table. See the MySQL pattern matching page for more details.


1) Depending on what values you can expect in your PartNumber, you could replace the dashes with the % wildcard character, to match 0 or more of any character:

select * from parts where PartNumber like 'ROF%137%7516%'

But this may not be sufficient for you. For example, it would incorrectly return a row with this value: ROF 123 137XX/7516


2) If there is always exactly one character between ROF and each of the digit groups, then you could use an _ in your search pattern.

select * from parts where PartNumber like 'ROF_137_7516%'

However, that match requires exactly one character between the values, so it would not match ROF1377516 , nor ROF - 137 7516 .
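A quick way to compare the two LIKE patterns is to run them against the sample values. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL, on the assumption that % and _ behave the same way in both for these patterns. The in-memory table and the like helper are only for illustration.

```python
import sqlite3

# Sample values from the question, plus the two edge cases mentioned above.
ROWS = [
    "ROF1377516",
    "ROF1377516/R2",
    "ROF 137 7516-2",
    "ROF 137 7516/1",
    "ROF 137 7516/1 R3D",
    "ROF137 7516/2",
    "ROF1377516/1",
    "ROF-137-7516/2",
    "ROF 123 137XX/7516",  # should NOT be treated as the same part
    "ROF - 137 7516",      # three separator characters after ROF
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parts (PartNumber TEXT)")
conn.executemany("INSERT INTO parts VALUES (?)", [(r,) for r in ROWS])


def like(pattern: str) -> list[str]:
    """Return the PartNumber values matching a LIKE pattern, in insert order."""
    cur = conn.execute(
        "SELECT PartNumber FROM parts WHERE PartNumber LIKE ? ORDER BY rowid",
        (pattern,),
    )
    return [row[0] for row in cur]


percent_hits = like("ROF%137%7516%")     # option 1
underscore_hits = like("ROF_137_7516%")  # option 2

print("percent:   ", percent_hits)
print("underscore:", underscore_hits)
```

With this data the % pattern returns every row, including the unwanted ROF 123 137XX/7516, while the _ pattern returns only the four values that have exactly one separator character in each position.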


3.1) The most accurate way to run your query is with a regular expression. However, regular expressions can hurt performance significantly, so use them sparingly. In your case, you can use .* to match any character ( . ) zero or more times ( * ):

select * from parts where PartNumber regexp 'ROF.*137.*7516.*'

You may find that matching an unlimited number of characters before 137 or 7516 is too broad. For example, it would incorrectly match this: ROF 123 137XX/7516 . This is the same false positive as in #1 above. Also note that, unlike LIKE, REGEXP matches anywhere in the value, so add a leading ^ if the PartNumber must start with ROF.


3.2) If .* / % is too broad, then you can limit the number of characters that the . matches. Let's say it's standard to have one character between the numbers (space, dash, etc.), but you want to make allowances for user error (such as no separating character, or two separating characters typed instead of one). You can use {0,#} to limit how many characters to match. Let's say between 0 and 2 characters:

select * from parts where PartNumber regexp 'ROF.{0,2}137.{0,2}7516.*'

This way, it will match all of the example values in your question, but will not match ROF 123 137XX/7516 (because " 123 " and "XX/" are each more than 2 characters).
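If the search term comes from user input, the bounded pattern can be generated rather than typed by hand. Here is a minimal sketch in Python, whose re module stands in for MySQL's REGEXP (the .{0,2} syntax is common to both). build_pattern and max_gap are hypothetical names, and the sketch assumes the user separates the groups with dashes or spaces.

```python
import re

# Sample values from the question.
SAMPLES = [
    "ROF1377516",
    "ROF1377516/R2",
    "ROF 137 7516-2",
    "ROF 137 7516/1",
    "ROF 137 7516/1 R3D",
    "ROF137 7516/2",
    "ROF1377516/1",
    "ROF-137-7516/2",
]


def build_pattern(search_term: str, max_gap: int = 2) -> str:
    """Turn 'ROF-137-7516' into 'ROF.{0,2}137.{0,2}7516'."""
    # Split on runs of dashes/whitespace and escape each chunk so that any
    # regex metacharacters in the user's input are matched literally.
    chunks = [re.escape(c) for c in re.split(r"[-\s]+", search_term.strip()) if c]
    return (".{0,%d}" % max_gap).join(chunks)


pattern = build_pattern("ROF-137-7516")
print(pattern)

# re.search is unanchored, like MySQL's REGEXP, so the trailing .* from the
# query above is not needed here.
hits = [s for s in SAMPLES if re.search(pattern, s)]
print(hits)
print(bool(re.search(pattern, "ROF 123 137XX/7516")))
```

The generated string can then be passed to the REGEXP query as a bound parameter instead of being concatenated into the SQL text.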


4) Aaron Dietz answered with another technique, which is to use the replace() function. Depending on your table, this may be useful for you, but keep in mind that the query can no longer use an index on PartNumber. The index is built on the column's original values, so once the column is wrapped in replace(), MySQL has to compute the replaced value for every row and compare it, instead of looking it up in the index.
