
How to handle search variants in a MySQL query?

I have a list of phone numbers in our PhoneNos table:

ID | PhoneNo
1 | +61 2 9666 8000 

We want to search for this phone number in our Content table (i.e. the desc field).

The challenge is actually this:

The desc field is free text and the input can be anything, such as:

ContentID | Desc    
1 | bla bla ... +61 (02) 9666 8000 ... bla bla
2 | bla bla ... +61-2-9666-8000 bla bla
3 | bla bla ... +61 2 96668000 bla bla
4 | bla bla ... +61296668000 00116129668000 bla bla

or it could be anything else, for example with extra spacing:

5 | bla bla ... +61  (02) 9666   8000 ... bla bla
6 | bla bla ... +61-2 9662 0382 ... bla bla

That's an Australian phone number, BUT it could just as well be from the USA or any other country, SO this isn't tied to one particular country.

There is no pattern whatsoever in the text before and after the phone number. It could be anything.

Is there any way to handle this sort of thing easily? I could probably construct a condition for each case above, BUT I'm wondering if there is a better solution.

Just normalize the user's input to a format that is easy to search, i.e. "+ [ x ]". If the user enters additional spaces, remove them. Add the country code if necessary. Remove 00 from the start and replace it with +. You could even split the phone number into three columns to make searching easier.
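The normalization steps above could be sketched roughly like this. It's only an illustration: the function name `normalize_input` and the `default_country_code` parameter are made up here, and it assumes that a number without a + or 00 prefix is a national number whose leading trunk 0 should be dropped. Other international dialing prefixes (such as 0011) are not handled.

```python
import re

def normalize_input(raw: str, default_country_code: str = "61") -> str:
    """Normalize a user-entered phone number to '+<digits>' (hypothetical helper)."""
    s = raw.strip()
    has_plus = s.startswith("+")
    # Drop spaces, dashes, parentheses and anything else that is not a digit.
    digits = re.sub(r"\D", "", s)
    if has_plus:
        return "+" + digits
    if digits.startswith("00"):
        # Replace the 00 international prefix with +.
        return "+" + digits[2:]
    # Assumption: a national number; drop the trunk 0 and add the country code.
    return "+" + default_country_code + digits.lstrip("0")

print(normalize_input("  +61-2-9666-8000 "))
print(normalize_input("0061 2 9666 8000"))
print(normalize_input("02 9666 8000"))
```

All three calls should end up with the same string, which is the point: whatever the user types, the search runs against one canonical form.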

Why not just remove special symbols from the phone numbers and store them as just number strings?

The only case you need to consider is the +, because it replaces 00.

So basically, your records will have just numbers, your input will have just numbers. Just make sure you normalize the + to something, both in your database and the input.

What I would do is store them all with 00 instead of a +, so that when a search input with 00 comes through, it will work, as well as a search with a +. Hope this makes sense.
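A minimal sketch of that idea, assuming the same helper is applied both when storing a number and when handling search input. The name `to_search_key` is hypothetical, and it only deals with a leading + (turned into 00) and strips every other non-digit character.

```python
import re

def to_search_key(number: str) -> str:
    """Turn a phone number into a digits-only key, using 00 in place of a leading +."""
    s = number.strip()
    if s.startswith("+"):
        # Store + as 00 so both spellings end up identical.
        s = "00" + s[1:]
    # Remove every remaining non-digit character (spaces, dashes, parentheses).
    return re.sub(r"\D", "", s)

stored = to_search_key("+61 2 9666 8000")
searched = to_search_key("0061-2-9666-8000")
print(stored, searched, stored == searched)
```

With both sides reduced to the same key, the lookup becomes a plain equality or substring comparison on digit strings.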

My (highly uneducated) thought would be to use a regular expression replace (see here). Essentially, strip everything in the content except for digits and plus signs (feeling clunky yet? :) ), and then compare the result to your control string after the same processing (\+\d+, basically). That makes the rather broad assumption that no false positives will be created by another random string of numbers/characters matching your number (somewhat unlikely from a probability perspective, I imagine, but always a possibility).

I was tinkering around with what I'm sure is a highly inefficient, inelegant and likely incorrect solution, and realized that it won't handle the case with a leading 0 inside parentheses (since this doesn't seem to be present in the other patterns). You can find it here if you're curious, but I think the regex solution may be the most efficient way to handle this.
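To show what I mean, here is a rough sketch of the strip-and-compare idea, done in Python rather than in the query itself. The function names are made up for the example. MySQL 8.0 added a built-in REGEXP_REPLACE function, so the same stripping could presumably be done in SQL, but I have not tried that here.

```python
import re

def strip_to_phone_chars(text: str) -> str:
    """Remove everything except digits and plus signs."""
    return re.sub(r"[^\d+]", "", text)

def content_mentions(desc: str, phone_no: str) -> bool:
    """Check whether the stripped phone number occurs in the stripped content."""
    return strip_to_phone_chars(phone_no) in strip_to_phone_chars(desc)

phone_no = "+61 2 9666 8000"
rows = [
    "bla bla ... +61-2-9666-8000 bla bla",
    "bla bla ... +61 2 96668000 bla bla",
    "bla bla ... +61296668000 00116129668000 bla bla",
    # A parenthesized leading 0 leaves an extra digit behind, so this one is missed.
    "bla bla ... +61 (02) 9666 8000 ... bla bla",
]
for desc in rows:
    print(content_mentions(desc, phone_no), desc)
```

Because all separators are removed, unrelated digits that sit next to each other in the text get glued together, which is where the false positives mentioned above could come from.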


 