
How to select records from a MySQL database by regex

I have a regexp to validate user email addresses:

/^(|(([A-Za-z0-9]+_+)|([A-Za-z0-9]+\-+)|([A-Za-z0-9]+\.+)|([A-Za-z0-9]+\++))*[A-Za-z0-9]+@((\w+\-+)|(\w+\.))*\w{1,63}\.[a-zA-Z]{2,})$/i

Using ActiveRecord, I want to fetch all users from the database whose email address doesn't match this regexp. I tried the following scope to get that result, but all I get is an ActiveRecord::Relation.

scope :not_match_email_regex, :conditions => ["NOT email REGEXP ?'", /^(|(([A-Za-z0-9]+_+)|([A-Za-z0-9]+\-+)|([A-Za-z0-9]+\.+)|([A-Za-z0-9]+\++))*[A-Za-z0-9]+@((\w+\-+)|(\w+\.))*\w{1,63}\.[a-zA-Z]{2,})$/"]

This gives me the following query:

SELECT `users`.* FROM `users` WHERE (email REGEXP '--- !ruby/regexp /^(|(([A-Za-z0-9]+_+)|([A-Za-z0-9]+\\-+)|([A-Za-z0-9]+\\.+)|([A-Za-z0-9]+\\++))*[A-Za-z0-9]+@((\\w+\\-+)|(\\w+\\.))*\\w{1,63}\\.[a-zA-Z]{2,})$/\n...\n')

I also tried to define this scope in the following way with the same result:

scope :not_match_email_regex, :conditions => ["email REGEXP '(|(([A-Za-z0-9]+_+)|([A-Za-z0-9]+\-+)|([A-Za-z0-9]+\.+)|([A-Za-z0-9]+\++))*[A-Za-z0-9]+@((\w+\-+)|(\w+\.))*\w{1,63}\.[a-zA-Z]{2,})'"]

The query it generates is:

SELECT `users`.* FROM `users` WHERE (email REGEXP '(|(([A-Za-z0-9]+_+)|([A-Za-z0-9]+-+)|([A-Za-z0-9]+.+)|([A-Za-z0-9]+++))*[A-Za-z0-9]+@((w+-+)|(w+.))*w{1,63}.[a-zA-Z]{2,})')

How can I fetch all records that match or don't match the given regex?

EDIT 12-11-30: small corrections, partly based on the comment by @innocent_rifle.

The Regexp suggested here tries to match the same strings as the one in the original question.

1. When I first wrote my solution, I forgot that you must escape \ (write \\) in Ruby strings, because I was testing directly in MySQL. Writing Regexps as strings makes the discussion confusing, so I will use this form instead, e.g. /dot\./.source, which (in Ruby) gives "dot\\.".

2. REGEXP in MySQL (manual for 5.6, tested in 5.0.67) uses the "C escape syntax in strings", so WHERE email REGEXP '\.' is the same as WHERE email REGEXP '.': the string literal swallows the backslash and the pattern matches any character. To find the character "." in SQL typed directly into MySQL, you must write WHERE email REGEXP '\\.' (MySQL needs two backslashes). When the pattern is passed as a bound ? value, Rails quotes it and doubles each backslash for you (compare the \\- in the first generated query in the question), so .where([ 'email REGEXP ?', /\./.source ]) should already be enough. However, I prefer .where([ 'email REGEXP ?', /[.]/.source ]); then I don't have to worry about how many escapes are needed.

3. You don't need to escape "-" in a Regexp, not even inside [], as long as it is the first or the last character there.
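To make points 1 to 3 concrete, here is a small Ruby model of how many backslashes survive each layer. It is a sketch under stated assumptions: mysql_unescape is a hypothetical, simplified imitation of how MySQL reads backslashes in a string literal, quote_like_adapter is a hypothetical stand-in for an adapter that doubles backslashes when quoting a bound value (the \\- in the first generated query in the question is an example of that doubling), and Ruby's own Regexp stands in for MySQL's regexp engine.

```ruby
# Simplified, hypothetical model of how MySQL reads backslashes inside a
# quoted string literal: "\\" becomes "\" and an unknown escape such as
# "\." just loses its backslash. Real MySQL also turns \n, \t, \0 and a
# few others into special characters and keeps \% and \_ unchanged.
def mysql_unescape(literal_body)
  literal_body.gsub(/\\(.)/m) { Regexp.last_match(1) }
end

# Hypothetical stand-in for a database adapter that escapes a bound value
# before wrapping it in quotes: every backslash is doubled.
def quote_like_adapter(value)
  value.gsub("\\") { "\\\\" }
end

# Ruby's Regexp stands in for MySQL's regexp engine here.
def engine_match?(pattern, text)
  Regexp.new(pattern).match?(text)
end

# Point 1: one backslash in a Regexp literal, two in a Ruby string literal.
p(/dot\./.source)                              # prints "dot\\." (one real backslash)

# Point 2, SQL typed by hand: '\.' reaches the engine as "." (any char) ...
p engine_match?(mysql_unescape("\\."), "x")    # prints true
# ... while '\\.' reaches it as "\." and only matches a real dot.
p engine_match?(mysql_unescape("\\\\."), "x")  # prints false
p engine_match?(mysql_unescape("\\\\."), ".")  # prints true

# Point 2, bound "?" value: the adapter doubles the backslash and MySQL's
# literal parsing removes it again, so in this model /\./.source suffices.
p mysql_unescape(quote_like_adapter(/\./.source)) == "\\."   # prints true

# Point 3: a "-" placed first or last inside [] is a literal hyphen.
p "a-b".match?(/[-.]/)                         # prints true
p "a-b".match?(/[.-]/)                         # prints true
```

The [.] form from point 2 sidesteps all of this, because it contains no backslash for either layer to consume.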


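Some errors I found: the first alternation "|" in your expression (right after "^(") should go, because it lets the pattern match an empty string; and the pattern should be passed to the query as a String, or via Regexp#source, which I prefer, rather than as a Regexp object, which Rails serializes as YAML (the "--- !ruby/regexp" in your generated query). I think there was also an extra quote after the ?. Apart from that, are you really sure the regexps work? Have you tried them on a string in the console?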
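Trying the question's pattern on a few strings in irb shows the effect of that leading "|". A quick sketch with made-up sample inputs; the last line shows what Regexp#source returns, which is the plain string to bind to the ? placeholder.

```ruby
# The pattern from the question (without the stray trailing quote).
ORIGINAL = /^(|(([A-Za-z0-9]+_+)|([A-Za-z0-9]+\-+)|([A-Za-z0-9]+\.+)|([A-Za-z0-9]+\++))*[A-Za-z0-9]+@((\w+\-+)|(\w+\.))*\w{1,63}\.[a-zA-Z]{2,})$/i

# Made-up sample inputs.
["john.doe@example.com", "", "not an email"].each do |s|
  puts "#{s.inspect.ljust(24)} #{ORIGINAL.match?(s)}"
end
# The empty string matches too, because of the empty branch after "^(".

# Regexp#source gives the bare pattern text: no slashes, and the /i flag
# is not carried over. This string is what belongs in the ? placeholder.
p(/ab/i.source)                                # prints "ab"
```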

Also be aware that you won't catch rows where email is NULL in the database; in that case you must add: (<your existing expr in parentheses>) OR email IS NULL
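The NULL caveat follows from SQL's three-valued logic: NULL REGEXP 'pat' is NULL, NOT NULL is still NULL, and WHERE keeps only rows whose condition is true. The Ruby sketch below models that with nil standing in for SQL NULL; the helper names, the toy pattern and the sample rows are all hypothetical.

```ruby
# nil plays the role of SQL NULL in this model.
def sql_regexp(value, pattern)
  return nil if value.nil?                 # NULL REGEXP pat -> NULL
  Regexp.new(pattern).match?(value)
end

def sql_not(x)
  x.nil? ? nil : !x                        # NOT NULL -> NULL
end

def sql_or(a, b)
  return true if a == true || b == true
  return nil if a.nil? || b.nil?
  false
end

def sql_is_null(value)
  value.nil?                               # IS NULL itself is never NULL
end

PATTERN = "^[A-Za-z0-9]+@[A-Za-z0-9]+[.][A-Za-z]{2,}$"   # toy pattern
emails  = ["good@example.com", "bad address", nil]

# WHERE keeps a row only when the condition is exactly true.
without_null_check = emails.select { |e| sql_not(sql_regexp(e, PATTERN)) == true }
with_null_check = emails.select do |e|
  sql_or(sql_not(sql_regexp(e, PATTERN)), sql_is_null(e)) == true
end

p without_null_check   # prints ["bad address"]
p with_null_check      # prints ["bad address", nil]
```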

Regexp syntax in my MySQL version.

I also tested @Olaf Dietsche's suggestion: the extra parentheses do not seem to be needed (in MySQL, NOT has lower precedence than REGEXP unless the HIGH_NOT_PRECEDENCE SQL mode is enabled), but I strongly recommend following the standard syntax anyway (NOT (expr REGEXP pat) or expr NOT REGEXP pat).

I have done some checking, and these things must be changed: use [A-Za-z0-9_] instead of \w, and \+ is not valid; in SQL typed directly into MySQL you must write \\+ ("\\\\+" in a Ruby string). It is easier to use [+], which works the same in a Regexp and in a string.

This leads to the following REGEXP in MySQL:

'^(([A-Za-z0-9]+_+)|([A-Za-z0-9]+-+)|([A-Za-z0-9]+[.]+)|([A-Za-z0-9]+[+]+))*[A-Za-z0-9]+@(([A-Za-z0-9]+-+)|([A-Za-z0-9]+[.]))*[A-Za-z0-9]{1,63}[.][a-zA-Z]{2,}$'
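Because the pattern above contains no backslashes, the same string can be tried in Ruby before it is sent to MySQL. A quick check with made-up sample addresses, assuming Ruby's engine treats these simple bracket expressions the same way MySQL's does.

```ruby
# The MySQL pattern from the answer, used unchanged as a Ruby regexp.
MYSQL_PATTERN = '^(([A-Za-z0-9]+_+)|([A-Za-z0-9]+-+)|([A-Za-z0-9]+[.]+)|([A-Za-z0-9]+[+]+))*[A-Za-z0-9]+@(([A-Za-z0-9]+-+)|([A-Za-z0-9]+[.]))*[A-Za-z0-9]{1,63}[.][a-zA-Z]{2,}$'
re = Regexp.new(MYSQL_PATTERN)

# Made-up samples. Ruby's ^ and $ anchor at line boundaries, MySQL's at
# the ends of the whole value; with single-line samples that is the same.
{
  "john.doe@example.com"            => true,
  "first_last+tag@mail.example.org" => true,
  ""                                => false,  # the empty match is gone
  "user@localhost"                  => false,  # no top-level domain
  "no-at-sign.example.com"          => false
}.each do |email, expected|
  puts "#{email.inspect}: #{re.match?(email)} (expected #{expected})"
end
```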

Small change suggestions

I don't fully understand your regexp, so these changes only simplify it and try to keep what it finds (almost) the same.

First, change the whole string as described above.

Then change

(([A-Za-z0-9]+_+)|([A-Za-z0-9]+-+)|([A-Za-z0-9]+[.]+)|([A-Za-z0-9]+[+]+))*

to

([A-Za-z0-9]+[-+_.]+)*

and

@(([A-Za-z0-9]+-+)|([A-Za-z0-9]+[.]))*

to

@([A-Za-z0-9]+[-.]+)*
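One way to see what the first simplification changes is to compare the two local-part patterns (left of the "@") on a few made-up strings. This sketch anchors both with \A and \z so each must match the whole string; it suggests the merged class is slightly looser, since it also accepts a mixed run of separators such as "_-".

```ruby
# Local part before and after the first simplification.
BEFORE = /\A(([A-Za-z0-9]+_+)|([A-Za-z0-9]+-+)|([A-Za-z0-9]+[.]+)|([A-Za-z0-9]+[+]+))*[A-Za-z0-9]+\z/
AFTER  = /\A([A-Za-z0-9]+[-+_.]+)*[A-Za-z0-9]+\z/

["john.doe", "first_last+tag", "a__b", "a_-b", "_leading"].each do |local|
  puts "#{local.inspect.ljust(18)} before=#{BEFORE.match?(local)} after=#{AFTER.match?(local)}"
end
```

Only "a_-b" should differ: the original alternatives require letters or digits between different separator characters, while [-+_.]+ accepts any run of them.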

Final code (switch to the :conditions => ... syntax if you prefer). I tried to make it match the same strings as the regexp in the comment by @innocent_rifle, only adding "_" to the expressions to the right of the @:

.where([ 'NOT (email REGEXP ?)', /^([A-Za-z0-9]+[-+_.]+)*[A-Za-z0-9]+@([A-Za-z0-9]+[-._]+)*[A-Za-z0-9_]{1,63}[.][A-Za-z]{2,}$/.source ])
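The final pattern can be checked in the console the same way before it is wired into a scope. A sketch with made-up sample addresses; EMAIL_PATTERN is a hypothetical constant name. It also confirms that the pattern's source contains no backslashes, so the escaping questions above do not arise.

```ruby
# Hypothetical constant holding the final pattern from the answer.
EMAIL_PATTERN = /^([A-Za-z0-9]+[-+_.]+)*[A-Za-z0-9]+@([A-Za-z0-9]+[-._]+)*[A-Za-z0-9_]{1,63}[.][A-Za-z]{2,}$/

# No backslashes, so the string sent to MySQL needs no extra escaping.
puts EMAIL_PATTERN.source.include?("\\")   # prints false

%w[john.doe@example.com first_last+tag@mail.example.org
   user@mail_server.example.com user@localhost user@@example.com].each do |email|
  puts "#{email}: #{EMAIL_PATTERN.match?(email)}"
end

# In the model (Rails 3 style, as in the question) this could become e.g.
#   scope :not_match_email_regex,
#         lambda { where(["NOT (email REGEXP ?)", EMAIL_PATTERN.source]) }
```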

For validating email addresses, you might want to consider "How to Find or Validate an Email Address". At least, the regexp there looks a bit simpler.

According to the MySQL manual (Regular Expressions), the proper syntax is

expr REGEXP pat

for a match, and

expr NOT REGEXP pat or NOT (expr REGEXP pat)

for the opposite. Don't forget the parentheses in the second version.
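Applied to the question, these two forms give one condition for records that match and one for records that don't. A sketch: email_regexp_conditions is a hypothetical helper that only builds the array you would pass to where (or :conditions); it does not touch the database, and the pattern is assumed to be a Ruby Regexp whose source MySQL can read.

```ruby
# Hypothetical helper: builds an ActiveRecord-style conditions array.
#   match: true  -> expr REGEXP pat
#   match: false -> NOT (expr REGEXP pat), the parenthesised form
def email_regexp_conditions(pattern, match: true)
  sql = match ? "email REGEXP ?" : "NOT (email REGEXP ?)"
  [sql, pattern.source]
end

pattern = /^[A-Za-z0-9]+@[A-Za-z0-9]+[.][A-Za-z]{2,}$/   # toy pattern
p email_regexp_conditions(pattern)
p email_regexp_conditions(pattern, match: false)

# Usage in a model, for example:
#   where(email_regexp_conditions(pattern, match: false))
```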
