
Regex to match anything (including the empty string) except a specific given string

I'd like to test whether a string contains "Kansas" followed by anything other than " State" (including nothing at all).

Examples:

"I am from Kansas"          true
"Kansas State is great"     false
"Kansas is a state"         true
"Kansas Kansas State"       true
"Kansas State vs Kansas"    true
"I'm from Kansas State"     false
"KansasState"               true

For PCRE, I believe the answer is this:

'Kansas(?! State)'

But MySQL's REGEXP doesn't seem to like that.
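To pin down the intended behavior, here is a minimal sketch that runs the lookahead pattern against the examples above. It uses Python's `re` module as a stand-in for PCRE (an assumption for illustration; the question itself is about MySQL), and the names `EXAMPLES` and `contains_kansas_not_state` are made up for this sketch.

```python
import re

# Pattern from the question: "Kansas" not immediately followed by " State".
PATTERN = re.compile(r"Kansas(?! State)")

# The question's examples with their expected results.
EXAMPLES = {
    "I am from Kansas": True,
    "Kansas State is great": False,
    "Kansas is a state": True,
    "Kansas Kansas State": True,
    "Kansas State vs Kansas": True,
    "I'm from Kansas State": False,
    "KansasState": True,
}


def contains_kansas_not_state(text: str) -> bool:
    # search() looks for a match anywhere in the string, as REGEXP does.
    return PATTERN.search(text) is not None


for text, expected in EXAMPLES.items():
    print(f"{text!r:28} {contains_kansas_not_state(text)} (expected {expected})")
```

Any lookahead-free replacement should reproduce this table.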

ADDENDUM: Thanks to David M for generalizing this question: How to convert a PCRE to a POSIX RE?

MySQL's REGEXP doesn't have lookaheads (at least not in versions before 8.0.4, which use a POSIX-style regex library). A workaround is to make two tests:

WHERE yourcolumn LIKE '%Kansas%'
  AND yourcolumn NOT LIKE '%Kansas State%'

I used LIKE here instead of RLIKE because once you split it up like this, regular expressions are no longer required. However, if you still need regular expressions for other reasons, you can use the same technique with them.

Note that this does not match 'Kansas Kansas State' or 'Kansas State vs Kansas', both of which you wanted to match.
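The effect of the two-test workaround can be sketched outside the database. The snippet below emulates the two LIKE conditions with plain substring checks; `two_test_filter` is a hypothetical name, and the comparison is case-sensitive, whereas MySQL's LIKE may be case-insensitive depending on the collation.

```python
def two_test_filter(value: str) -> bool:
    # Emulates: value LIKE '%Kansas%' AND value NOT LIKE '%Kansas State%'
    # Assumption: case-sensitive comparison (MySQL depends on collation).
    return "Kansas" in value and "Kansas State" not in value


rows = [
    "I am from Kansas",
    "Kansas State is great",
    "Kansas is a state",
    "Kansas Kansas State",
    "Kansas State vs Kansas",
    "I'm from Kansas State",
    "KansasState",
]

for row in rows:
    print(f"{row!r:28} {two_test_filter(row)}")
```

Rows that contain both a bare "Kansas" and a "Kansas State" come out false here, which is where this approach departs from the lookahead.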

Update: If matching 'Kansas Kansas State' is that important then you can use this ugly regular expression that is supported by MySQL:

'Kansas($|[^ ]| ($|[^S])| S($|[^t])| St($|[^a])| Sta($|[^t])| Stat($|[^e]))'

Oops: I just noticed Kip already updated his comment with a solution very similar to this.
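The expanded pattern follows a mechanical rule: one alternative per character of the excluded string, each saying "the text agrees with the excluded string up to here, then either ends or continues with a different character". As a rough sketch of that rule, the hypothetical helper `without_lookahead` below (not part of MySQL or any library) builds the pattern for literal strings. To sidestep escaping differences between regex dialects it only accepts ASCII letters, digits and spaces.

```python
def without_lookahead(prefix: str, forbidden: str) -> str:
    """Build a lookahead-free equivalent of `prefix(?!forbidden)`.

    Hypothetical helper for literal strings only.
    """
    for part in (prefix, forbidden):
        if not all(c == " " or (c.isascii() and c.isalnum()) for c in part):
            raise ValueError("only ASCII letters, digits and spaces are supported")
    if not forbidden:
        raise ValueError("forbidden must not be empty")

    # First alternative: the text ends, or differs at the very first character.
    alternatives = [f"$|[^{forbidden[0]}]"]
    # Remaining alternatives: i characters agree, then end or a different one.
    for i in range(1, len(forbidden)):
        alternatives.append(f"{forbidden[:i]}($|[^{forbidden[i]}])")
    return f"{prefix}({'|'.join(alternatives)})"


print(without_lookahead("Kansas", " State"))
```

For "Kansas" and " State" this reproduces the pattern given above character for character. Patterns that are more than a literal string would need a different approach.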

This should work, assuming look-ahead assertions are allowed in MySQL regexes.

/Kansas(?! State)/

Edit: OK, this is super ugly, but it works for me in Perl and doesn't use a look-ahead assertion:

/Kansas(([^ ]|$)| (([^S]|$)|S(([^t]|$)|t(([^a]|$)|a(([^t]|$)|t([^e]|$))))))/
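One way to gain some confidence in a hand-expanded pattern like this is to compare it with the lookahead version on many generated strings. The sketch below does that in Python rather than Perl (an assumption for illustration; `FRAGMENTS`, `random_text` and `count_disagreements` are made-up names), gluing together fragments that spell out partial versions of " State".

```python
import random
import re

LOOKAHEAD = re.compile(r"Kansas(?! State)")
NESTED = re.compile(
    r"Kansas(([^ ]|$)| (([^S]|$)|S(([^t]|$)|t(([^a]|$)|a(([^t]|$)|t([^e]|$))))))"
)

# Fragments chosen so that partial spellings of " State" come up often.
FRAGMENTS = ["Kansas", " State", " Stat", " Sta", " S", " ", "e", "x", "State"]


def random_text(rng: random.Random, max_fragments: int = 4) -> str:
    count = rng.randint(0, max_fragments)
    return "".join(rng.choice(FRAGMENTS) for _ in range(count))


def count_disagreements(trials: int = 2000, seed: int = 0) -> int:
    # Count generated strings on which the two patterns give different answers.
    rng = random.Random(seed)
    disagreements = 0
    for _ in range(trials):
        text = random_text(rng)
        if bool(LOOKAHEAD.search(text)) != bool(NESTED.search(text)):
            disagreements += 1
    return disagreements


print(count_disagreements())
```

A result of zero is evidence rather than proof, but it catches the usual slip of a misplaced parenthesis.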

More efficient than that large regex (depending, of course, on your data and the quality of the engine) is

WHERE col LIKE '%Kansas%' AND
  (col NOT LIKE '%Kansas State%' OR
  REPLACE(col, 'Kansas State', '') LIKE '%Kansas%')

If Kansas usually appears in the form 'Kansas State', though, you may find this better:

WHERE col LIKE '%Kansas%' AND
  REPLACE(col, 'Kansas State', '') LIKE '%Kansas%'

This has the added advantage of being easier to maintain. It works less well if Kansas is common and text fields are large. Of course you can test these on your own data and tell us how they compare.
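To see what the two REPLACE-based conditions do on the question's examples, here is a small emulation using Python string operations. The function names are invented for this sketch, and the comparisons are case-sensitive throughout, whereas in MySQL LIKE may ignore case depending on the collation.

```python
def like_or_replace(col: str) -> bool:
    # col LIKE '%Kansas%' AND (col NOT LIKE '%Kansas State%'
    #     OR REPLACE(col, 'Kansas State', '') LIKE '%Kansas%')
    return "Kansas" in col and (
        "Kansas State" not in col
        or "Kansas" in col.replace("Kansas State", "")
    )


def like_and_replace(col: str) -> bool:
    # col LIKE '%Kansas%' AND REPLACE(col, 'Kansas State', '') LIKE '%Kansas%'
    return "Kansas" in col and "Kansas" in col.replace("Kansas State", "")


examples = [
    "I am from Kansas",
    "Kansas State is great",
    "Kansas is a state",
    "Kansas Kansas State",
    "Kansas State vs Kansas",
    "I'm from Kansas State",
    "KansasState",
]

for text in examples:
    print(f"{text!r:28} {like_or_replace(text)} {like_and_replace(text)}")

# Contrived edge case: deleting "Kansas State" glues "Kan" and "sas" together,
# so this emulation reports a match that the lookahead pattern would not.
print(like_and_replace("KanKansas Statesas"))
```

Both variants agree with the expected results for the seven examples. The glued-together case at the end is unlikely in real data but worth knowing about.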

You might not need to expand the regex all the way to the end, depending on whether your input might include something like 'I need to get this man to surgery in Kansas Stat!'

This is ugly, but here you go:

mysql> select x,x RLIKE 'Kansas($|[^ ]| ($|[^S])| S($|[^t])| St($|[^a])| Sta($|[^t])| Stat($|[^e]))' AS result from examples;
+------------------------+--------+
| x                      | result |
+------------------------+--------+
| I am from Kansas       |      1 |
| Kansas State is great  |      0 |
| Kansas is a state      |      1 |
| Kansas Kansas State    |      1 |
| Kansas State vs Kansas |      1 |
| I'm from Kansas State  |      0 |
| KansasState            |      1 |
+------------------------+--------+
7 rows in set (0.00 sec)
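As for stopping the expansion early, here is a rough comparison of the full pattern with one cut off after " S", again using Python's `re` as a stand-in for MySQL's REGEXP. `FULL`, `SHORT` and the "Kansas Speedway" sample are my own additions for illustration.

```python
import re

FULL = re.compile(
    r"Kansas($|[^ ]| ($|[^S])| S($|[^t])| St($|[^a])| Sta($|[^t])| Stat($|[^e]))"
)
# Cut off after " S": any text continuing "Kansas S..." is treated like
# "Kansas State", whatever actually follows the S.
SHORT = re.compile(r"Kansas($|[^ ]| ($|[^S]))")

samples = [
    "Kansas State is great",
    "I need to get this man to surgery in Kansas Stat!",
    "Kansas Speedway",
]

for text in samples:
    full = FULL.search(text) is not None
    short = SHORT.search(text) is not None
    print(f"{text!r}: full={full} short={short}")
```

The shorter pattern gives the same results as the full one on the seven examples in the question. It only differs on text such as the last two samples, where "Kansas" is followed by " S" and then something other than "tate".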

