Regex to match anything except a specific given string (including the empty string)

Question

I'd like to test whether a string contains "Kansas" followed by anything other than " State".

Examples:

"I am from Kansas"          true
"Kansas State is great"     false
"Kansas is a state"         true
"Kansas Kansas State"       true
"Kansas State vs Kansas"    true
"I'm from Kansas State"     false
"KansasState"               true

For PCRE, I believe the answer is this:

'Kansas(?! State)'

But MySQL's REGEXP doesn't seem to like that.

ADDENDUM: Thanks to David M for generalizing this question: How to convert a PCRE to a POSIX RE?

Answer 1

MySQL doesn't have lookaheads. A workaround is to make two tests:

WHERE yourcolumn LIKE '%Kansas%'
  AND yourcolumn NOT LIKE '%Kansas State%'

I used LIKE here instead of RLIKE because once you split it up like this, regular expressions are no longer required. However, if you still need regular expressions for other reasons, you can use the same technique.

Note that this does not match 'Kansas Kansas State' (or 'Kansas State vs Kansas') as you requested.

Update: If matching 'Kansas Kansas State' is that important, then you can use this ugly regular expression, which MySQL supports:

'Kansas($|[^ ]| ($|[^S])| S($|[^t])| St($|[^a])| Sta($|[^t])| Stat($|[^e]))'

Oops: I just noticed Kip already updated his comment with a solution very similar to this.
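A quick way to gain confidence in the expanded pattern is to compare it against the lookahead version on the question's examples. The sketch below does that with Python's `re` module, which is only a stand-in here (it is not MySQL's regex engine, so treat the result as a sanity check rather than proof). The names `LOOKAHEAD`, `EXPANDED`, `EXAMPLES` and `agrees_on_examples` are made up for this illustration.

```python
import re

# Both patterns are copied from this thread. Python's `re` is an assumption
# used for checking; MySQL's own engine may differ in edge cases.
LOOKAHEAD = re.compile(r"Kansas(?! State)")
EXPANDED = re.compile(
    r"Kansas($|[^ ]| ($|[^S])| S($|[^t])| St($|[^a])| Sta($|[^t])| Stat($|[^e]))"
)

# Example strings and expected results, taken from the question.
EXAMPLES = {
    "I am from Kansas": True,
    "Kansas State is great": False,
    "Kansas is a state": True,
    "Kansas Kansas State": True,
    "Kansas State vs Kansas": True,
    "I'm from Kansas State": False,
    "KansasState": True,
}


def agrees_on_examples() -> bool:
    """Return True if both patterns give the expected result for every example."""
    return all(
        bool(LOOKAHEAD.search(text)) == expected
        and bool(EXPANDED.search(text)) == expected
        for text, expected in EXAMPLES.items()
    )


if __name__ == "__main__":
    print(agrees_on_examples())
```

Each alternative in the expanded pattern covers one way the text after "Kansas" can stop being " State": either the string ends, or the next character differs from the one " State" would need at that position.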

Answer 2

This should work, assuming look-ahead assertions are allowed in MySQL regexes.

/Kansas(?! State)/

Edit: OK, this is super ugly, but it works for me in Perl and doesn't use a look-ahead assertion:

/Kansas(([^ ]|$)| (([^S]|$)|S(([^t]|$)|t(([^a]|$)|a(([^t]|$)|t([^e]|$))))))/
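The hand-expanded patterns in this thread follow a mechanical rule: for each position in the forbidden suffix, allow everything matched so far, followed by either the end of the string or a character other than the expected one. A minimal sketch of that rule in Python follows. The function name `without_lookahead` is hypothetical, and the sketch assumes both arguments contain only letters, digits and spaces so that no regex escaping is needed.

```python
import re


def without_lookahead(prefix: str, forbidden: str) -> str:
    """Build a lookahead-free pattern meant to behave like prefix(?!forbidden).

    Assumption: both strings consist only of letters, digits and spaces,
    so they can be pasted into the pattern without escaping.
    """
    if not forbidden:
        raise ValueError("forbidden must not be empty")
    if not all(c.isalnum() or c == " " for c in prefix + forbidden):
        raise ValueError("only letters, digits and spaces are supported")

    # First branch: the string ends, or the next character is not the
    # first character of the forbidden suffix.
    branches = [f"$|[^{forbidden[0]}]"]
    # Remaining branches: the first i characters of the suffix matched,
    # then the string ends or the next character deviates.
    for i in range(1, len(forbidden)):
        branches.append(f"{forbidden[:i]}($|[^{forbidden[i]}])")
    return f"{prefix}({'|'.join(branches)})"


if __name__ == "__main__":
    pattern = without_lookahead("Kansas", " State")
    print(pattern)
    print(bool(re.search(pattern, "Kansas Kansas State")))
```

For `"Kansas"` and `" State"` this produces the flat form of the pattern rather than the nested Perl form above; the two are built from the same set of alternatives.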

Answer 3

More efficient than that large regex (depending, of course, on your data and the quality of the engine) is:

WHERE col LIKE '%Kansas%' AND
  (col NOT LIKE '%Kansas State%' OR
  REPLACE(col, 'Kansas State', '') LIKE '%Kansas%')

If Kansas usually appears in the form 'Kansas State', though, you may find this better:

WHERE col LIKE '%Kansas%' AND
  REPLACE(col, 'Kansas State', '') LIKE '%Kansas%'

This has the added advantage of being easier to maintain. It works less well if Kansas is common and text fields are large. Of course, you can test these on your own data and tell us how they compare.
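To check the logic of the two WHERE clauses without a database, here is a rough Python emulation. It is only a sketch: it assumes case-sensitive substring matching throughout, which may not match your column's collation, and the function names `short_circuit_filter` and `replace_filter` are invented for this example.

```python
def short_circuit_filter(col: str) -> bool:
    """Emulate the first WHERE clause: skip the REPLACE when 'Kansas State' is absent."""
    return "Kansas" in col and (
        "Kansas State" not in col
        or "Kansas" in col.replace("Kansas State", "")
    )


def replace_filter(col: str) -> bool:
    """Emulate the second WHERE clause: always strip 'Kansas State' and look again."""
    return "Kansas" in col and "Kansas" in col.replace("Kansas State", "")


# Example strings and expected results, taken from the question.
EXAMPLES = {
    "I am from Kansas": True,
    "Kansas State is great": False,
    "Kansas is a state": True,
    "Kansas Kansas State": True,
    "Kansas State vs Kansas": True,
    "I'm from Kansas State": False,
    "KansasState": True,
}

if __name__ == "__main__":
    for text, expected in EXAMPLES.items():
        print(text, short_circuit_filter(text), replace_filter(text), expected)
```

The two functions differ only in when the replacement runs, which mirrors the trade-off described above: the first avoids the REPLACE call for rows that never contain 'Kansas State', while the second is shorter to read and maintain.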

Answer 4

This is ugly, but here you go:

You might not need to expand the regex all the way to the end, depending on whether your input might include something like 'I need to get this man to surgery in Kansas Stat!'

mysql> select x,x RLIKE 'Kansas($|[^ ]| ($|[^S])| S($|[^t])| St($|[^a])| Sta($|[^t])| Stat($|[^e]))' AS result from examples;
+------------------------+--------+
| x                      | result |
+------------------------+--------+
| I am from Kansas       |      1 |
| Kansas State is great  |      0 |
| Kansas is a state      |      1 |
| Kansas Kansas State    |      1 |
| Kansas State vs Kansas |      1 |
| I'm from Kansas State  |      0 |
| KansasState            |      1 |
+------------------------+--------+
7 rows in set (0.00 sec)
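To make the point about partial expansion concrete, the sketch below compares the full pattern with a shortened one that stops after checking the "S". The shortened pattern `TRUNCATED` is a hypothetical illustration and does not appear elsewhere in this thread. Python's `re` stands in for MySQL here, so this is a check of the idea rather than of MySQL's behaviour.

```python
import re

# Full pattern from the query above.
FULL = re.compile(
    r"Kansas($|[^ ]| ($|[^S])| S($|[^t])| St($|[^a])| Sta($|[^t])| Stat($|[^e]))"
)
# Hypothetical shortened pattern: treats "Kansas S..." as if it were "Kansas State".
TRUNCATED = re.compile(r"Kansas($|[^ ]| ($|[^S]))")

# Example strings from the question.
EXAMPLES = [
    "I am from Kansas",
    "Kansas State is great",
    "Kansas is a state",
    "Kansas Kansas State",
    "Kansas State vs Kansas",
    "I'm from Kansas State",
    "KansasState",
]

EDGE_CASE = "I need to get this man to surgery in Kansas Stat!"


def same_result(text: str) -> bool:
    """Return True if the full and shortened patterns agree on this text."""
    return bool(FULL.search(text)) == bool(TRUNCATED.search(text))


if __name__ == "__main__":
    print(all(same_result(text) for text in EXAMPLES))
    print(same_result(EDGE_CASE))
```

The shortened pattern agrees with the full one on the question's seven examples but rejects the "Kansas Stat!" sentence, which the full pattern accepts. Whether that difference matters depends on your data.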

4 answers

Answer 1
4 2010-05-14 20:24:18

Answer 2
2 2010-05-14 20:12:54

Answer 3
2 Accepted 2010-05-21 04:44:44

Answer 4
1 2010-05-14 20:56:42
