
Regex - how to match everything except a particular pattern

How do I write a regex that matches any string that doesn't match a particular pattern? I'm faced with a situation where I have to match an (A and ~B) pattern.

You could use a negative lookahead assertion:

(?!999)\d{3}

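This example matches three digits other than 999. Without anchors, a search can still find a match inside a longer string (in 1999 it matches 199), so use ^(?!999)\d{3}$ when testing a whole string.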


But if your regular expression implementation doesn't support this feature (see Comparison of Regular Expression Flavors), you will probably have to build the expression from the basic features yourself.

A compatible regular expression using only basic syntax would be:

[0-8]\d\d|\d[0-8]\d|\d\d[0-8]

This also matches any three-digit sequence that is not 999. To test a whole string, group the alternation before anchoring it: ^([0-8]\d\d|\d[0-8]\d|\d\d[0-8])$.
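As a quick sanity check, here is a minimal JavaScript sketch (assuming a JavaScript engine such as Node.js; the variable names are illustrative) that tests the lookahead pattern and the basic-syntax alternation against every three-digit string from 000 to 999. Both are wrapped in ^...$ so each test covers the whole string.

```javascript
// Lookahead version and basic-syntax version, both anchored to the whole string.
const withLookahead = /^(?!999)\d{3}$/;
const basicOnly = /^([0-8]\d\d|\d[0-8]\d|\d\d[0-8])$/;

// Collect every three-digit string on which the two patterns disagree.
const mismatches = [];
for (let n = 0; n <= 999; n++) {
  const s = String(n).padStart(3, "0"); // "000" .. "999"
  if (withLookahead.test(s) !== basicOnly.test(s)) mismatches.push(s);
}

console.log(mismatches.length);         // 0
console.log(withLookahead.test("999")); // false
console.log(basicOnly.test("998"));     // true
```

The anchored alternation uses plain capturing parentheses so that it stays within basic syntax; JavaScript itself would equally accept a non-capturing (?:...) group there.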

Suppose you want to match a word A in a string but not a word B. For example, given this text:

1. I have two pets - dog and a cat
2. I have a pet - dog

If you want to search for lines of text that HAVE a dog for a pet and DON'T have a cat, you can use this regular expression:

^(?=.*?\bdog\b)((?!cat).)*$

It will find only the second line:

2. I have a pet - dog
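A minimal JavaScript sketch of the same search over the sample text. It assumes the lines are joined with \n and uses the m flag so that ^ and $ match at line boundaries; the g flag makes String.prototype.match return every matching line.

```javascript
const text = [
  "1. I have two pets - dog and a cat",
  "2. I have a pet - dog",
].join("\n");

// (?=.*?\bdog\b) requires "dog" somewhere on the line; ((?!cat).)* then consumes
// the line one character at a time, refusing any position where "cat" begins.
// Since . does not match \n, both parts stay within a single line.
const dogNoCat = /^(?=.*?\bdog\b)((?!cat).)*$/gm;

const matches = text.match(dogNoCat);
console.log(matches); // [ '2. I have a pet - dog' ]
```

Only dog is wrapped in \b word boundaries; (?!cat) has none, so a line containing a word such as "category" would also be rejected. Writing ((?!\bcat\b).)* avoids that if it matters.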

Match against the pattern and use the host language to invert the boolean result of the match. This is much more legible and maintainable.
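A minimal sketch of that approach in JavaScript, reusing the dog/cat example; the patterns A and B and the function name matchesAButNotB are illustrative, not taken from the original answer.

```javascript
// Keep each regex simple and express "A and not B" with ordinary boolean logic.
const A = /\bdog\b/;
const B = /\bcat\b/;

function matchesAButNotB(line) {
  return A.test(line) && !B.test(line);
}

const lines = ["I have two pets - dog and a cat", "I have a pet - dog"];
console.log(lines.filter(matchesAButNotB)); // [ 'I have a pet - dog' ]
```

The patterns deliberately omit the g flag: RegExp.prototype.test on a global regex remembers lastIndex between calls, which can make repeated tests on different strings give surprising results.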

notnot, I'm resurrecting this ancient question because it had a simple solution that wasn't mentioned. (I found your question while doing some research for a regex bounty quest.)

I'm faced with a situation where I have to match an (A and ~B) pattern.

The basic regex for this is frighteningly simple: B|(A)

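You just ignore the overall matches and examine the Group 1 captures, which will contain A. Because the engine tries B first at each position, text matched by B is consumed by the uncaptured branch, so an A inside it never reaches Group 1.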

An example (with all the disclaimers about parsing HTML with regex): A is digits, B is digits within an <a> tag.

The regex: <a.*?<\/a>|(\d+)

Demo (look at Group 1 in the lower-right pane)
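A minimal JavaScript sketch of the B|(A) technique using the same regex; the sample HTML string is made up for illustration. String.prototype.matchAll (which requires the g flag) yields every overall match, and only those where Group 1 participated are kept.

```javascript
const html = 'Call 555 or see <a href="/page/42">item 7</a> then 99.';

// B = a whole <a ...>...</a> element (uncaptured), A = a run of digits (Group 1).
const re = /<a.*?<\/a>|(\d+)/g;

const digitsOutsideLinks = [];
for (const m of html.matchAll(re)) {
  // m[0] is the overall match and is ignored; m[1] is undefined when B matched.
  if (m[1] !== undefined) digitsOutsideLinks.push(m[1]);
}

console.log(digitsOutsideLinks); // [ '555', '99' ]
```

The digits 42 and 7 are consumed as part of the link by the B branch, so they never appear in Group 1.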

Reference

How to match pattern except in situations s1, s2, s3

How to match a pattern unless...

The complement of a regular language is also a regular language, but to construct it you have to build a complete DFA for the language (every missing transition leads to an error state) and then swap its accepting and non-accepting states. See this for an example. What the page doesn't say is that it converted /(ac|bd)/ into /(a[^c]?|b[^d]?|[^ab])/. Converting a DFA back to a regular expression is not trivial. It is easier to use the regular expression unchanged and invert the semantics in code, as suggested in an earlier answer.
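To make the construction concrete, here is a small JavaScript sketch (not the method of the page referenced above) that hand-builds a complete DFA for the whole-string language /^(ac|bd)$/, i.e. exactly the strings ac and bd, and complements it by swapping accepting and non-accepting states. The state names and the run helper are hypothetical. It does not attempt the hard part, converting the result back into a regex.

```javascript
// Complete DFA for the language {"ac", "bd"} (whole strings).
// Any transition not listed leads to DEAD, the error state that is never left.
const DEAD = "dead";
const transitions = {
  start: { a: "sawA", b: "sawB" },
  sawA: { c: "done" },
  sawB: { d: "done" },
  done: {},
  [DEAD]: {},
};
const accepting = new Set(["done"]);

// Simulate the DFA and report whether it ends in one of the given accepting states.
function run(input, acceptSet) {
  let state = "start";
  for (const ch of input) {
    state = transitions[state][ch] ?? DEAD;
  }
  return acceptSet.has(state);
}

// Complement: every non-accepting state (including DEAD) becomes accepting, and vice versa.
const complementAccepting = new Set(
  Object.keys(transitions).filter((s) => !accepting.has(s))
);

// Cross-check: the complement accepts s exactly when the original regex rejects it.
const original = /^(ac|bd)$/;
const samples = ["", "a", "ac", "bd", "ad", "acac", "cd"];
const agrees = samples.every((s) => run(s, complementAccepting) === !original.test(s));

console.log(agrees);                         // true
console.log(run("ac", complementAccepting)); // false
console.log(run("ad", complementAccepting)); // true
```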

Given a pattern re,

str.split(/re/g) 

will return everything in str except the matches of the pattern. (The g flag is not needed: split always splits on every match.)
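A minimal JavaScript sketch of the split approach; the input strings are made up for illustration. It also shows two behaviours of String.prototype.split worth knowing before relying on it.

```javascript
const str = "apple123banana45cherry";

// Splitting on the pattern keeps everything except the matches.
const parts = str.split(/\d+/);
console.log(parts); // [ 'apple', 'banana', 'cherry' ]

// Caveat 1: a capturing group in the separator is spliced into the output.
const withGroup = "a1b".split(/(\d)/);
console.log(withGroup); // [ 'a', '1', 'b' ]

// Caveat 2: a match at either end produces an empty string, which you may want to drop.
const trimmed = "123abc45".split(/\d+/).filter((s) => s !== "");
console.log(trimmed); // [ 'abc' ]
```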


My answer here might solve your problem as well:

https://stackoverflow.com/a/27967674/543814

  • Instead of Replace, you would use Match.
  • Instead of group $1, you would read group $2.
  • Group $2 was made non-capturing there; here it must stay capturing.

Example:

using System.Text.RegularExpressions;
var matches = Regex.Matches("50% of 50% is 25%", @"(\d+\%)|(.+?)");

The first capturing group specifies the pattern that you wish to avoid. The last capturing group captures everything else. Simply read out that group, $2, from each match where it participated.
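The same idea expressed in JavaScript, as a sketch rather than a port of the linked C# answer: the pattern to avoid goes in group 1, and everything else is picked up one character at a time by the lazy (.+?) in group 2. The % is written unescaped, since \% is a syntax error in JavaScript regexes that use the u flag.

```javascript
// (\d+%) is the pattern to avoid; (.+?) captures everything else, one character per match.
const re = /(\d+%)|(.+?)/g;
const input = "50% of 50% is 25%";

const kept = [];
for (const m of input.matchAll(re)) {
  // m[2] is undefined whenever the first alternative (a percentage) matched.
  if (m[2] !== undefined) kept.push(m[2]);
}

console.log(kept.join("")); // " of  is "
```

Because (.+?) takes one character per match, the kept pieces are single characters; joining them restores the text between the percentages.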

(B)|(A)

then use what group 2 captures.

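Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.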

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM