
With regex, is using both "is" and "is not" range definitions within the same range possible?

Note: I'm using a third-party app that uses regex for searches. It has its own flavor, but it almost always behaves like Java's regex flavor. Of course, this may not matter.

After searching for this question phrased many different ways, I did not find any tutorials, examples, or even mentions of whether it is possible to use both an "is" (positive?) and an "is not" (negative?) definition within the same range.

I can't test the examples in the app right now to see if my ideas work, because the amount of data being searched is massive and a test would disturb the matches it has already gathered. That is the only reason I'm asking.

Here are examples of what I thought might work, but which made the regex testers act strangely:

[\w^\s<>.!?]{2}
[\w|^\s<>.!?]{2}

I would rather have it work the way I think the first one would work (any digit, lowercase or uppercase letter, or other normal character that is not a space, >, <, period, !, or ?) rather than the second, which only has an "or" operator.

The regex testers I used gave me different, odd results, which is what is confusing me.

Also note: I'm using this within a capture group that is followed by a catch-everything match, which I may or may not be using properly. So if you'd like to include how to do that properly along with what I'm attempting, feel free. I am mainly just curious whether this is possible or not, or whether it is an improper method.

Why do you need the \w at all?

[^\s<>.!?]{2}

This already matches all alphanumeric characters, since they are neither whitespace nor any of the punctuation characters you mentioned.
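To see why the two patterns in the question behaved oddly, here is a minimal Java sketch. It assumes the app's flavor follows java.util.regex, which the question says is only mostly true; the class name CaretInClassDemo and the sample strings are made up for illustration. Inside a character class, ^ negates only when it is the first character, and | is never alternation, so both are plain literals in the question's patterns, and \s is added to the class instead of being excluded.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the name and the sample inputs are illustrative only.
class CaretInClassDemo {
    // First pattern from the question: '^' is not first, so it is a literal caret.
    static final Pattern MIXED = Pattern.compile("[\\w^\\s<>.!?]{2}");
    // Second pattern from the question: '|' is a literal pipe inside a class.
    static final Pattern PIPED = Pattern.compile("[\\w|^\\s<>.!?]{2}");
    // Pattern from the answer: '^' is first, so the whole class is negated.
    static final Pattern NEGATED = Pattern.compile("[^\\s<>.!?]{2}");

    // True only if the whole string matches the pattern.
    static boolean fullMatch(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"ab", "a ", "^?", "||", "a-"}) {
            System.out.printf("[%s] mixed=%b piped=%b negated=%b%n",
                    s, fullMatch(MIXED, s), fullMatch(PIPED, s), fullMatch(NEGATED, s));
        }
    }
}
```

In this sketch the first two patterns accept "a " (a letter followed by a space), which is the opposite of what the question intended, while the negated class rejects it.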

In general, you can subtract character classes to some degree. For example, to match alphanumerics excluding digits, you can use

[^\W\d]

because [^\W] matches the same as \w, and \d is subtracted from that because it's in a negated character class.
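A small Java sketch of this double-negation trick, assuming java.util.regex semantics (the class and method names are hypothetical). Note that \w also includes the underscore, so the underscore survives the subtraction along with the letters.

```java
import java.util.regex.Pattern;

// Hypothetical demo class for the [^\W\d] idiom.
class NegatedSubtractionDemo {
    // "Not (non-word or digit)" is the same as "word character that is not a digit".
    static final Pattern WORD_NO_DIGIT = Pattern.compile("[^\\W\\d]");

    // True if the string is exactly one character accepted by the class.
    static boolean isWordNonDigit(String s) {
        return WORD_NO_DIGIT.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"a", "Z", "_", "5", "-", " "}) {
            System.out.printf("[%s] -> %b%n", s, isWordNonDigit(s));
        }
    }
}
```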

Edit:

Some regex engines (like XPath, .NET and JGSoft) allow flexible character class subtraction like this:

[a-z-[e-g]]

to match any character from the range [a-z], excluding e, f and g. But Java does not support this subtraction syntax.
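For what it's worth, java.util.regex does document character class intersection with &&, and intersecting with a negated class gives the same set as the subtraction above. A minimal sketch, assuming the third-party app really follows Java's flavor on this point (worth verifying, since the question says the flavor is only mostly Java-like); the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: a-z minus e-g, written as an intersection.
class IntersectionDemo {
    // [a-z&&[^e-g]] keeps characters that are in a-z AND not in e-g.
    static final Pattern A_TO_Z_WITHOUT_E_TO_G = Pattern.compile("[a-z&&[^e-g]]");

    // True if the string is exactly one character accepted by the class.
    static boolean accepts(String s) {
        return A_TO_Z_WITHOUT_E_TO_G.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"d", "e", "f", "g", "h", "A"}) {
            System.out.printf("%s -> %b%n", s, accepts(s));
        }
    }
}
```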

Another possibility is to use two ranges and combine them, e.g.

([\w]|[^\s<>.!?]){2}

However, this does raise the question of what you are actually trying to express here, because this example (as I've rewritten it) doesn't make a lot of sense.

What it says is "a word character, or any character that is not whitespace or certain punctuation". But the class of characters that are not "whitespace or certain punctuation" already includes all of the word characters. So, unless you mean something different, the \w is redundant.
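The redundancy can be spot-checked with a short Java sketch that compares the combined pattern against the plain negated class on a handful of inputs. This assumes java.util.regex semantics; the class name and sample strings are illustrative, and agreement on a few samples is a sanity check, not a proof.

```java
import java.util.regex.Pattern;

// Hypothetical demo class comparing the two patterns on sample inputs.
class RedundantAlternationDemo {
    static final Pattern COMBINED = Pattern.compile("([\\w]|[^\\s<>.!?]){2}");
    static final Pattern SIMPLE = Pattern.compile("[^\\s<>.!?]{2}");

    // True if both patterns give the same full-match verdict for s.
    static boolean agree(String s) {
        return COMBINED.matcher(s).matches() == SIMPLE.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"ab", "_9", "a-", "a ", "a.", "<>", "!?"}) {
            System.out.printf("[%s] simple=%b agree=%b%n",
                    s, SIMPLE.matcher(s).matches(), agree(s));
        }
    }
}
```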

From your question, it looks like a non-whitespace regex would meet your needs, which you can achieve with:

[\S]{2}
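One caveat worth checking before adopting this: \S excludes only whitespace, so it still accepts <, >, period, ! and ?, which the question wanted to leave out. A brief Java sketch of the difference, assuming java.util.regex semantics (the class name and samples are hypothetical):

```java
import java.util.regex.Pattern;

// Hypothetical demo class contrasting \S with the narrower negated class.
class NonWhitespaceDemo {
    static final Pattern NON_SPACE = Pattern.compile("[\\S]{2}");
    static final Pattern NARROWER = Pattern.compile("[^\\s<>.!?]{2}");

    public static void main(String[] args) {
        for (String s : new String[] {"ab", "a.", "<>", "a "}) {
            System.out.printf("[%s] nonSpace=%b narrower=%b%n",
                    s, NON_SPACE.matcher(s).matches(), NARROWER.matcher(s).matches());
        }
    }
}
```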
