Java + Regex: matching characters from a customized set that are not preceded by characters in the same set

I am running into a silly problem with regular expressions in Java. I would like to match a string that begins with @ followed by characters from a certain valid set, but only when the @ is not preceded by a character from that same valid set.

The terms I would like to match are of the form:

"y" + @ + "xxxxxxx"

where:

  • x is a character that belongs to the valid set [a-zA-Z\\d\\-\\_]
  • the @ sign appears once
  • y is a character that does not belong to the valid set [a-zA-Z\\d\\-\\_]

I'm currently trying to do this by using the following regular expression pattern:

String MY_PATTERN = "[^[A-Za-z\\d\\-\\_]?]" + "@{1}" + "[A-Za-z\\d\\-\\_]+";
String text = "12a@cat123-_     @dog123__- ";
Pattern p = Pattern.compile(MY_PATTERN);
Matcher m = p.matcher(text);

Based on this, I expect the following code to print only @dog123__-

while (m.find()) {
    String s = m.group();
    System.out.println(s);
}

However, it also prints a@cat123-_.

Could someone explain what I'm doing wrong?

I'm assuming the text you're trying to match could be anywhere, and is not anchored to the start of the string.

The syntax you used for [^[A-Za-z\\d\\-\\_]?] is wrong and is being interpreted as something else (let's not get into that). Negated character classes are written [^chars], so the syntax should have been [^A-Za-z\\d\\-_]. However, that requires a character to be matched before the "@", so it won't match "@foo" at the very start of the input: there is no character before the "@" at all, let alone one outside A-Za-z0-9-_.
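A minimal sketch of the corrected negated class on its own (the class name NegatedClassDemo and the helper findAll are illustrative, not from the question). It finds nothing in "@foo", and where it does match, the match includes the character in front of the "@":

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NegatedClassDemo {
    // A plain negated character class: it must consume exactly one character
    // that is NOT in the valid set, immediately before the '@'.
    static final Pattern NEGATED = Pattern.compile("[^A-Za-z\\d\\-_]@[A-Za-z\\d\\-_]+");

    // Collects every match of NEGATED in the input.
    static List<String> findAll(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = NEGATED.matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(findAll("@foo"));   // nothing precedes '@', so no match
        System.out.println(findAll("x @foo")); // the match includes the leading space
    }
}
```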

Lookbehinds to the rescue. A negative lookbehind (?<!subpattern) asserts that the current position is not preceded by subpattern. It is zero-width, so it does not consume any characters.

Oh, and one more thing: [A-Za-z\\d\\-_] is the same as [-\\w] (let's use that shorter version).
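A quick way to check the equivalence claim is to test both classes against every ASCII character. In this sketch, ClassEquivalence and sameOnAscii are hypothetical names. The check relies on Java's default, ASCII-only meaning of \d and \w:

```java
import java.util.regex.Pattern;

public class ClassEquivalence {
    static final Pattern LONG_FORM = Pattern.compile("[A-Za-z\\d\\-_]");
    static final Pattern SHORT_FORM = Pattern.compile("[-\\w]"); // leading '-' is a literal

    // Returns true if both classes accept exactly the same characters
    // among the 128 ASCII code points.
    static boolean sameOnAscii() {
        for (char c = 0; c < 128; c++) {
            String s = String.valueOf(c);
            if (LONG_FORM.matcher(s).matches() != SHORT_FORM.matcher(s).matches()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(sameOnAscii());
    }
}
```

If the pattern were compiled with Pattern.UNICODE_CHARACTER_CLASS, \w would also match non-ASCII letters, so the two forms would no longer be equivalent.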

So the regex should be:

(?<![-\\w])@[-\\w]+
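A minimal sketch that applies this pattern to the question's input (LookbehindDemo and findAll are illustrative names):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookbehindDemo {
    // (?<![-\w]) is a zero-width negative lookbehind: it checks that the character
    // before '@' (if there is one) is not in the valid set, without consuming it.
    static final Pattern TERM = Pattern.compile("(?<![-\\w])@[-\\w]+");

    static List<String> findAll(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = TERM.matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(findAll("12a@cat123-_     @dog123__- "));
        System.out.println(findAll("@foo"));
    }
}
```

Because the lookbehind consumes nothing, m.group() is exactly the term. There is no extra leading character to strip, and a term at the very start of the input is still found.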


You have some issues in your pattern. Here's one that should do it:

(?:^|[^A-Za-z\d\-\_])(@[A-Za-z\d\-\_]+)
  1. @{1} is the same as @
  2. [^[A-Za-z\\d\\-\\_]?]: the problem seems to be here. You're using nested character sets, which doesn't work
  3. It should be [^A-Za-z\\d\\-\\_]

You could simplify the regex to: (?:^|[^\\w\\-])(@[\\w\\-]+)

\\w matches any alphanumeric character and the underscore (in Java, ASCII only by default)
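In this pattern the non-capturing group consumes either the start of the input or one character outside the valid set. As a result, the term itself has to be read from capture group 1 rather than from the whole match. A sketch in Java (CaptureGroupDemo and findTerms are hypothetical names):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CaptureGroupDemo {
    // (?:^|[^\w\-]) matches the start of input or one character outside the
    // valid set; the term itself is captured in group 1.
    static final Pattern TERM = Pattern.compile("(?:^|[^\\w\\-])(@[\\w\\-]+)");

    static List<String> findTerms(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = TERM.matcher(input);
        while (m.find()) {
            out.add(m.group(1)); // group(1) excludes the preceding character
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(findTerms("12a@cat123-_     @dog123__- "));
        System.out.println(findTerms("@foo bar"));
    }
}
```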

Test this: http://regexr.com/3bt77

That tester uses JavaScript regex syntax, but this pattern should behave the same way in Java.

Your regex can be simplified considerably, given:

[a-zA-Z\\d\\-\\_] === [\w-]

so this is what you want:

[^\w-]@[\w-]+
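The sketch below compares the single-class form [^\w-]@[\w-] with a variant that adds + after the second class. The + is an assumed fix, and SimpleClassDemo and findAll are illustrative names. Without +, [\w-] matches only one character after the @. In both versions, [^\w-] consumes the character before the @, so that character appears in the match:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SimpleClassDemo {
    // Collects every match of the given regex in the input.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "12a@cat123-_     @dog123__- ";
        System.out.println(findAll("[^\\w-]@[\\w-]", text));  // only one character after '@'
        System.out.println(findAll("[^\\w-]@[\\w-]+", text)); // whole term, plus the leading space
    }
}
```

Like the plain negated class in the first answer, this form cannot match a term at the very start of the input, because it always needs one character in front of the @.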

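Notice: The technical posts on this site are published under the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.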

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM