
Regex lookahead discard a match

I am trying to write a regex that uses a lookahead to discard a match completely.

\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*

That is the pattern; I have been testing it on regex101.

But when an email starts with -, _ or ., it should not be matched at all, rather than being matched with the initial symbols stripped. Any ideas are welcome. I've been searching for the past half an hour, but can't figure out how to drop the entire email when it starts with one of those symbols.
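
To make the problem concrete, here is a minimal Python sketch of the behaviour described above. The sample addresses and the `find_emails` helper are made up for illustration; only the pattern comes from the question.

```python
import re

# The pattern from the question, unchanged.
ORIGINAL = re.compile(r"\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*")

def find_emails(text):
    """Return every substring the original pattern matches."""
    return [m.group(0) for m in ORIGINAL.finditer(text)]

# A leading '-' or '.' is skipped and the rest of the address still matches.
print(find_emails("-john@example.com .mary@example.org"))
# ['john@example.com', 'mary@example.org']

# '_' is a word character, so it is matched as part of the address.
print(find_emails("_bob@example.com"))
# ['_bob@example.com']
```

The regex engine simply restarts the search one character later, which is why the leading symbol is dropped instead of the whole address.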

You can use a word boundary right before @ together with a positive lookbehind that checks whether we are at the beginning of the string or right after a whitespace character, and then require that the first character is not one of the unwanted ones by using the negated class [^\s\-_.]:

(?<=^|\s)[^\s\-_.]\w*(?:[-+.]\w+)*\b@\w+(?:[-.]\w+)*\.\w+(?:[-.]\w+)*

See demo

List of matches:

support@github.com
s.miller@mit.edu
j.hopking@york.ac.uk
steve.parker@soft.de
info@company-hotels.org
kiki@hotmail.co.uk
no-reply@github.com
s.peterson@mail.uu.net
info-bg@software-software.software.academy

Additional notes on usage and alternative notation

Note that it is best practice to use as few escaped characters as possible in a regex, so [^\s\-_.] can be written as [^\s_.-]: a hyphen at the end of a character class still denotes a literal hyphen, not a range. Also, if you plan to use the pattern in other regex engines, you may run into problems with the alternation inside the lookbehind, in which case you can replace (?<=\s|^) with the equivalent (?<!\S). See this regex:

(?<!\S)[^\s_.-]\w*(?:[-+.]\w+)*\b@\w+(?:[-.]\w+)*\.\w+(?:[-.]\w+)*
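
As a rough check of the (?<!\S) variant, here is a small Python sketch. Python's built-in `re` module only accepts fixed-width lookbehinds, which is one reason to prefer this form there. The `find_emails` helper and the rejected sample addresses are assumptions added for illustration.

```python
import re

# The (?<!\S) variant: the match must start at the beginning of the
# string or right after whitespace, and must not begin with '_', '.' or '-'.
EMAIL = re.compile(
    r"(?<!\S)[^\s_.-]\w*(?:[-+.]\w+)*\b@\w+(?:[-.]\w+)*\.\w+(?:[-.]\w+)*"
)

def find_emails(text):
    """Return all addresses that pass the leading-character check."""
    return [m.group(0) for m in EMAIL.finditer(text)]

sample = (
    "support@github.com -bad@example.com _bad@example.com "
    ".bad@example.com no-reply@github.com"
)
print(find_emails(sample))
# ['support@github.com', 'no-reply@github.com']
```

The addresses with a leading symbol are dropped entirely: the symbol itself fails the negated class, and every later position inside the address is preceded by a non-whitespace character, so the lookbehind fails there.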

And last but not least, if you need to use it in a JavaScript engine without lookbehind support (lookbehinds were only added in ES2018) or in another language that does not support lookbehinds, replace the (?<!\S) / (?<=\s|^) with a group, (\s|^), wrap the whole email part of the pattern in another set of capturing parentheses, and use the language's own means to grab the email group. With the pattern as written below that is Group 2; if you make the leading group non-capturing, (?:\s|^), it is Group 1:

(\s|^)([^\s_.-]\w*(?:[-+.]\w+)*\b@\w+(?:[-.]\w+)*\.\w+(?:[-.]\w+)*)

See the regex demo.
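
The group-based workaround can be sketched in Python as well, although Python does not need it. With the pattern exactly as written above, the leading `(\s|^)` is the first capturing group, so the address itself lands in the second one. The `find_emails` helper and the sample text are hypothetical.

```python
import re

# Lookbehind-free variant: group 1 consumes the leading whitespace
# (or matches at the start of the string), group 2 holds the address.
GROUPED = re.compile(
    r"(\s|^)([^\s_.-]\w*(?:[-+.]\w+)*\b@\w+(?:[-.]\w+)*\.\w+(?:[-.]\w+)*)"
)

def find_emails(text):
    """Return the contents of the address group for every match."""
    return [m.group(2) for m in GROUPED.finditer(text)]

print(find_emails("support@github.com -bad@example.com kiki@hotmail.co.uk"))
# ['support@github.com', 'kiki@hotmail.co.uk']
```

Because the whitespace is consumed as part of the match, reading the whole match instead of the address group would return it with a leading space.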

I use this for multiple email addresses, separated by ';':

([A-Za-z0-9._%-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4};)*

For a single email address:

[A-Za-z0-9._%-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}
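
A short Python sketch of how these two patterns behave when used for whole-string validation. The `is_email_list` helper and the test strings are assumptions for illustration, and `fullmatch` is used here because the patterns carry no anchors of their own.

```python
import re

# Single-address pattern from above, reused inside the list pattern.
SINGLE = r"[A-Za-z0-9._%-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}"
LIST = re.compile(r"(" + SINGLE + r";)*")

def is_email_list(text):
    """True if the whole string is a run of 'address;' items."""
    return LIST.fullmatch(text) is not None

print(is_email_list("a@b.com;c@d.org;"))  # True
print(is_email_list("a@b.com;c@d.org"))   # False: the last ';' is required
print(is_email_list(""))                  # True: '*' allows zero addresses

# Two caveats worth knowing about this variant:
print(re.fullmatch(SINGLE, "-a@b.com") is not None)  # True: leading '-' is accepted
print(re.fullmatch(SINGLE, "info-bg@software-software.software.academy") is not None)
# False: {2,4} rejects the 7-letter top-level domain
```

So this pair of patterns does not by itself reject addresses starting with -, _ or ., and it also turns down the `.academy` address that appears in the list of matches earlier on the page.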
