
Colliding regex for emails (Python)

I'm trying to grab both usernames (such as abc123@) and emails (such as abc123@company.com) in the same Python regex.

Here's an example statement:

abc123@ is a researcher at abc123@company.com doing cool work.

Regex used:

For username:

re.match("^([A-Za-z])+([@]){1}$")

For email:

re.match("^([A-Za-z0-9-_])+(@company.com){1}$")

In most cases, the username gets grabbed but the email address does not (I'm trying to grab them as two separate entities). Any ideas what's going on?

Actually, your regexes contain a lot of groups, repetition counts, and start/end anchors that are not really necessary. These two are enough to find each in the input string.

For user: [A-Za-z0-9]+@

For email: [A-Za-z0-9-_]+@company.com

If, however, you want your groupings, these versions will work:

For user: ([A-Za-z0-9])+(@)

For email: ([A-Za-z0-9-_])+(@company.com)

Disclaimer: I have tested this only on Java, as I am not so familiar with Python.
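For what it's worth, here is a quick Python check of the grouped username version. It is only a sketch run against the example sentence from the question. One thing it shows is that a repeated capture group such as ([A-Za-z0-9])+ keeps only its last iteration; moving the + inside the group (my own variation, not part of the answer above) captures the whole name.

```python
import re

text = "abc123@ is a researcher at abc123@company.com doing cool work."

# Grouped version as written: the + repeats the group,
# so group 1 ends up holding only the last character it matched.
m = re.search(r"([A-Za-z0-9])+(@)", text)
whole_match = m.group(0)
last_char = m.group(1)
print(whole_match)  # abc123@
print(last_char)    # 3

# Variation (an assumption about what is usually wanted):
# put the + inside the group to capture the full name.
m2 = re.search(r"([A-Za-z0-9]+)(@)", text)
full_name = m2.group(1)
print(full_name)    # abc123
```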

In your patterns you use the anchors ^ and $ to assert the start and end of the string, so each pattern can only match when the whole string is a username or an email.
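A small sketch of the effect of the anchors, using the email pattern from the question and the example sentence. The unanchored pattern with an escaped dot is my own simplification for illustration.

```python
import re

text = "abc123@ is a researcher at abc123@company.com doing cool work."

# The email pattern from the question, anchored at both ends.
anchored = r"^([A-Za-z0-9-_])+(@company.com){1}$"

# With ^ and $, the pattern must cover the entire string.
print(re.match(anchored, text))                  # None
print(re.match(anchored, "abc123@company.com"))  # a match object

# Without anchors, re.search can find the email inside the sentence.
unanchored = r"[A-Za-z0-9-_]+@company\.com"
found = re.search(unanchored, text)
print(found.group())  # abc123@company.com
```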

Removing the anchors leaves this for the username pattern: ([A-Za-z])+([@]){1}

Here, you can omit the {1} and the capture groups. Note that in the example, abc123@ has digits that you are not matching.

Still, using [A-Za-z0-9]+@ will also get a partial match inside the email abc123@company.com. To prevent that, you can use a whitespace boundary on the right-hand side.
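To see the partial match concretely, here is a minimal check against the example sentence:

```python
import re

text = "abc123@ is a researcher at abc123@company.com doing cool work."

# With nothing constraining what follows the @,
# the username pattern also fires on the front of the email address.
matches = re.findall(r"[A-Za-z0-9]+@", text)
print(matches)  # ['abc123@', 'abc123@']
```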

The username pattern might look like

\b[A-Za-z0-9]+@(?!\S)
  • \b A word boundary
  • [A-Za-z0-9]+ Match 1+ occurrences of the listed characters (including the digits)
  • @ Match literally
  • (?!\S) Negative lookahead, assert that there is no non-whitespace character to the right
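A sketch of how this username pattern could be combined with an email pattern in Python to pull out the two as separate entities. The variable names are illustrative, and the email pattern here is the strict company.com one with the dot escaped.

```python
import re

text = "abc123@ is a researcher at abc123@company.com doing cool work."

# Username: word boundary, name, @, then whitespace or end of string.
username_re = re.compile(r"\b[A-Za-z0-9]+@(?!\S)")
# Email: restricted to the company.com domain, dot escaped.
email_re = re.compile(r"[A-Za-z0-9-_]+@company\.com")

usernames = username_re.findall(text)
emails = email_re.findall(text)
print(usernames)  # ['abc123@']
print(emails)     # ['abc123@company.com']
```

The lookahead is what keeps the username pattern from matching the first part of the email, since the @ there is followed by a non-whitespace character.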


For the email address, using a character class like [A-Za-z0-9-_] is quite strict.

If you want a broad match, you might use:

[^\s@]+@[^\s@]+\.[a-z]{2,}
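A short Python sketch of the broad pattern. The second sample string and its address are made up for illustration. Note that [a-z]{2,} only accepts a lowercase top-level domain unless a flag such as re.IGNORECASE is passed.

```python
import re

broad_email_re = re.compile(r"[^\s@]+@[^\s@]+\.[a-z]{2,}")

text = "abc123@ is a researcher at abc123@company.com doing cool work."
found_in_text = broad_email_re.findall(text)
print(found_in_text)  # ['abc123@company.com']

# Hypothetical sample with dots in the local part and a trailing full stop.
other = "Contact jane.doe@example.org."
found_in_other = broad_email_re.findall(other)
print(found_in_other)  # ['jane.doe@example.org']
```

The bare username abc123@ is skipped because the pattern requires at least one non-whitespace character and a dot-separated suffix after the @.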

