Python - regex - findall duplicates

Question

I'm trying to match e-mail addresses in HTML text using the following Python code:

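import re

# Raw string so that backslashes such as \w and \. reach the regex engine unchanged.
my_second_pat = r'((\w+)( *?))(@|[aA][tT]|\([aA][tT]\))(((( *?)(\w+)( *?))(\.|[dD][oO][tT]|\([dD][oO][tT]\)))+)([eE][dD][uU]|[cC][oO][mM])'

# line, name and res are defined elsewhere in the program.
matches = re.findall(my_second_pat, line)
for m in matches:
    s = "".join(m)                 # concatenate every captured group
    email = "".join(s.split())     # strip all whitespace
    res.append((name, 'e', email))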

When I run it on line = shoham@stanford.edu

I get:

[('shoham', 'shoham', '', '@', 'stanford.', 'stanford.', 'stanford', '', 'stanford', '', '.', 'edu')]

What I expect:

[('shoham','@', 'stanford.', 'edu')]

It is matched as a single string on regexpal.com, so I guess my problem is with re.findall.

I'm new to both regex and Python. Any optimizations or modifications are welcome.

Answer 1

re.findall is returning every capture group in your pattern, including the nested groups and the optional ones that matched an empty string.
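To illustrate the point, here is a minimal sketch of how the shape of the re.findall result depends on the number of capturing groups. The three short patterns are made up for this illustration and are not the pattern from the question.

```python
import re

text = "shoham@stanford.edu"

# No capturing groups: findall returns each whole match as a string.
no_groups = re.findall(r"\w+@\w+\.\w+", text)

# One capturing group: findall returns only that group's text.
one_group = re.findall(r"(\w+)@\w+\.\w+", text)

# Several capturing groups (nested ones included): one tuple per match,
# with one entry per group, ordered by opening parenthesis.
many_groups = re.findall(r"((\w+)@)((\w+)\.)(\w+)", text)

print(no_groups)    # ['shoham@stanford.edu']
print(one_group)    # ['shoham']
print(many_groups)  # [('shoham@', 'shoham', 'stanford.', 'stanford', 'edu')]
```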

Try this, which makes the inner groups non-capturing and keeps a single outer group:

((?:(?:\w+)(?: *?))(?:@|[aA][tT]|\([aA][tT]\))(?:(?:(?:(?: *?)(?:\w+)(?: *?))(?:\.|[dD][oO][tT]|\([dD][oO][tT]\)))+)(?:[eE][dD][uU]|[cC][oO][mM]))

See this link to debug your expression:

http://regex101.com/r/jW4mP1
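As a rough sketch of the two options in Python: one outer capturing group gives the whole address as a single string, while exactly four capturing groups give the tuple the question asks for. The names AT, DOT, TLD, LABELS, whole and parts are hypothetical helpers introduced here. They reuse the alternatives from the question's pattern, with the literal parentheses in \( and \) left escaped.

```python
import re

# Building blocks taken from the question's pattern, made non-capturing.
AT = r"(?:@|[aA][tT]|\([aA][tT]\))"
DOT = r"(?:\.|[dD][oO][tT]|\([dD][oO][tT]\))"
TLD = r"(?:[eE][dD][uU]|[cC][oO][mM])"
# One or more domain labels, each followed by a dot separator.
LABELS = rf"(?: *?\w+ *?{DOT})+"

# A single outer group: findall returns the whole address as one string.
whole = rf"(\w+ *?{AT}{LABELS}{TLD})"

# Four groups: local part, separator, domain labels, top-level domain.
parts = rf"(\w+) *?({AT})({LABELS})({TLD})"

line = "shoham@stanford.edu"
print(re.findall(whole, line))  # ['shoham@stanford.edu']
print(re.findall(parts, line))  # [('shoham', '@', 'stanford.', 'edu')]
```

With the single-group version, the "".join(m) step from the question is no longer needed, because each match is already a string.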

Answer 2

Try this:

(?i)([^@\s]{2,})(?:@|\s*at\s*)([^@\s.]{2,})(?:\.|\s*dot\s*)([^@\s.]{2,})

If you need to limit matches to .com and .edu:

(?i)([^@\s]{2,})(?:@|\s*at\s*)([^@\s.]{2,})(?:\.|\s*dot\s*)(com|edu)

Note that I have used the case-insensitive flag (?i) at the start of the regex instead of writing character classes like [Ee].
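A possible way to use the restricted pattern from Python is sketched below. The helper name find_emails and the lower-casing of the result are assumptions made for this example, not part of the answer. Passing re.IGNORECASE as an argument should behave the same as the inline (?i) flag.

```python
import re

# Pattern from this answer, restricted to .com and .edu, with the inline (?i) flag.
EMAIL_PAT = r"(?i)([^@\s]{2,})(?:@|\s*at\s*)([^@\s.]{2,})(?:\.|\s*dot\s*)(com|edu)"

# Same pattern without (?i); the flag is passed to re.compile instead.
EMAIL_RE = re.compile(
    r"([^@\s]{2,})(?:@|\s*at\s*)([^@\s.]{2,})(?:\.|\s*dot\s*)(com|edu)",
    re.IGNORECASE,
)


def find_emails(text):
    """Hypothetical helper: return normalized user@domain.tld strings."""
    # Three capturing groups, so findall yields (user, domain, tld) tuples.
    return [
        f"{user}@{domain}.{tld}".lower()
        for user, domain, tld in re.findall(EMAIL_PAT, text)
    ]


print(find_emails("shoham@stanford.edu"))         # ['shoham@stanford.edu']
print(find_emails("SHOHAM AT STANFORD DOT EDU"))  # ['shoham@stanford.edu']
print(EMAIL_RE.findall("shoham@stanford.edu"))    # [('shoham', 'stanford', 'edu')]
```

Note that this pattern accepts a single domain label before the top-level domain, so an address such as user@cs.stanford.edu is not matched in full.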

2 answers

Answer 1: score 1, 2014-03-10 05:09:27
Answer 2: score 1, accepted, 2014-03-10 05:10:36
