Capturing repeating subpatterns in Python regex

Question

While matching an email address, after I match something like yasar@webmail, I want to capture one or more of (\.\w+) (what I am doing is a little more complicated; this is just an example). I tried adding (\.\w+)+, but it only captures the last match. For example, yasar@webmail.something.edu.tr matches, but the group only contains .tr after the yasar@webmail part, so I lose the .something and .edu groups. Can I do this with Python regular expressions, or would you suggest matching everything first and splitting the subpatterns later?

Answer 1

The re module doesn't support repeated captures (the regex module does):

>>> import regex  # third-party package, not the standard-library re
>>> m = regex.match(r'([.\w]+)@((\w+)(\.\w+)+)', 'yasar@webmail.something.edu.tr')
>>> m.groups()
('yasar', 'webmail.something.edu.tr', 'webmail', '.tr')
>>> m.captures(4)
['.something', '.edu', '.tr']

In your case I'd go with splitting the repeated subpatterns later. It leads to simple and readable code; see, for example, the code in @Li-aung Yip's answer.
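For comparison, here is a minimal sketch of the "split later" idea using only the standard-library re module with the same pattern as above. The variable names are illustrative, not from the original answer.

```python
import re

address = 'yasar@webmail.something.edu.tr'
m = re.match(r'([.\w]+)@((\w+)(\.\w+)+)', address)

# With the stdlib re module, group 4 keeps only its last repetition.
last_repeat = m.group(4)

# Group 2 spans the whole domain, so the pieces can be recovered by splitting.
labels = m.group(2).split('.')

print(last_repeat)
print(labels)
```

Because group 2 already wraps the repeated part, no third-party module is needed when a plain split is good enough.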

Answer 2

This will work:

>>> import re
>>> regexp = r"[\w\.]+@(\w+)(\.\w+)?(\.\w+)?(\.\w+)?(\.\w+)?(\.\w+)?"
>>> email_address = "william.adama@galactica.caprica.fleet.mil"
>>> m = re.match(regexp, email_address)
>>> m.groups()
('galactica', '.caprica', '.fleet', '.mil', None, None)

But it's limited to a maximum of six subgroups. A better way to do this would be:

>>> m = re.match(r"[\w\.]+@(.+)", email_address)
>>> m.groups()
('galactica.caprica.fleet.mil',)
>>> m.group(1).split('.')
['galactica', 'caprica', 'fleet', 'mil']

Note that regexps are fine as long as the email addresses are simple, but there are all kinds of addresses this will break on. See this question for a detailed treatment of email address regexes.

Answer 3

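You can fix the problem of (\.\w+)+ capturing only the last match by doing this instead: ((?:\.\w+)+). The inner group is non-capturing, so the outer group captures the whole repeated run as a single string.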
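A short sketch of how this might be used: the outer group returns the whole repeated run, which can then be broken into its pieces with a second pass. The surrounding pattern (\w+@\w+) and the variable names are assumptions made for illustration.

```python
import re

address = 'yasar@webmail.something.edu.tr'

# The inner (?:...) group repeats without capturing; the outer group
# captures everything the repetition matched.
m = re.match(r'\w+@\w+((?:\.\w+)+)', address)
whole_run = m.group(1)

# A second pass over the captured run recovers the individual pieces.
parts = re.findall(r'\.\w+', whole_run)

print(whole_run)
print(parts)
```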

Answer 4

This is what you are looking for:

>>> import re
>>> s = "yasar@webmail.something.edu.tr"
>>> r = re.compile(r"\.\w+")  # raw string avoids invalid-escape warnings
>>> m = r.findall(s)
>>> m
['.something', '.edu', '.tr']
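One caveat worth noting: findall scans the whole string, so a dot in the part before the @ is picked up too. A possible workaround, sketched here with a hypothetical helper named domain_suffixes, is to search only the text after the @.

```python
import re

def domain_suffixes(address):
    """Return the '.label' pieces found after the '@' (hypothetical helper)."""
    _, _, domain = address.partition('@')
    return re.findall(r'\.\w+', domain)

s = "william.adama@galactica.caprica.fleet.mil"

# Searching the whole address also matches '.adama' from the local part.
whole = re.findall(r'\.\w+', s)

# Restricting the search to the domain avoids that.
after_at = domain_suffixes(s)

print(whole)
print(after_at)
```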

Answer 1
36 2012-03-19 05:22:44

Answer 2
14 2012-03-19 04:50:04

Answer 3
9 2012-03-19 04:28:11

Answer 4
4 2017-10-04 18:22:38
