Python regex to conditionally capture comma-separated strings

Question

I have a list of person names that can come in 3 different styles:

{last name}, {first name} {middle name} (Example: Bob, Dylan Tina)
{last name}, {first name} {middle initial}. (Example: Bob, Dylan T.)
{last name}, {first name} (Example: Bob, Dylan)

And this is the regex I wrote:

^[a-zA-Z]+(([' ,.-][a-zA-Z ])?[a-zA-Z]*)*$

But it doesn't work.
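A quick check of the question's pattern (a sketch added for illustration, using the three example names from the list above) shows that it accepts two of the three styles and rejects the middle-initial form. The final `.` can only be consumed by `[' ,.-]`, and that class always has to be followed by another letter or space, so a trailing period can never match. The pattern also has no group per name part, so even when it matches it cannot hand back the last, first and middle names separately, and its nested quantifiers can make the engine backtrack heavily on longer strings that fail to match.

```python
import re

# The pattern from the question, unchanged.
pattern = re.compile(r"^[a-zA-Z]+(([' ,.-][a-zA-Z ])?[a-zA-Z]*)*$")

for s in ["Bob, Dylan Tina", "Bob, Dylan T.", "Bob, Dylan"]:
    # re.match anchors at the start; the pattern's own $ anchors the end.
    print(repr(s), "->", "match" if pattern.match(s) else "no match")
```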

Answer 1

You could write the regex like this:

^(\w+),\s(\w+)\s*(\w*\.?)$


Update the regex like this and each of your three cases is captured by a different group:

^(\w+,\s\w+\s\w+)$|^(\w+,\s\w+\s\w+\.)$|^(\w+,\s\w+)$
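To see which group fires for which style, the sketch below (an illustration, not part of the original answer) matches the three example names and prints `Match.lastindex`, the number of the last capture group that participated in the match. Note that each group captures the whole name string, not its individual parts; the group number only tells you the style (1 = middle name, 2 = middle initial, 3 = no middle name).

```python
import re

# Answer 1's alternation: one anchored alternative, and one capture group, per style.
pattern = re.compile(r"^(\w+,\s\w+\s\w+)$|^(\w+,\s\w+\s\w+\.)$|^(\w+,\s\w+)$")

for s in ["Bob, Dylan Tina", "Bob, Dylan T.", "Bob, Dylan"]:
    m = pattern.match(s)
    # m.lastindex is the number of the capture group that matched,
    # which identifies the name style; the other groups are None.
    print(repr(s), "-> style", m.lastindex, m.groups())
```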


Here is the Python code:

import re

s2 = "Bob, Dylan"
# Groups: last name, first name, and an optional middle name or initial
out = re.findall(r"^(\w+),\s(\w+)\s*(\w*\.?)$", s2)
print(out)

OUTPUT

[('Bob', 'Dylan', '')]
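Applied to all three styles, the same pattern splits each name into its parts. The sketch below is a variation, not the answer's exact code: it adds named groups (`last`, `first`, `middle` are illustrative names) and uses `Pattern.fullmatch` in place of the `^...$` anchors; `parse_name` is a hypothetical helper. The extra `"O'Brien, Dylan"` input is a made-up example showing that `\w` does not match apostrophes (or hyphens), so such names are rejected unless you widen the character class.

```python
import re

# Answer 1's pattern with named groups added; the group names are
# illustrative and not part of the original answer.
NAME_RE = re.compile(r"(?P<last>\w+),\s(?P<first>\w+)\s*(?P<middle>\w*\.?)")

def parse_name(s):
    """Hypothetical helper: return the name parts as a dict, or None if s does not match."""
    m = NAME_RE.fullmatch(s)  # fullmatch plays the role of the ^...$ anchors
    return m.groupdict() if m else None

for s in ["Bob, Dylan Tina", "Bob, Dylan T.", "Bob, Dylan", "O'Brien, Dylan"]:
    print(repr(s), "->", parse_name(s))
```

When there is no middle name, `middle` is an empty string rather than `None`, because `\w*\.?` can match zero characters.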

Answer 2

You should use this regex:

(\w+),\s*(\w+)\s*(\w{0,}\.*)

This is the result you'll get:

>>> import re
>>> s1 = "Bob, Dylan Tina"
>>> s2 = "Bob, Dylan"
>>> s3 = "Bob, Dylan T."
>>> p = re.compile(r"(\w+),\s*(\w+)\s*(\w{0,}\.*)")
>>> re.findall(p, s1)
[('Bob', 'Dylan', 'Tina')]
>>> re.findall(p, s2)
[('Bob', 'Dylan', '')]
>>> re.findall(p, s3)
[('Bob', 'Dylan', 'T.')]
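Because this pattern has no `^`/`$` anchors, `findall` reports any substring that fits, so a malformed entry can still produce a plausible-looking tuple. The sketch below applies the same compiled pattern to a list of names and contrasts `findall` with `Pattern.fullmatch`, which only succeeds when the whole string matches. The extra entry `"Bob, Dylan T. Jr"` is a made-up example, not from the question.

```python
import re

# Pattern from Answer 2, unchanged.
p = re.compile(r"(\w+),\s*(\w+)\s*(\w{0,}\.*)")

names = ["Bob, Dylan Tina", "Bob, Dylan T.", "Bob, Dylan", "Bob, Dylan T. Jr"]

for name in names:
    # findall is unanchored: for the last entry it returns the leading
    # "Bob, Dylan T." part and silently ignores the trailing " Jr".
    loose = p.findall(name)
    # fullmatch requires the whole string to match, so the last entry is rejected.
    strict = p.fullmatch(name)
    print(repr(name), "findall:", loose, "fullmatch:", strict.groups() if strict else None)
```

Note also that `\w{0,}` is the same as `\w*`, and `\.*` accepts several trailing dots, so an input such as `Bob, Dylan T..` still matches.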


2 answers

Answer 1
1 vote, accepted, 2019-12-09 06:44:48

Answer 2
0 votes, 2019-12-09 06:46:26
