Split a string using a regular expression and include the pattern

Question

I need to split a string on the degree (MSc, BSc, ...) and keep the name together with the title in column 0 and the address in column 1. Note that the country code BS at the end of the last line also matches a title.

Please find some sample data below:

Phillipp Shuster MSc Grolmanstraße 6 28195 Bremen Bahnhofsvorstadt DE
Eric Jager BSc Mohrenstrasse 29 72362 Nusplingen DE
Nykee Peters BS Taylor Street, Duncan Town BS

I want to end up with this:

Phillipp Shuster MSc    |   Grolmanstraße 6 28195 Bremen Bahnhofsvorstadt DE
Eric Jager BSc          |   Mohrenstrasse 29 72362 Nusplingen DE
Nykee Peters BS         |   Taylor Street, Duncan Town BS

I tried this, but it puts the title in the address column (not correct):

splitted=re.split("\s(?=(?:msc|bsc|bs)[^$])",participants, flags=re.IGNORECASE)

Phillipp Shuster    | Msc Grolmanstraße 6 28195 Bremen Bahnhofsvorstadt DE
Eric Jager          | BSc   Mohrenstrasse 29 72362 Nusplingen DE
Nykee Peters        | BS Taylor Street, Duncan Town BS

Answer 1

You can use this:

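(?:(?<=\bmsc)|(?<=\bbsc)|(?<=\bbs))\s

(?<=\bmsc) - Lookbehind: the position must be preceded by msc starting at a word boundary.
(?<=\bbsc) - Lookbehind: the position must be preceded by bsc starting at a word boundary.
(?<=\bbs) - Lookbehind: the position must be preceded by bs starting at a word boundary.
(?:...) - Groups the three lookbehinds so that \s applies to every alternative, not only the last one.
\s - Matches the whitespace character to split on.

Use it with the re.IGNORECASE flag, as in your attempt, so that MSc, BSc and BS match.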
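A minimal sketch of how such a lookbehind pattern might be used with re.split. It assumes the three lookbehinds are grouped so that \s follows every alternative, and it passes maxsplit=1 so that only the first degree in each line is used. The names DEGREE_SPLIT and split_participant are made up for this illustration.

```python
import re

# Assumption: the lookbehinds are wrapped in (?:...) so \s applies to all of them.
DEGREE_SPLIT = re.compile(
    r"(?:(?<=\bmsc)|(?<=\bbsc)|(?<=\bbs))\s",
    flags=re.IGNORECASE,
)

participants = [
    "Phillipp Shuster MSc Grolmanstraße 6 28195 Bremen Bahnhofsvorstadt DE",
    "Eric Jager BSc Mohrenstrasse 29 72362 Nusplingen DE",
    "Nykee Peters BS Taylor Street, Duncan Town BS",
]


def split_participant(line):
    """Split once, at the first whitespace that directly follows a degree."""
    return DEGREE_SPLIT.split(line, maxsplit=1)


rows = [split_participant(line) for line in participants]
for name, address in rows:
    print(f"{name:<22}| {address}")
```

Because the whitespace itself is what gets matched, it is dropped from both columns, and the trailing country code BS is left alone since nothing follows it.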


Answer 2

Instead of splitting, I would suggest the re.subn approach:

import re

data = '''Phillipp Shuster MSc Grolmanstraße 6 28195 Bremen Bahnhofsvorstadt DE
Eric Jager BSc Mohrenstrasse 29 72362 Nusplingen DE
Nykee Peters BS Taylor Street, Duncan Town BS'''

pattern = re.compile(r'^.+? (msc|bsc|bs)', flags=re.I)

for line in data.split('\n'):
    result = pattern.subn(lambda m: '{:<20s} | '.format(m.group()), line, count=1)
    print(result[0])

The output:

Phillipp Shuster MSc |  Grolmanstraße 6 28195 Bremen Bahnhofsvorstadt DE
Eric Jager BSc       |  Mohrenstrasse 29 72362 Nusplingen DE
Nykee Peters BS      |  Taylor Street, Duncan Town BS

Answer 3

Instead of split, you can use this simple regex with two capture groups in findall:

reg = r'(?i)^(.*\s[BM]Sc?)\s+(.+)$'


RegEx Description:

(?i) : Ignore-case mode
^ : Start of the string
(.*\s[BM]Sc?) : Match 0+ characters up to and including BSc, BS, MS or MSc in capture group 1
\s+ : Match 1+ whitespace characters
(.+) : Match 1+ characters up to the end in capture group 2
$ : End of the string
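As a rough sketch, the pattern could be applied line by line with findall, which returns one (name, address) tuple per matching line. The variable names data and pairs are only illustrative.

```python
import re

reg = re.compile(r'(?i)^(.*\s[BM]Sc?)\s+(.+)$')

data = """Phillipp Shuster MSc Grolmanstraße 6 28195 Bremen Bahnhofsvorstadt DE
Eric Jager BSc Mohrenstrasse 29 72362 Nusplingen DE
Nykee Peters BS Taylor Street, Duncan Town BS"""

pairs = []
for line in data.splitlines():
    # findall yields a list of (group 1, group 2) tuples; empty if no match.
    pairs.extend(reg.findall(line))

for name, address in pairs:
    print(f"{name:<22}| {address}")
```

The greedy .* first tries the last degree-like token in the line, but the trailing country code BS has no whitespace and text after it, so the engine backtracks to the title that follows the name.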

Answer 4

My 2c, using re.sub:

import re
x = """Phillipp Shuster MSc Grolmanstraße 6 28195 Bremen Bahnhofsvorstadt DE
Eric Jager BSc Mohrenstrasse 29 72362 Nusplingen DE
Nykee Peters BS Taylor Street, Duncan Town BS"""

for y in x.split("\n"):
    print(re.sub(r"^(.*?(?:MS|BS)c?)(.*)", r"\1 |\2", y, count=0, flags=re.DOTALL))

Output:

Phillipp Shuster MSc | Grolmanstraße 6 28195 Bremen Bahnhofsvorstadt DE
Eric Jager BSc | Mohrenstrasse 29 72362 Nusplingen DE
Nykee Peters BS | Taylor Street, Duncan Town BS

