
Split a string using regex and include pattern

I need to split a string on the degree (MSc, BSc, ...) and keep the name with the title in column 0 and the address in column 1. Note that the country code at the end of the last row (BS) matches the title.

Please find some sample data below:

Phillipp Shuster MSc Grolmanstraße 6 28195 Bremen Bahnhofsvorstadt DE
Eric Jager BSc Mohrenstrasse 29 72362 Nusplingen DE
Nykee Peters BS Taylor Street, Duncan Town BS

I want to end up with the following:

Phillipp Shuster MSc    |   Grolmanstraße 6 28195 Bremen Bahnhofsvorstadt DE
Eric Jager BSc          |   Mohrenstrasse 29 72362 Nusplingen DE
Nykee Peters BS         |   Taylor Street, Duncan Town BS

I tried this, but it puts the title in the address column (not correct).

splitted = re.split(r"\s(?=(?:msc|bsc|bs)[^$])", participants, flags=re.IGNORECASE)

Phillipp Shuster    | MSc Grolmanstraße 6 28195 Bremen Bahnhofsvorstadt DE
Eric Jager          | BSc Mohrenstrasse 29 72362 Nusplingen DE
Nykee Peters        | BS Taylor Street, Duncan Town BS

You can use this (with the case-insensitive flag, as in your attempt):

(?<=\bmsc)|(?<=\bbsc)|(?<=\bbs)\s
  • (?<=\bmsc) - Lookbehind: matches the position right after msc.
  • (?<=\bbsc) - Lookbehind: matches the position right after bsc.
  • (?<=\bbs) - Lookbehind: matches the position right after bs.
  • \s - Matches a whitespace character.
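
As a rough Python sketch of the lookbehind idea above: the three lookbehinds are grouped here so that the trailing \s applies to all of them (an adjustment, not part of the answer as posted), and maxsplit=1 is assumed so that a country code such as BS at the end of the address is never split on. The helper name split_participant is hypothetical.

```python
import re

# Assumption: the lookbehinds are wrapped in one group so that \s follows
# each of them, not only the last alternative.
DEGREE_SPLIT = re.compile(
    r'(?:(?<=\bmsc)|(?<=\bbsc)|(?<=\bbs))\s',
    flags=re.IGNORECASE,
)


def split_participant(line):
    """Split one row into [name with title, address] (hypothetical helper)."""
    # maxsplit=1 stops after the first degree, so a trailing "BS" country
    # code stays inside the address.
    return DEGREE_SPLIT.split(line, maxsplit=1)


rows = [
    "Phillipp Shuster MSc Grolmanstraße 6 28195 Bremen Bahnhofsvorstadt DE",
    "Eric Jager BSc Mohrenstrasse 29 72362 Nusplingen DE",
    "Nykee Peters BS Taylor Street, Duncan Town BS",
]

for row in rows:
    print(split_participant(row))
```

Because the lookbehinds consume nothing, the title stays at the end of the first element and only the single whitespace character after it is removed.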


Instead of splitting, I would suggest the re.subn approach:

import re

data = '''Phillipp Shuster MSc Grolmanstraße 6 28195 Bremen Bahnhofsvorstadt DE
Eric Jager BSc Mohrenstrasse 29 72362 Nusplingen DE
Nykee Peters BS Taylor Street, Duncan Town BS'''

pattern = re.compile(r'^.+? (msc|bsc|bs)', flags=re.I)

for line in data.split('\n'):
    result = pattern.subn(lambda m: '{:<20s} | '.format(m.group()), line, count=1)
    print(result[0])

The output:

Phillipp Shuster MSc |  Grolmanstraße 6 28195 Bremen Bahnhofsvorstadt DE
Eric Jager BSc       |  Mohrenstrasse 29 72362 Nusplingen DE
Nykee Peters BS      |  Taylor Street, Duncan Town BS

Instead of split, you can use this simple regex with 2 capture groups in findall:

reg = r'(?i)^(.*\s[BM]Sc?)\s+(.+)$'


RegEx Description:

  • (?i) : Ignore-case mode
  • ^ : Start
  • (.*\s[BM]Sc?) : Match 0+ characters up to and including BSc, BS, MS or MSc in capture group 1
  • \s+ : Match 1+ whitespace characters
  • (.+) : Match 1+ characters until the end in capture group 2
  • $ : End
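
A minimal sketch of how this pattern might be used with findall, applied one line at a time (an assumption made here, since \s+ could otherwise run across a line break when the whole text is searched at once). The names data and columns are illustrative only.

```python
import re

reg = r'(?i)^(.*\s[BM]Sc?)\s+(.+)$'

data = """Phillipp Shuster MSc Grolmanstraße 6 28195 Bremen Bahnhofsvorstadt DE
Eric Jager BSc Mohrenstrasse 29 72362 Nusplingen DE
Nykee Peters BS Taylor Street, Duncan Town BS"""

columns = []
for line in data.splitlines():
    # With two capture groups, findall returns a list of (group 1, group 2)
    # tuples; each line yields at most one tuple here.
    matches = re.findall(reg, line)
    if matches:
        columns.append(matches[0])

for name, address in columns:
    print(f"{name:<20} | {address}")
```

Note that .* is greedy, so the pattern keeps the last degree-like token that is still followed by whitespace and more text; on the sample data that is the title, because the final BS has nothing after it.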

My 2c using re.sub:

import re
x = """Phillipp Shuster MSc Grolmanstraße 6 28195 Bremen Bahnhofsvorstadt DE
Eric Jager BSc Mohrenstrasse 29 72362 Nusplingen DE
Nykee Peters BS Taylor Street, Duncan Town BS"""

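for y in x.split("\n"):
    # Group 1 is the name plus title, group 2 the rest of the line.
    # count and flags are passed by keyword; passing them positionally is deprecated in recent Python versions.
    print(re.sub(r"^(.*?(?:MS|BS)c?)(.*)", r"\1 |\2", y, count=0, flags=re.DOTALL))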

Output:

Phillipp Shuster MSc | Grolmanstraße 6 28195 Bremen Bahnhofsvorstadt DE
Eric Jager BSc | Mohrenstrasse 29 72362 Nusplingen DE
Nykee Peters BS | Taylor Street, Duncan Town BS

