Extracting substrings using regular expressions

Question

Let's say I have a string:

L1045 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ They do not!

And I need to extract the name (BIANCA) and the text at the end into two variables. I tried to do something like this:

import re

dialogue = "L1045 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ They do not!"
name: str = ""
line: str = ""
name = re.findall(r'^L.*\s(.+?)\s.*', dialogue)

but I'm a little confused about using regular expressions. How can I solve this using a regular expression?

Thanks!

Answer 1

You can do that without re:

data = "L1045 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ They do not!"
parts = data.split('+++$+++')
print(parts[-2].strip())
print(parts[-1].strip())

Output:

BIANCA
They do not!
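
Since the question asks for the two values in two variables, the split approach above could be wrapped as in the following minimal sketch. The function name split_name_and_line and the ValueError check are assumptions added for illustration; they are not part of the original answer.

```python
from typing import Tuple


def split_name_and_line(record: str, delimiter: str = "+++$+++") -> Tuple[str, str]:
    """Return the last two delimiter-separated fields, stripped of whitespace."""
    parts = [part.strip() for part in record.split(delimiter)]
    if len(parts) < 2:
        # Hypothetical safeguard: without the delimiter there is only one field.
        raise ValueError("delimiter not found in record")
    return parts[-2], parts[-1]


data = "L1045 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ They do not!"
name, line = split_name_and_line(data)
print(name)  # BIANCA
print(line)  # They do not!
```

Note that if the spoken text itself contained +++$+++, this sketch would return only the piece after the last delimiter.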

Answer 2

You can use this regex:

[ \t]([^+]+)[ \t]\+{3}\$\+{3}[ \t]+([^+]+)$

Python:

>>> import re
>>> dialogue = "L1045 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ They do not!"
>>> re.findall(r'[ \t]([^+]+)[ \t]\+{3}\$\+{3}[ \t]+([^+]+)$', dialogue)
[('BIANCA', 'They do not!')]

You can also split and slice:

>>> re.split(r'[ \t]\+{3}\$\+{3}[ \t]', dialogue)[-2:]
['BIANCA', 'They do not!']

But split and slice does not fail gracefully if +++$+++ is not found: re.split then returns the whole string as a single-element list, and the [-2:] slice passes it through without any error. The search pattern above returns an empty list instead.
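
The difference in failure behavior can be seen with a short sketch. The input string bad and the variable names are made up for illustration; the two patterns are the ones from this answer.

```python
import re

DELIM = r'[ \t]\+{3}\$\+{3}[ \t]'
PATTERN = r'[ \t]([^+]+)[ \t]\+{3}\$\+{3}[ \t]+([^+]+)$'

# Hypothetical input that does not contain the +++$+++ delimiter.
bad = "no delimiter here"

# Split and slice: silently returns the whole input as one element.
split_result = re.split(DELIM, bad)[-2:]

# Search pattern: returns an empty list, which is easy to test for.
search_result = re.findall(PATTERN, bad)

print(split_result)   # ['no delimiter here']
print(search_result)  # []
```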

Answer 3

You could match L at the start of the string, and use a quantifier {n} to set how many times to match +++$+++ followed by non-whitespace characters.

^L\S*(?: \+{3}\$\+{3} \S+){2} \+{3}\$\+{3} (\S+) \+{3}\$\+{3} (.+)$

The pattern matches:

^ Start of string
L\S* Match L followed by optional non-whitespace chars
(?: \+{3}\$\+{3} \S+){2} Using a quantifier, repeat 2 times matching the delimiter followed by 1+ non-whitespace chars
\+{3}\$\+{3} Match the delimiter
(\S+) Capture group 1, match 1+ non-whitespace chars to match BIANCA
\+{3}\$\+{3} Match the delimiter
(.+) Capture group 2, match any char except a newline 1+ times to match They do not!
$ End of string
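
In Python, this pattern could be applied roughly as in the sketch below. The helper name extract_name_and_line is an assumption for illustration and is not part of the original answer. Because group 1 is \S+, the sketch only handles names without whitespace.

```python
import re
from typing import Optional, Tuple

# The pattern from this answer, compiled once for reuse.
RECORD_RE = re.compile(
    r'^L\S*(?: \+{3}\$\+{3} \S+){2} \+{3}\$\+{3} (\S+) \+{3}\$\+{3} (.+)$'
)


def extract_name_and_line(record: str) -> Optional[Tuple[str, str]]:
    """Return (name, line) if the record matches the pattern, else None."""
    match = RECORD_RE.match(record)
    if match is None:
        return None
    return match.group(1), match.group(2)


dialogue = "L1045 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ They do not!"
result = extract_name_and_line(dialogue)
if result is not None:
    name, line = result
    print(name)  # BIANCA
    print(line)  # They do not!
```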

3 answers

Answer 1
1 2021-10-14 18:22:30

Answer 2
1 accepted 2021-10-14 18:25:21

Answer 3
0 2021-10-15 07:26:21