Extract substrings with regular expression
Let's say I have a string:
L1045 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ They do not!
I need to extract the name (BIANCA) and the text at the end into two variables. I tried something like this:
dialogue = "L1045 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ They do not!"
name : str = ""
line : str = ""
name = re.findall('^L.*\s(.+?)\s.*', dialogue)
but I'm a little confused about using regular expressions. How can I solve this using a regular expression?
Thanks!
You can do that without re:
data = "L1045 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ They do not!"
parts = data.split('+++$+++')
print(parts[-2].strip())
print(parts[-1].strip())
Output:
BIANCA
They do not!
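Since the question asks for two variables, the split result can be unpacked directly. The following is a minimal sketch, assuming every input has at least two occurrences of +++$+++; the variable names name and line are taken from the question, and the use of rsplit with a limit of 2 is my own choice rather than part of the answer above.

```python
data = "L1045 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ They do not!"

# Split from the right at most twice: only the last two fields are needed,
# so everything before them stays together in the first (ignored) element.
_, name, line = (part.strip() for part in data.rsplit("+++$+++", 2))

print(name)
print(line)
```

If the delimiter occurs fewer than two times, the unpacking raises ValueError, so this form suits input that is known to be well formed.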
You can use this regex:
[ \t]([^+]+)[ \t]\+{3}\$\+{3}[ \t]+([^+]+)$
Python:
>>> import re
>>> dialogue = "L1045 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ They do not!"
>>> re.findall(r'[ \t]([^+]+)[ \t]\+{3}\$\+{3}[ \t]+([^+]+)$', dialogue)
[('BIANCA', 'They do not!')]
You can also split and slice:
>>> re.split(r'[ \t]\+{3}\$\+{3}[ \t]', dialogue)[-2:]
['BIANCA', 'They do not!']
But split and slice does not fail gracefully if +++$+++ is not found: it still returns a list containing the whole string. The search pattern above returns an empty list instead.
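To make the difference concrete, here is a small sketch that wraps the search pattern in a helper and returns None when the delimiter is missing. The function name extract and the constant PATTERN are hypothetical names introduced for illustration; the regex itself is the one from this answer.

```python
import re

# Same pattern as above, compiled once for reuse.
PATTERN = re.compile(r'[ \t]([^+]+)[ \t]\+{3}\$\+{3}[ \t]+([^+]+)$')

def extract(dialogue):
    """Return (name, line), or None when the pattern does not match."""
    match = PATTERN.search(dialogue)
    if match is None:
        return None
    return match.group(1), match.group(2)

good = extract("L1045 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ They do not!")
bad = extract("no delimiter here")

print(good)
print(bad)
```

Checking for None before unpacking lets the caller decide how to handle malformed lines instead of silently receiving the wrong fields.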
You could match L at the start of the string, and use a quantifier {n} to set how many times to match +++$+++ followed by non-whitespace characters.
^L\S*(?: \+{3}\$\+{3} \S+){2} \+{3}\$\+{3} (\S+) \+{3}\$\+{3} (.+)$
The pattern matches:
^ : Start of string
L\S* : Match L followed by optional non-whitespace chars
(?: \+{3}\$\+{3} \S+){2} : Using a quantifier, repeat 2 times matching the delimiter followed by 1+ non-whitespace chars
\+{3}\$\+{3} : Match the delimiter
(\S+) : Capture group 1, match 1+ non-whitespace chars to match BIANCA
\+{3}\$\+{3} : Match the delimiter
(.+) : Capture group 2, match 1+ times any char except a newline to match They do not!
$ : End of string
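As a rough illustration of how this anchored pattern could be used from Python, the sketch below applies it with re.match and unpacks the two capture groups. The variable names follow the question; the if guard is an addition of mine to cover strings that do not fit the expected layout.

```python
import re

dialogue = "L1045 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ They do not!"
pattern = r'^L\S*(?: \+{3}\$\+{3} \S+){2} \+{3}\$\+{3} (\S+) \+{3}\$\+{3} (.+)$'

match = re.match(pattern, dialogue)
if match:
    # Group 1 is the name, group 2 is the trailing text.
    name, line = match.groups()
    print(name)
    print(line)
```

Because the name is captured with \S+, this pattern assumes the name contains no spaces; a multi-word name would need a different expression for group 1.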