Python regex search until specific word and exclude everything behind it
I have a script in which the string always contains "get the" and a later "get". The "ONE TWO THREE" part can vary; it could also be "THIRTEEN FORTY" or "SIX". After this varying part there will always be a second "get".
I have the following code:
import re
variable = 'get the ONE TWO THREE get FOUR FIVE'
myVariable = re.compile(r'(?<=get the) .*')
myVariableSearch = myVariable.search(variable)
mySearchGroup = myVariableSearch.group()
print(mySearchGroup)
# prints: ONE TWO THREE get FOUR FIVE (with a leading space)
I want my script to exclude the second "get" and everything after it. My desired result is just "ONE TWO THREE".
How do I exclude this? Any help would be appreciated!
You can use
\bget\s+the\s+(.*?)(?=\s*\bget\b|$)
See the regex demo.
Details
\bget\s+the\s+ - the whole word get, 1+ whitespaces, the, 1+ whitespaces
(.*?) - Group 1: any zero or more characters other than line break characters, as few as possible
(?=\s*\bget\b|$) - a positive lookahead that requires 0+ whitespaces and then the whole word get, or the end of the string, immediately to the right of the current location.
See the Python demo:
import re
variable = 'get the ONE TWO THREE get FOUR FIVE'
myVariableSearch = re.search(r'\bget\s+the\s+(.*?)(?=\s*\bget\b|$)', variable)
mySearchGroup = ''
if myVariableSearch:
    mySearchGroup = myVariableSearch.group(1)
print(mySearchGroup)
# => ONE TWO THREE
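To check the pattern against the variations mentioned in the question, it can be wrapped in a small helper. This is only a sketch: the function name `words_after_get_the` and the sample strings other than the first are made up for illustration, and it assumes an empty string is an acceptable result when nothing matches.

```python
import re

# Same pattern as above, compiled once so it can be reused.
PATTERN = re.compile(r'\bget\s+the\s+(.*?)(?=\s*\bget\b|$)')

def words_after_get_the(text):
    """Return the text between 'get the' and the next whole word 'get'
    (or the end of the string); return '' if 'get the' is not found."""
    match = PATTERN.search(text)
    return match.group(1) if match else ''

samples = [
    'get the ONE TWO THREE get FOUR FIVE',
    'get the THIRTEEN FORTY get FOUR FIVE',
    'get the SIX get FOUR FIVE',
    'get the SEVEN',          # no second "get": the $ alternative applies
    'nothing to match here',  # no "get the" at all
]
for sample in samples:
    print(repr(words_after_get_the(sample)))
```

Because the group is lazy and the lookahead does not consume anything, the captured text stops right before the whitespace that precedes the second "get", so no trailing space ends up in the result.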