Python regex: search up to a specific word and exclude everything after it

Question

My script always works with a string that contains both "get the" and "get". The "ONE TWO THREE" part can vary; for example, it can also be "THIRTEEN FORTY" or "SIX". After this variable part there is always a second "get".

I have the following code:

import re

variable = 'get the ONE TWO THREE get FOUR FIVE'

myVariable = re.compile(r'(?<=get the) .*')
myVariableSearch = myVariable.search(variable)
mySearchGroup = myVariableSearch.group()
print(mySearchGroup)

# prints " ONE TWO THREE get FOUR FIVE" (with a leading space)

I want my script to exclude the second "get" and everything after it. My desired result is just "ONE TWO THREE".

How do I exclude this? Any help would be appreciated!
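For comparison, if the words are always whitespace-separated (an assumption; the question only shows single spaces), the same result can be obtained without a regex by splitting into words. This is a sketch, not part of the original question or answer, and the function name `text_between_gets` is hypothetical:

```python
def text_between_gets(text: str) -> str:
    """Return the words after "get the" up to (not including) the next "get".

    Hypothetical helper; assumes whitespace-separated words.
    """
    words = text.split()
    for i in range(len(words) - 1):
        if words[i] == 'get' and words[i + 1] == 'the':
            rest = words[i + 2:]
            # Stop at the next standalone "get", or keep everything if there is none
            end = rest.index('get') if 'get' in rest else len(rest)
            return ' '.join(rest[:end])
    # No "get the" found
    return ''


print(text_between_gets('get the ONE TWO THREE get FOUR FIVE'))  # ONE TWO THREE
```

Because the result is rebuilt with `' '.join`, runs of multiple spaces collapse to one, which a regex-based approach would preserve.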

Answer 1

You can use

\bget\s+the\s+(.*?)(?=\s*\bget\b|$)

See the regex demo.
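One way to check the pattern against the variations mentioned in the question is to run it over a few sample strings. The samples below are made up for illustration; the last one assumes a string with no second "get", where the `$` alternative ends the match:

```python
import re

# Pattern from the answer above
PATTERN = re.compile(r'\bget\s+the\s+(.*?)(?=\s*\bget\b|$)')

samples = [
    'get the ONE TWO THREE get FOUR FIVE',
    'get the THIRTEEN FORTY get FOUR FIVE',
    'get the SIX get SEVEN',
    'get the SIX',  # no second "get": the match stops at the end of the string
]

for s in samples:
    m = PATTERN.search(s)
    print(m.group(1) if m else None)
# ONE TWO THREE
# THIRTEEN FORTY
# SIX
# SIX
```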

Details

\bget\s+the\s+ - the whole word get, 1+ whitespaces, the, 1+ whitespaces
(.*?) - Group 1: any zero or more characters other than line break characters, as few as possible
(?=\s*\bget\b|$) - a positive lookahead that requires 0+ whitespaces followed by the whole word get, or the end of the string, immediately to the right of the current location.
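To see why the lazy `(.*?)` needs the lookahead, compare three variants. A greedy `.*` runs to the end of the string, a lazy `.*?` with nothing after it is satisfied by an empty match, and only the lookahead forces the lazy group to grow up to the second `get`. The variable names here are only for illustration:

```python
import re

variable = 'get the ONE TWO THREE get FOUR FIVE'

# Greedy: .* runs to the end of the string and swallows the second "get"
greedy = re.search(r'\bget\s+the\s+(.*)', variable).group(1)

# Lazy, no lookahead: .*? is satisfied by matching nothing at all
lazy_only = re.search(r'\bget\s+the\s+(.*?)', variable).group(1)

# Lazy + lookahead: .*? grows only until whitespace + "get" (or the end) follows
m = re.search(r'\bget\s+the\s+(.*?)(?=\s*\bget\b|$)', variable)
lazy_lookahead = m.group(1)

print(repr(greedy))          # 'ONE TWO THREE get FOUR FIVE'
print(repr(lazy_only))       # ''
print(repr(lazy_lookahead))  # 'ONE TWO THREE'
# The lookahead consumes nothing, so the whole match also stops before " get"
print(repr(m.group(0)))      # 'get the ONE TWO THREE'
```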

See the Python demo:

import re
variable = 'get the ONE TWO THREE get FOUR FIVE'
myVariableSearch = re.search(r'\bget\s+the\s+(.*?)(?=\s*\bget\b|$)', variable)
mySearchGroup = ''
if myVariableSearch:
    mySearchGroup = myVariableSearch.group(1)
print(mySearchGroup) 
# => ONE TWO THREE
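If you would rather keep the shape of the code from the question (a lookbehind and `.group()` with no group number), a sketch of a minimal change is to make `.*` lazy and add a lookahead. This variant is an alternative, not the answer's pattern: Python lookbehinds must be fixed-width, so it assumes exactly one space inside "get the ", and without a leading `\b` it would also match inside a word such as "forget the".

```python
import re

variable = 'get the ONE TWO THREE get FOUR FIVE'

# Fixed-width lookbehind: start right after "get the ".
# Lazy .*? plus lookahead: stop before whitespace + whole word "get", or at the end.
myVariable = re.compile(r'(?<=get the ).*?(?=\s+get\b|$)')
myVariableSearch = myVariable.search(variable)

# Guard against no match instead of calling .group() on None
mySearchGroup = myVariableSearch.group() if myVariableSearch else ''
print(mySearchGroup)  # ONE TWO THREE
```

Because both lookarounds are zero-width, the whole match (`group()`) is already the desired text, so no capturing group is needed.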

Answer 1: score 0, accepted (2020-08-14 11:32:46)
