Python regex search until specific word and exclude everything behind it

I have a script where the string always contains "get the" and, later, "get". The "ONE TWO THREE" part can vary; it could also be "THIRTEEN FORTY" or "SIX". After this variable part there is always a second "get".

I have the following code:

import re

variable = 'get the ONE TWO THREE get FOUR FIVE'

myVariable = re.compile(r'(?<=get the) .*')
myVariableSearch = myVariable.search(variable)
mySearchGroup = myVariableSearch.group()
print(mySearchGroup) 

# prints " ONE TWO THREE get FOUR FIVE" (note the leading space)
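The original pattern overshoots because `.*` is greedy: it consumes everything up to the end of the string, including the second "get". A minimal sketch contrasting the greedy quantifier with a lazy `.*?` bounded by a lookahead, on the same string (the names `greedy` and `lazy` are illustrative only). The simple lookahead here assumes exactly one space before the second "get"; the answer's pattern is more forgiving.

```python
import re

variable = 'get the ONE TWO THREE get FOUR FIVE'

# Greedy: .* runs to the end of the string, swallowing the second "get".
# The space after the lookbehind sits outside the group, so it is not captured.
greedy = re.search(r'(?<=get the) (.*)', variable).group(1)

# Lazy: .*? stops at the first position where the lookahead " get" succeeds.
lazy = re.search(r'(?<=get the) (.*?)(?= get\b)', variable).group(1)

print(repr(greedy))  # 'ONE TWO THREE get FOUR FIVE'
print(repr(lazy))    # 'ONE TWO THREE'
```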

I want my script to exclude the second "get" and everything after it. My desired result is just "ONE TWO THREE".

How do I exclude this? Any help would be appreciated!

You can use

\bget\s+the\s+(.*?)(?=\s*\bget\b|$)

See the regex demo.

Details

  • \bget\s+the\s+ - whole word get, one or more whitespace characters, the, one or more whitespace characters
  • (.*?) - Group 1: any zero or more characters other than line break characters, as few as possible
  • (?=\s*\bget\b|$) - a positive lookahead that requires, immediately to the right of the current location, either zero or more whitespace characters followed by the whole word get, or the end of the string.
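The breakdown above can also live inside the pattern itself by using re.VERBOSE, which ignores unescaped whitespace and allows # comments. A sketch of the same pattern written that way (`PATTERN` is just an illustrative name); since verbose mode ignores literal spaces, all whitespace matching is done with \s, exactly as in the original pattern.

```python
import re

# Same pattern as the answer, split into commented parts via re.VERBOSE.
PATTERN = re.compile(r"""
    \bget\s+the\s+      # whole word 'get', 1+ whitespace, 'the', 1+ whitespace
    (.*?)               # Group 1: any chars except line breaks, as few as possible
    (?=\s*\bget\b|$)    # lookahead: optional whitespace + whole word 'get', or end of string
""", re.VERBOSE)

m = PATTERN.search('get the ONE TWO THREE get FOUR FIVE')
print(m.group(1))  # ONE TWO THREE
```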

See the Python demo:

import re
variable = 'get the ONE TWO THREE get FOUR FIVE'
myVariableSearch = re.search(r'\bget\s+the\s+(.*?)(?=\s*\bget\b|$)', variable)
mySearchGroup = ''
if myVariableSearch:
    mySearchGroup = myVariableSearch.group(1)
print(mySearchGroup) 
# => ONE TWO THREE
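To check the pattern against the other variations mentioned in the question, here is a small sketch that wraps it in a helper. `GET_THE` and `extract_between_gets` are hypothetical names; returning an empty string on no match mirrors the `if myVariableSearch:` check above.

```python
import re

# Hypothetical helper wrapping the answer's pattern for reuse.
GET_THE = re.compile(r'\bget\s+the\s+(.*?)(?=\s*\bget\b|$)')

def extract_between_gets(text):
    """Return the text after 'get the' and before the next whole-word 'get',
    or '' if the string does not contain 'get the' at all."""
    m = GET_THE.search(text)
    return m.group(1) if m else ''

samples = [
    'get the ONE TWO THREE get FOUR FIVE',
    'get the THIRTEEN FORTY get FOUR FIVE',
    'get the SIX get FOUR FIVE',
    'get the SIX',           # no second "get": the $ branch ends the match
    'nothing to see here',   # no "get the": search() returns None
]
for s in samples:
    print(repr(extract_between_gets(s)))
```

The last two samples exercise the `|$` alternative and the no-match case; without `|$`, a string lacking a second "get" would not match at all.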
