
How to capture everything from the beginning of a string until every occurrence of a specific string/pattern using regular expressions in Python?

How can I capture everything from the beginning of a string up to each occurrence of a specific string or pattern, using regular expressions in Python?

For example, given the string below, I want to capture everything from the start of the string up to each occurrence of "UNTIL":

txt = "Here's some text UNTIL for the 1st time, then some more text UNTIL for the 2nd time, and finally more text UNTIL the 3rd time."

The expected output is:

[
  "Here's some text ",
  "Here's some text UNTIL for the 1st time, then some more text ",
  "Here's some text UNTIL for the 1st time, then some more text UNTIL for the 2nd time, and finally more text ",
]

What I have worked out so far is this:

import re

re.findall(r'.+?(?=UNTIL)', txt)
# Output
[
  "Here's some text ",
  "UNTIL for the 1st time, then some more text ",
  "UNTIL for the 2nd time, and finally more text ",
]

But that is not quite the result I need. I know I could solve this programmatically, but I am working with relatively large files, so I would prefer to solve it with regular expressions alone.

Is there a way to achieve this? And if so, how?

Solution 1

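One option is to split on the zero-width pattern (?:\b|^)(?=UNTIL(?=.*UNTIL)), which matches the position just before every UNTIL that is followed by another UNTIL later in the string. Note that re.split returns the segments between those positions rather than cumulative prefixes, and the final segment still contains the last UNTIL, so the result needs further processing to reach the expected output.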

import re

txt = "Here's some text UNTIL for the 1st time, then some more text UNTIL for the 2nd time, and finally more text UNTIL the 3rd time."

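# Splitting on a zero-width match requires Python 3.7 or later
res = re.split(r"(?:\b|^)(?=UNTIL(?=.*UNTIL))", txt)
# res == ["Here's some text ",
#         "UNTIL for the 1st time, then some more text ",
#         "UNTIL for the 2nd time, and finally more text UNTIL the 3rd time."]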
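
For comparison, here is a minimal sketch that builds the cumulative prefixes directly. It is not a pure-regex answer: it finds each UNTIL with re.finditer and slices the original string up to the start of each match. The helper name prefixes_until is hypothetical and used only for illustration.

```python
import re

def prefixes_until(text, pattern):
    """Return text[:start] for every non-overlapping match of pattern in text."""
    return [text[:m.start()] for m in re.finditer(pattern, text)]

txt = "Here's some text UNTIL for the 1st time, then some more text UNTIL for the 2nd time, and finally more text UNTIL the 3rd time."

# Each prefix runs from the start of the string to just before one UNTIL
for prefix in prefixes_until(txt, r"UNTIL"):
    print(repr(prefix))
```

Because each result is a slice of the original string, the regex engine only has to scan the text once. The prefixes still overlap, so for very large inputs it may be cheaper to keep only the m.start() offsets and slice on demand.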

Solution 2

The best you can do with .+?(?=UNTIL) is to post-process the result of re.findall(r'.+?(?=UNTIL)', txt) into the expected format by joining the matches cumulatively.

import re

txt = "Here's some text UNTIL for the 1st time, then some more text UNTIL for the 2nd time, and finally more text UNTIL the 3rd time."

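# Non-overlapping segments, each ending just before an UNTIL
arr = re.findall(r'.+?(?=UNTIL)', txt)
# Join the first i+1 segments to rebuild each prefix from the start of the string
res = [''.join(arr[:i+1]) for i in range(len(arr))]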
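
The same cumulative join can also be sketched with itertools.accumulate from the standard library, which by default combines items with +, so on strings it concatenates each segment onto the running result. This is one possible variant of the approach above, with the variable names pieces and prefixes chosen only for illustration.

```python
import re
from itertools import accumulate

txt = "Here's some text UNTIL for the 1st time, then some more text UNTIL for the 2nd time, and finally more text UNTIL the 3rd time."

# Non-overlapping segments, each ending just before an UNTIL
pieces = re.findall(r".+?(?=UNTIL)", txt)

# Running concatenation: pieces[0], pieces[0] + pieces[1], ...
prefixes = list(accumulate(pieces))

for prefix in prefixes:
    print(repr(prefix))
```

accumulate returns a lazy iterator, so dropping the list(...) call lets the prefixes be consumed one at a time instead of holding all of them in memory at once.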
