How to capture everything from the beginning of a string until every occurrence of a specific string/pattern using regular expressions in Python?
How can one capture everything from the beginning of a string until every occurrence of a specific string/pattern using regular expressions in Python?
For example, given the following string, I want to capture everything up to each occurrence of "UNTIL":
txt = "Here's some text UNTIL for the 1st time, then some more text UNTIL for the 2nd time, and finally more text UNTIL the 3rd time."
The output should then be as follows:
[
"Here's some text ",
"Here's some text UNTIL for the 1st time, then some more text ",
"Here's some text UNTIL for the 1st time, then some more text UNTIL for the 2nd time, and finally more text ",
]
What I have figured out so far is this:
import re
re.findall(r'.+?(?=UNTIL)', txt)
# Output
[
"Here's some text ",
"UNTIL for the 1st time, then some more text ",
"UNTIL for the 2nd time, and finally more text ",
]
But the result is not exactly what I need. I know I could solve this programmatically, but I am working with relatively large files, so I would prefer to solve it with regular expressions alone.
Is there a way to achieve this? And if so, how?
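The regex you're looking for is (?:\b|^)(?=UNTIL(?=.*UNTIL)), used with re.split. It matches the empty position in front of every UNTIL that is followed by another UNTIL. Note that re.split only splits on empty matches in Python 3.7 and later, and that it returns the pieces between the split points (with the last piece running to the end of the string), not the cumulative prefixes, so the pieces still have to be joined to get the expected output.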
import re
txt = "Here's some text UNTIL for the 1st time, then some more text UNTIL for the 2nd time, and finally more text UNTIL the 3rd time."
res = re.split(r"(?:\b|^)(?=UNTIL(?=.*UNTIL))", txt)
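For reference, here is a minimal sketch of what this split returns and one way to turn the pieces into the prefixes the question asks for. The names pieces and prefixes are illustrative, and the sketch assumes Python 3.7+ and that txt contains at least one UNTIL.

```python
import re
from itertools import accumulate

txt = ("Here's some text UNTIL for the 1st time, then some more text "
       "UNTIL for the 2nd time, and finally more text UNTIL the 3rd time.")

# Zero-width split points: in front of each UNTIL that has another UNTIL
# after it. Splitting on empty matches needs Python 3.7 or newer.
pieces = re.split(r"(?:\b|^)(?=UNTIL(?=.*UNTIL))", txt)

# Running concatenation of the pieces. The last running total is the whole
# string, so cut it off where the final UNTIL starts (assumes one exists).
prefixes = list(accumulate(pieces))
prefixes[-1] = prefixes[-1][:txt.rfind("UNTIL")]
```

The split itself only yields the segments between the split points; the accumulate step is what produces the cumulative output.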
The best thing you could do here with .+?(?=UNTIL) is to convert the result of re.findall(r'.+?(?=UNTIL)', txt) to the expected format by joining the matches cumulatively.
import re
txt = "Here's some text UNTIL for the 1st time, then some more text UNTIL for the 2nd time, and finally more text UNTIL the 3rd time."
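# Each match runs up to (but does not include) the next UNTIL
arr = re.findall(r'.+?(?=UNTIL)', txt)
# Join the first i+1 matches to rebuild each prefix of txt
res = [''.join(arr[:i+1]) for i in range(len(arr))]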
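As an alternative sketch, not taken from either answer, the same prefixes can be obtained by locating each occurrence with re.finditer and slicing the original string up to the match start. The helper name prefixes_until and its marker parameter are hypothetical; it treats the marker as a literal string via re.escape.

```python
import re

def prefixes_until(text, marker="UNTIL"):
    """Yield text[:i] for every position i at which marker starts."""
    # re.escape makes the marker match literally, not as a regex pattern
    for m in re.finditer(re.escape(marker), text):
        yield text[:m.start()]

txt = ("Here's some text UNTIL for the 1st time, then some more text "
       "UNTIL for the 2nd time, and finally more text UNTIL the 3rd time.")

res = list(prefixes_until(txt))
```

Because this is a generator, the prefixes can be consumed one at a time instead of being held in a list, which may matter for large inputs. It also does not depend on . matching newlines, whereas .+? stops at a line break unless re.DOTALL is set.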