Capture string between two words but only 1st time
I have a string like:
text = "Why do Humans need to eat food? Humans eat food to survive."
I want to capture everything between Human and food, but only the first time.
Expected output:
Humans need to eat food
My regex:
p =r'(\bHumans?\b.*?\bFoods?\b)'
Python code:
re.findall(p, text, re.I|re.M|re.DOTALL)
The code correctly captures the string between Human and Food, but it doesn't stop at the first capture.
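A minimal reproduction of the behavior described above, using the same text and pattern. The variable name `matches` is only for illustration:

```python
import re

text = "Why do Humans need to eat food? Humans eat food to survive."
p = r'(\bHumans?\b.*?\bFoods?\b)'

# findall returns every non-overlapping match, so the lazy .*? alone
# cannot limit the result to the first occurrence.
matches = re.findall(p, text, re.I | re.M | re.DOTALL)
print(matches)  # ['Humans need to eat food', 'Humans eat food']
```

The pattern is already non-greedy: each individual match is as short as possible. The second item appears because findall keeps scanning after the first match ends.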
Research:
I have read that to make it non-greedy I need to add ?, but I am unable to figure out where to put it. None of the other permutations and combinations I tried stopped at the first match.
Update:
I am writing a lot of regexes to capture various other entities like this and parsing them in one shot, hence I can't change my re.findall logic.
Use search instead of findall:
import re
text = "Why do Humans need to eat food? Humans eat food to survive."
p =r'(\bHumans?\b.*?\bFoods?\b)'
res = re.search(p, text, re.I|re.M|re.DOTALL)
print(res.groups())
Output:
('Humans need to eat food',)
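Note that re.search returns None when nothing matches, so calling .groups() directly would raise an AttributeError on texts without a match. If the surrounding code expects findall-style lists, one possible sketch is a small wrapper. The name `find_first` is hypothetical, not part of the re module:

```python
import re

def find_first(pattern, text, flags=0):
    """Hypothetical helper: findall-shaped result holding at most the first match."""
    m = re.search(pattern, text, flags)
    if m is None:
        return []  # same as findall when nothing matches
    n_groups = m.re.groups  # number of capture groups in the pattern
    if n_groups == 0:
        return [m.group(0)]
    if n_groups == 1:
        return [m.groups(default='')[0]]
    return [m.groups(default='')]  # one tuple, as findall gives for several groups

text = "Why do Humans need to eat food? Humans eat food to survive."
p = r'(\bHumans?\b.*?\bFoods?\b)'
flags = re.I | re.M | re.DOTALL

print(find_first(p, text, flags))               # ['Humans need to eat food']
print(find_first(p, "nothing relevant", flags)) # []
```

This keeps the return shape of findall, so code that iterates over the results would not need to change, though it does replace the findall call itself.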
Or add .* at the end of the regex:
import re
text = "Why do Humans need to eat food? Humans eat food to survive."
p =r'(\bHumans?\b.*?\bFoods?\b).*'
# here ________________________^^
res = re.findall(p, text, re.I|re.M|re.DOTALL)
print(res)
For finding the first match only, Toto's answer is best, but since you said you need to use findall only, you can just append .* at the end of your regex. It matches the remaining text, which prevents any further matches.
(\bHumans?\b.*?\bFoods?\b).*
                          ^^ This eats the remaining part of your text, so there won't be any further matches.
Sample Python code:
import re
text = "Why do Humans need to eat food? Humans eat food to survive."
p =r'(\bHumans?\b.*?\bFoods?\b).*'
print(re.findall(p, text, re.I|re.M|re.DOTALL))
Prints:
['Humans need to eat food']
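The trailing .* only swallows the whole remainder because re.DOTALL lets . match newlines. A small sketch of that dependency, assuming a two-line variant of the text (the newline is added here for illustration and is not in the original input):

```python
import re

# Assumed variation of the original text, split across two lines.
text = "Why do Humans need to eat food?\nHumans eat food to survive."
p = r'(\bHumans?\b.*?\bFoods?\b).*'

# Without DOTALL, the trailing .* stops at the newline,
# so findall goes on to match again on the second line.
without_dotall = re.findall(p, text, re.I)

# With DOTALL, the trailing .* runs to the end of the text.
with_dotall = re.findall(p, text, re.I | re.DOTALL)

print(without_dotall)  # ['Humans need to eat food', 'Humans eat food']
print(with_dotall)     # ['Humans need to eat food']
```

So if the flags are ever changed, the pattern would need to carry the behavior itself, for example with the inline flag (?s) at the start of the pattern.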
Try this:
>>> import re
>>> text = "Why do Humans need to eat food? Humans eat food to survive."
>>> re.search(r'Humans.*?food', text).group() # you want the all powerful non-greedy '?' :)
'Humans need to eat food'