Use regex to extract characters after a substring in Python
I have a string that looks something like this:
text = 'during the day, the color of the sky is blue. at sunset, the color of the sky is orange.'
I need to extract the words after a particular substring, in this case 'sky is'. That is, I want a list that gives me this:
['blue', 'orange']
I have tried the following:
import re
p1 = re.compile(r"is (.+?) ", re.I)
re.findall(p1, text)
But this gives only the following output:
['blue']
If, however, my text is
text = 'during the day, the color of the sky is blue at sunset, the color of the sky is orange or yellow.'
and I run
p1 = re.compile(r"is (.+?) ",re.I)
re.findall(p1,text)
I get the output as:
['blue', 'orange']
Please help! I am new to regular expressions and I am stuck!
It's not a very general solution, but it works for your string.
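import re
my_str = 'during the day, the color of the sky is blue. at sunset, the color of the sky is orange.'
# Match 'sky is' plus the lowercase word after it, then keep only that last word.
r = re.compile(r'sky is [a-z]+')
out = [x.split()[-1] for x in r.findall(my_str)]
# ['blue', 'orange']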
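As a possible variation on the snippet above (a minimal sketch, not part of the original answer), a capture group lets `re.findall` return the following word directly, so the `split()` step is not needed. The use of `\w+` here is an assumption that the wanted word consists only of letters, digits, or underscores.

```python
import re

my_str = ('during the day, the color of the sky is blue. '
          'at sunset, the color of the sky is orange.')

# With one capture group in the pattern, findall returns only the captured
# text (the word after 'sky is '), not the whole match.
out = re.findall(r'sky is (\w+)', my_str)
print(out)
```

Because `\w+` stops at the first non-word character, the trailing '.' is left out without being listed in the pattern.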
In your regex pattern, you only capture a string that is followed by a blank space. However, 'orange' is followed by a dot '.', which is why it is not captured.
You have to include the dot '.' in your pattern.
p1 = re.compile(r"is (.+?)[ \.]", re.I)
re.findall(p1,text)
# ['blue', 'orange']
Demo:演示:
https://regex101.com/r/B8jhdF/2
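One caveat worth checking, shown here as a small sketch: a pattern that begins with a bare `is ` can also match the end of a longer word such as 'this'. The sample sentence below is made up for illustration and is not from the question. Adding a word boundary `\b` is one way to guard against this.

```python
import re

# Hypothetical sample text, chosen so that 'this' precedes the real match.
text = 'this sky is blue. at sunset, the sky is orange.'

# A bare 'is ' also matches the tail of 'this', so 'sky' is picked up too.
loose = re.findall(r"is (.+?)[ .]", text, re.I)

# \b requires 'is' to start at a word boundary, so 'this' no longer matches.
strict = re.findall(r"\bis (.+?)[ .]", text, re.I)

print(loose)
print(strict)
```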
EDIT:
If the word is at the end of the sentence and not followed by a dot '.', I suggest this:
text = 'during the day, the color of the sky is blue at sunset, the color of the sky is orange'
p1 = re.compile(r"is (.+?)([ \.]|$)")
found_patterns = re.findall(p1,text)
[elt[0] for elt in found_patterns]
# ['blue', 'orange']
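If the substring may vary, the same idea could be wrapped in a small helper. This is only a sketch under stated assumptions: `words_after` is a hypothetical name, and it assumes that exactly one word (matched by `\w+`) is wanted after each occurrence. `re.escape` keeps any special characters in the substring literal.

```python
import re

def words_after(text, marker):
    """Return the word that follows each occurrence of `marker` in `text`."""
    # re.escape treats the marker as literal text; \s+ allows any whitespace
    # between the marker and the word; (\w+) captures the word itself.
    pattern = re.compile(re.escape(marker) + r"\s+(\w+)", re.I)
    return pattern.findall(text)

with_dots = ('during the day, the color of the sky is blue. '
             'at sunset, the color of the sky is orange.')
no_final_dot = ('during the day, the color of the sky is blue '
                'at sunset, the color of the sky is orange')

print(words_after(with_dots, 'sky is'))
print(words_after(no_final_dot, 'sky is'))
```

Since the word is matched by `\w+`, there is no need to list what may follow it (a space, a dot, or the end of the string).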