Use regex to extract characters after a substring in Python

I have a string that looks something like this:

text = 'during the day, the color of the sky is blue. at sunset, the color of the sky is orange.'

I need to extract the word that follows a particular substring, in this case 'sky is'. That is, I want a list like this:

['blue', 'orange']

I have tried the following:

p1 =re.compile(r"is (.+?) ",re.I)
re.findall(p1,text)

But this gives only the following output:

['blue.']

If, however, my text is

text = 'during the day, the color of the sky is blue at sunset, the color of the sky is orange or yellow.'

and I run

p1 = re.compile(r"is (.+?) ",re.I)
re.findall(p1,text)

I get this output:

['blue', 'orange']

Please help! I am new to regular expressions and I am stuck!

It's not a very general solution, but it works for your string.

import re

my_str = 'during the day, the color of the sky is blue. at sunset, the color of the sky is orange.'
r = re.compile('sky is [a-z]+')
# Each match looks like 'sky is blue'; keep only its last word.
out = [x.split()[-1] for x in r.findall(my_str)]
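
A slightly more general sketch of the same idea, assuming the word you want consists only of word characters (\w+) and is separated from the substring by whitespace. The helper name words_after is hypothetical and not part of the answer above:

```python
import re

def words_after(text, marker):
    """Return the word that follows each occurrence of `marker` in `text`."""
    # re.escape keeps any special characters in the marker literal;
    # \s+ allows one or more whitespace characters before the word.
    pattern = re.compile(re.escape(marker) + r"\s+(\w+)", re.I)
    return pattern.findall(text)

text = 'during the day, the color of the sky is blue. at sunset, the color of the sky is orange.'
print(words_after(text, 'sky is'))
```

Because the capture group is \w+, trailing punctuation such as '.' is never included, so no special handling is needed for the end of a sentence.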

In your regex pattern, you only capture a string that is followed by a space. However, 'orange' is followed by a dot '.', which is why it is not captured.
You have to include the dot '.' in your pattern.

import re

# `text` is the first string from the question.
p1 = re.compile(r"is (.+?)[ \.]", re.I)
re.findall(p1,text)
# ['blue', 'orange']

Demo:
https://regex101.com/r/B8jhdF/2

EDIT:
If the word is at the end of the text and not followed by a dot '.', I suggest this:

import re

text = 'during the day, the color of the sky is blue at sunset, the color of the sky is orange'
p1 = re.compile(r"is (.+?)([ \.]|$)")
# With two groups, findall returns tuples; the word is the first element.
found_patterns = re.findall(p1,text)
[elt[0] for elt in found_patterns]
# ['blue', 'orange']
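
As a possible variation (a sketch, not part of the answer above), the second group can be made non-capturing with (?:...). With only one capturing group left, findall returns plain strings and the list comprehension is no longer needed. The variable name colors is just an illustration:

```python
import re

text = 'during the day, the color of the sky is blue at sunset, the color of the sky is orange'
# (?:...) groups the alternatives without capturing them, so findall
# returns a list of strings instead of a list of tuples.
p1 = re.compile(r"is (.+?)(?:[ .]|$)")
colors = p1.findall(text)
print(colors)
```

Inside a character class the dot is already literal, so [ .] behaves the same as [ \.].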
