Extract substring from a line in Python, if another keyword is found
I am trying to use regex in Python to extract a small substring from a large string, if another keyword is found in the string.
e.g.:
s = "1 0001 1 UG science,ee;YEAR=onefour;standard->2;district->9"
if "YEAR" in s:
    print("The year is = ", VALUE_OF_YEAR)  # <--- here I hope to somehow get the year substring from the above string and print it.
i.e. the answer will look like
The year is = onefour
Please note: the value will change if it denotes a different number, like onethree, oneseven, etc.
I basically want to copy whatever starts after the '=' and runs up to the ';' if I find YEAR in the string, and print it out.
I am not too sure how to do this.
I tried using string manipulation methods in Python, but so far I haven't found any way to precisely copy all the characters up to the ';' in the string.
Any help will be appreciated. Any other method is also welcome.
You can also have a capturing group capture the year value:
>>> import re
>>>
>>> pattern = re.compile(r"YEAR=(\w+);")
>>> s = "1 0001 1 UG science,ee;YEAR=onefour;standard->2;district->9"
>>> pattern.search(s).group(1)
'onefour'
You may also need to handle the case where there is no match. For example, return None:
import re

def get_year_value(s):
    pattern = re.compile(r"YEAR=(\w+);")
    match = pattern.search(s)
    return match.group(1) if match else None
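A quick check of both paths (match and no match) might look like the sketch below. The second input string is made up for illustration, and compiling the pattern once at module level is a small variation on the code above, not part of the original answer.

```python
import re

# Compiled once at import time; same pattern as in the answer above.
YEAR_PATTERN = re.compile(r"YEAR=(\w+);")

def get_year_value(s):
    """Return the text between 'YEAR=' and ';', or None if there is no match."""
    match = YEAR_PATTERN.search(s)
    return match.group(1) if match else None

with_year = "1 0001 1 UG science,ee;YEAR=onefour;standard->2;district->9"
without_year = "1 0001 1 UG science,ee;standard->2;district->9"  # made-up input

print(get_year_value(with_year))     # onefour
print(get_year_value(without_year))  # None
```

Returning None lets the caller decide what to do when the keyword is missing, instead of hitting an AttributeError on `.group(1)`.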
You can use a regex to grab that value:
(?<=\bYEAR=)[^;]+
The regex matches:
(?<=\bYEAR=) - if the string we are looking for is preceded by the whole word YEAR=, then...
[^;]+ - match 1 or more characters other than ;.
Here is sample Python code:
import re
p = re.compile(r'(?<=\bYEAR=)[^;]+')
test_str = "1 0001 1 UG science,ee;YEAR=onefour;standard->2;district->9"
robj = re.search(p, test_str)
if robj:
    print(robj.group(0))
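To see what the \b word boundary contributes, the sketch below runs the same lookbehind pattern on two short, made-up strings: one where YEAR is a word on its own, and one where it is the tail of a longer word.

```python
import re

p = re.compile(r'(?<=\bYEAR=)[^;]+')

# Made-up inputs for illustration only.
plain = "a;YEAR=onefour;b"
embedded = "a;MIDYEAR=onesix;b"   # YEAR is part of a longer word here

m1 = p.search(plain)
m2 = p.search(embedded)

print(m1.group(0) if m1 else None)  # onefour
print(m2)                           # None
```

Without the \b, the second string would match as well, which may or may not be what you want depending on the keys that can appear in the line.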
If everyone is so fond of capturing groups, here is the same expression with the lookbehind replaced with a capturing group:
\bYEAR=([^;]+)
And in Python:
import re
p = re.compile(r'\bYEAR=([^;]+)')
test_str = "1 0001 1 UG science,ee;YEAR=onefour;standard->2;district->9"
robj = re.search(p, test_str)
if robj:
    print(robj.group(1))
Note that if your YEAR value has hyphens or other non-word characters in it, \w will not help you. The negated character class is your best friend here.
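The difference is easy to demonstrate. The sketch below uses a hypothetical hyphenated value (one-four), which does not appear in the question, and compares the \w+ pattern with the negated character class.

```python
import re

# Hypothetical input: the YEAR value contains a hyphen.
s = "1 0001 1 UG science,ee;YEAR=one-four;standard->2"

word_only = re.search(r"YEAR=(\w+);", s)   # \w+ stops at the hyphen, so ';' never follows
negated = re.search(r"YEAR=([^;]+)", s)    # takes everything up to the next ';'

print(word_only)          # None
print(negated.group(1))   # one-four
```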
This is what I use:
if "YEAR" in s:
    year = s.split('YEAR=')[1].split(';')[0]
    print("The year is = " + year)
# this is the output
# > The year is = onefour
Basically, what it does is split the line after YEAR= and before ;. The [1] takes the part to the right of the substring YEAR=, and the [0] takes the part to the left of the substring ;.
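One thing to watch with the split approach: if the line contains YEAR but not YEAR=, the [1] index raises an IndexError. A minimal sketch of the same idea using str.partition, which never raises on a missing separator, is shown below. The helper name year_from_line and the second test string are made up for illustration.

```python
def year_from_line(s):
    """Hypothetical helper: text between 'YEAR=' and the next ';', or None."""
    _, sep, rest = s.partition("YEAR=")
    if not sep:               # 'YEAR=' is not present in the line
        return None
    value, _, _ = rest.partition(";")
    return value

s = "1 0001 1 UG science,ee;YEAR=onefour;standard->2;district->9"
print(year_from_line(s))                # onefour
print(year_from_line("NEWYEAR party"))  # None
```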
YEAR=(?P<year>\w+);
This should work.
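In Python, a named group like this can be read back by name rather than by index. A short sketch with the string from the question:

```python
import re

s = "1 0001 1 UG science,ee;YEAR=onefour;standard->2;district->9"

m = re.search(r"YEAR=(?P<year>\w+);", s)
if m:
    # group("year") is equivalent to group(1) here, but self-documenting.
    print("The year is =", m.group("year"))
```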
Try this regex:
".*(?=YEAR).*YEAR=(.*?);.*"g
with substitution \1
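In Python this kind of substitution would go through re.sub with the backreference \1. The sketch below assumes the surrounding quotes and the trailing g are notation from a regex testing tool and not part of the pattern itself. The second input is made up to show what happens when nothing matches.

```python
import re

pattern = re.compile(r".*(?=YEAR).*YEAR=(.*?);.*")

s = "1 0001 1 UG science,ee;YEAR=onefour;standard->2;district->9"
no_year = "1 0001 1 UG science,ee;standard->2"  # made-up input without YEAR

# The pattern consumes the whole line and replaces it with group 1.
result = pattern.sub(r"\1", s)
print(result)  # onefour

# With no match, re.sub returns the input unchanged, so check for that case.
unchanged = pattern.sub(r"\1", no_year)
print(unchanged == no_year)  # True
```

Because a failed substitution silently returns the original line, a search with a capturing group is usually easier to reason about when the keyword may be absent.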