
Extract a substring from a line in Python if another keyword is found

I am trying to use regex in Python to extract a small substring from a large string if another keyword is found in the string.

For example:

s = "1  0001    1   UG  science,ee;YEAR=onefour;standard->2;district->9"

if "YEAR" in s:
    print("The year is = ", VALUE_OF_YEAR)  # <-- here I hope to somehow get the year substring from the above string and print it

i.e. the output should look like:

The year is = onefour  

Please note: the value will change if it denotes a different number, like onethree, oneseven, etc.

I basically want to copy whatever comes after "=" up to the ";" if I find YEAR in the string, and print it out.

I am not too sure how to do this.

I tried using string manipulation methods in Python, but so far I haven't found any way to precisely copy everything up to the ';' in the string.

Any help will be appreciated. Any other method is also welcome.

You can also use a capturing group to capture the year value:

>>> import re
>>> 
>>> pattern = re.compile(r"YEAR=(\w+);")
>>> s = "1  0001    1   UG  science,ee;YEAR=onefour;standard->2;district->9"
>>> pattern.search(s).group(1)
'onefour'

You may also need to handle the case where there is no match. For example, return None:

import re

def get_year_value(s):
    pattern = re.compile(r"YEAR=(\w+);")
    match = pattern.search(s)

    return match.group(1) if match else None
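A short usage sketch of get_year_value, showing how a caller might branch on the None result. The second input line is made up for illustration, and YEAR_PATTERN and lines are hypothetical names. Compiling the pattern once at module level is a small stylistic change; Python's re module caches compiled patterns anyway, so the difference is minor.

```python
import re

# Same pattern as above: "YEAR=" followed by one or more word characters, then ";"
YEAR_PATTERN = re.compile(r"YEAR=(\w+);")

def get_year_value(s):
    """Return the YEAR value, or None when the string has no YEAR=...; field."""
    match = YEAR_PATTERN.search(s)
    return match.group(1) if match else None

lines = [
    "1  0001    1   UG  science,ee;YEAR=onefour;standard->2;district->9",
    "2  0002    1   PG  arts;standard->3;district->4",  # hypothetical line with no YEAR field
]

for line in lines:
    year = get_year_value(line)
    if year is None:
        print("No year found")
    else:
        print("The year is =", year)
```

This prints "The year is = onefour" for the first line and "No year found" for the second.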

You can use a regex to grab that value:

(?<=\bYEAR=)[^;]+

The regex matches:

  • (?<=\bYEAR=) - a lookbehind: the match must be preceded by the whole word YEAR= (\b is a word boundary)
  • [^;]+ - match one or more characters other than ;
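To see what the \b in the lookbehind buys you, here is a small comparison sketch. The input string with a SCHOOLYEAR= key is hypothetical, chosen only because that key also ends in "YEAR=".

```python
import re

# Lookbehind version from the answer: non-";" characters preceded by the whole word "YEAR="
with_boundary = re.compile(r"(?<=\bYEAR=)[^;]+")
# Same pattern without \b, for comparison
without_boundary = re.compile(r"(?<=YEAR=)[^;]+")

# Hypothetical input where a longer key also ends in "YEAR="
s = "SCHOOLYEAR=twenty;YEAR=onefour;standard->2"

print(with_boundary.findall(s))     # ['onefour']
print(without_boundary.findall(s))  # ['twenty', 'onefour']
```

Without the word boundary, the "YEAR=" inside "SCHOOLYEAR=" also satisfies the lookbehind.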


Here is sample Python code:

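import re

# Lookbehind: match one or more non-";" characters preceded by the whole word "YEAR="
p = re.compile(r'(?<=\bYEAR=)[^;]+')
test_str = "1  0001    1   UG  science,ee;YEAR=onefour;standard->2;district->9"
robj = p.search(test_str)
if robj:
    print(robj.group(0))  # the lookbehind is not part of the match, so group(0) is just the value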

If everyone is so fond of capturing groups, here is the same expression with the lookbehind replaced by a capturing group:

\bYEAR=([^;]+)

And in Python:

import re

p = re.compile(r'\bYEAR=([^;]+)')
test_str = "1  0001    1   UG  science,ee;YEAR=onefour;standard->2;district->9"
robj = p.search(test_str)
if robj:
    print(robj.group(1))  # group 1 holds the captured value

Note that if your YEAR value contains hyphens or other non-word characters, \w will not match them. A negated character class such as [^;] is your best friend here.
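A quick sketch of that difference, using a hypothetical hyphenated value "one-four" (not from the question):

```python
import re

# Hypothetical value containing a hyphen (a non-word character)
s = "1  0001    1   UG  science,ee;YEAR=one-four;standard->2;district->9"

word_only = re.search(r"\bYEAR=(\w+);", s)  # \w+ stops at "-", so the required ";" never follows
negated = re.search(r"\bYEAR=([^;]+)", s)   # [^;]+ takes everything up to the next ";"

print(word_only)         # None
print(negated.group(1))  # one-four
```

The \w+ version fails to match at all here, rather than returning a partial value.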

This is what I use:

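s = "1  0001    1   UG  science,ee;YEAR=onefour;standard->2;district->9"

if "YEAR=" in s:
    # Take the text after "YEAR=", then keep only the part before the next ";"
    year = s.split('YEAR=')[1].split(';')[0]
    print("The year is = " + year)

# Output:
# The year is = onefour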

Basically, it splits the line at YEAR= and then at ;. The [1] takes the part to the right of the substring YEAR=, and the [0] takes the part to the left of the first ; in what remains.
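A variation on the same idea, swapping in str.partition instead of split(). partition always returns a 3-tuple, so there is no index to go out of range when the separator is missing. get_year is a hypothetical helper name. Like the split version, it does not check for a word boundary, so a key such as SCHOOLYEAR= appearing first would be picked up instead.

```python
def get_year(s):
    """Return the text between 'YEAR=' and the next ';', or None if 'YEAR=' is absent."""
    _, found, rest = s.partition("YEAR=")
    if not found:
        return None
    # If there is no ';' after the value, partition puts the whole rest in the first item
    value, _, _ = rest.partition(";")
    return value

s = "1  0001    1   UG  science,ee;YEAR=onefour;standard->2;district->9"
print("The year is =", get_year(s))
```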

YEAR=(?P<year>\w+);

This should work.
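A minimal sketch of how that named group might be used from Python. The sample string is the one from the question.

```python
import re

# Named group "year" from the pattern above
pattern = re.compile(r"YEAR=(?P<year>\w+);")
s = "1  0001    1   UG  science,ee;YEAR=onefour;standard->2;district->9"

match = pattern.search(s)
if match:
    # Access the capture by name instead of by position
    print("The year is =", match.group("year"))
    # groupdict() returns every named group as a dict
    print(match.groupdict())
```

Naming the group keeps the code readable if more groups are added to the pattern later.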

Try this regex:

".*(?=YEAR).*YEAR=(.*?);.*"g

with the substitution \1
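In Python, that substitution could be written with re.sub, as a sketch. There is no "g" flag in Python; re.sub replaces every match by default. Because the pattern consumes the whole line and the replacement keeps only group 1, the result is just the value. One caveat: if the line contains no YEAR=...; field, nothing matches and re.sub returns the line unchanged, so you may still want to check for a match first.

```python
import re

s = "1  0001    1   UG  science,ee;YEAR=onefour;standard->2;district->9"

# The pattern matches the entire line; r"\1" replaces it with the text
# captured between "YEAR=" and the next ";"
year = re.sub(r".*(?=YEAR).*YEAR=(.*?);.*", r"\1", s)
print("The year is =", year)
```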

