
Python regular expression pattern with re.search

Hello, I am confused by Python regular expressions. Here is my code:

import os,re,sys

t="LOC_Os01g01010.1 GO:0030234  F   enzyme regulator activity   IEA     TAIR:AT3G59570"
k =['LOC_Os01g01010']

re_search=re.search(re.escape(k[0] + r'.1   GO:\d{7}'),t,re.M|re.I|re.S)
if re_search is None:
      pass
else:
      print re_search.group()

"t" is my data and "k" is my target.

What I want is "LOC_Os01g01010.1 GO:0030234" or "GO:0030234", but I don't know how to write the pattern.

Given your example, and assuming that in LOC_********.* the stars can be any character in the set [a-zA-Z0-9], I would suggest:

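import re

t="LOC_Os01g01010.1 GO:0030234  F   enzyme regulator activity   IEA      TAIR:AT3G59570"
k =['LOC_Os01g01010']

# re.I makes the character class case-insensitive, so lowercase letters match too
re_search=re.search("(LOC_[0-9A-Z]*)",t,re.M|re.I|re.S)
if re_search is not None:
      # print(...) with a single argument works on Python 2.7 and Python 3
      print(re_search.group())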

Running python regexthing.py with Python 2.7 prints LOC_Os01g01010. The parentheses in (LOC_[0-9A-Z]*) form a capture group that captures whatever matches the expression LOC_[0-9A-Z]*. Because the re.I flag makes matching case-insensitive, this behaves like LOC_[0-9A-Za-z]*. The expression will match LOC_, LOC_ABCabc123, LOC_a1B2C, etc.
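As a small illustration of the capture group described above, the sketch below compares group(1) with group() and checks the sample strings from this answer. The variable names (pattern, locus, whole, samples, full_matches) are only illustrative, and re.fullmatch assumes Python 3.4 or later.

```python
import re

# Same pattern as in the answer; re.I lets [0-9A-Z] match lowercase letters too.
pattern = re.compile(r"(LOC_[0-9A-Z]*)", re.I)

t = "LOC_Os01g01010.1 GO:0030234  F   enzyme regulator activity   IEA     TAIR:AT3G59570"

m = pattern.search(t)
locus = m.group(1)  # text captured by the first (and only) group
whole = m.group()   # the whole match; identical here because the group spans the pattern

# The sample strings mentioned in the answer all match in full.
samples = ["LOC_", "LOC_ABCabc123", "LOC_a1B2C"]
full_matches = [s for s in samples if pattern.fullmatch(s)]

print(locus)
print(full_matches)
```

Note that the match stops at the dot, because "." is not in the character class, so the ".1" suffix and the GO term are not part of the result.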

I hope this answers your question.

I believe the following would solve your problem:

import re
t="LOC_Os01g01010.1 GO:0030234  F   enzyme regulator activity   IEA     TAIR:AT3G59570"
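# (.)* consumes everything between the LOC_ prefix and the GO term;
# re.S lets "." match newlines as well
my_regex = re.compile(r'^LOC_(.)*GO:\d{7}',re.M|re.I|re.S)
searches = my_regex.search(t)
if searches:
    # print(...) with a single argument works on Python 2.7 and Python 3
    print(searches.group())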
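If both of the requested outputs are needed from a single search, one possible variation is to use two capture groups. This is only a sketch under the assumption that the locus ID always has the form LOC_<word characters>.<digits> and is separated from the GO term by whitespace; the names pattern, locus_and_go and go_only are made up for the example.

```python
import re

t = "LOC_Os01g01010.1 GO:0030234  F   enzyme regulator activity   IEA     TAIR:AT3G59570"

# Group 1: the locus ID with its version suffix; group 2: the GO term.
# \s+ tolerates one or more spaces or tabs between the two columns.
pattern = re.compile(r"^(LOC_\w+\.\d+)\s+(GO:\d{7})", re.M)

m = pattern.search(t)
if m:
    locus_and_go = m.group(0)  # the whole match
    go_only = m.group(2)       # only the GO term
    print(locus_and_go)
    print(go_only)
```

Here group(0) gives "LOC_Os01g01010.1 GO:0030234" and group(2) gives "GO:0030234", which are the two strings the question asks for.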

If there is any solution at all, there are (provably) infinitely many regexes that match a finite set of examples in an unbounded string.

That is a roundabout way of saying that you need to be more specific: given only one example of what you are trying to match, we can produce many solutions for you, depending on which further (unspecified) assumptions we add ourselves.

Here are a few, with stated assumptions:

>>> import re
>>> t = "LOC_Os01g01010.1 GO:0030234  F   enzyme regulator activity   IEA     TAIR:AT3G59570"
>>> re.findall(r'\w+\.\d+', t) # any sequence of word characters, followed by a dot and digits
['LOC_Os01g01010.1']
>>> re.findall(r'[A-Z]+_\w+\.\d+', t) # forcing the token to start with capitals and an underscore
['LOC_Os01g01010.1']
>>> re.findall(r'[A-Z]+_O[a-z01]+\.\d+', t) # forcing "O", and the middle part to be only lowercase letters, 0s and 1s
['LOC_Os01g01010.1']
>>> re.findall(r'^[A-Z]+_O[a-z01]+\.\d+', t) # forcing the pattern to be at the beginning of the string
['LOC_Os01g01010.1']
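Coming back to the code in the question: there, re.escape is applied to the whole pattern, so \d{7} is escaped and matched as literal text, and the three spaces in the pattern do not match the single space in t as it appears here. A minimal sketch of one way around this, assuming the key in k is meant to be matched literally and is followed by a dot, a version number, whitespace and a GO term, is to escape only the key. The helper name find_go is hypothetical.

```python
import re

t = "LOC_Os01g01010.1 GO:0030234  F   enzyme regulator activity   IEA     TAIR:AT3G59570"
k = ["LOC_Os01g01010"]

def find_go(key, line):
    """Return (matched text, GO term) for key in line, or None if not found."""
    # Escape only the literal key; the regex parts are appended unescaped.
    pattern = re.escape(key) + r"\.\d+\s+(GO:\d{7})"
    m = re.search(pattern, line)
    return (m.group(0), m.group(1)) if m else None

result = find_go(k[0], t)
print(result)
```

Which of these patterns is appropriate still depends on the assumptions about the rest of the data, as noted above.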
