Python regular expression substitute

I need to find the value of "taxid" in a large number of strings similar to the one given below. For this particular string, the "taxid" value is "9606". I need to discard everything else. The "taxid" may appear anywhere in the text, but it will always be followed by a ":" and then a number.

score:0.86|taxid:9606(Human)|intact:EBI-999900

How do I write a regular expression for this in Python?

>>> import re
>>> s = 'score:0.86|taxid:9606(Human)|intact:EBI-999900'
>>> re.search(r'taxid:(\d+)', s).group(1)
'9606'
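
Note that re.search returns None when a string has no taxid, so calling .group(1) directly would raise AttributeError on such input. A minimal sketch of a guarded version follows; the helper name extract_taxid and the second test string are made up for illustration and are not part of the original answer.

```python
import re

def extract_taxid(text):
    """Return the first taxid in text as a string, or None if absent."""
    match = re.search(r'taxid:(\d+)', text)
    # re.search yields None when nothing matches, so check before .group()
    return match.group(1) if match else None

print(extract_taxid('score:0.86|taxid:9606(Human)|intact:EBI-999900'))  # 9606
print(extract_taxid('score:0.86|intact:EBI-999900'))  # None
```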

If there are multiple taxids, use re.findall, which returns a list of all matches:

>>> re.findall(r'taxid:(\d+)', s)
['9606']
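
To see the difference from re.search, here is a small sketch with a string that carries two taxid fields. The input string is hypothetical, invented only to show the multi-match case.

```python
import re

# Hypothetical input with two taxid fields, made up for illustration.
multi = 'taxid:9606(Human)|score:0.86|taxid:10090(Mouse)'

# With one capture group, findall returns just the captured digits.
taxids = re.findall(r'taxid:(\d+)', multi)
print(taxids)  # ['9606', '10090']
```

The values come back as strings; convert with int() if numeric IDs are needed.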
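
# `lines` is any iterable of input strings (Python 3 syntax).
for line in lines:
    # Match taxid at the string start or after a "|"; capture digits only.
    match = re.search(r'(?:^|\|)taxid:(\d+)', line)
    if match:  # skip lines without a taxid
        print(match.group(1))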
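
For a large batch of strings, one possible arrangement is to compile the pattern once and wrap the loop in a generator. This is only a sketch: the names TAXID_RE and taxids_from_lines, and all sample lines except the first, are assumptions made for illustration.

```python
import re

# Anchor "taxid" to the start of the string or to a "|" field separator,
# and capture only the digits that follow the colon.
TAXID_RE = re.compile(r'(?:^|\|)taxid:(\d+)')

def taxids_from_lines(lines):
    """Yield the first taxid of each line that contains one."""
    for line in lines:
        m = TAXID_RE.search(line)
        if m:  # lines without a taxid are skipped
            yield m.group(1)

sample = [
    'score:0.86|taxid:9606(Human)|intact:EBI-999900',
    'taxid:10090(Mouse)|score:0.40',   # taxid as the first field
    'score:0.10|intact:EBI-000000',    # no taxid at all
]
print(list(taxids_from_lines(sample)))  # ['9606', '10090']
```

Because it is a generator, the same function works on an open file object without loading every line into memory.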
