python regular expression substitute

Question

I need to find the value of "taxid" in a large number of strings similar to the one given below. For this particular string, the "taxid" value is "9606". I need to discard everything else. The "taxid" may appear anywhere in the text, but it will always be followed by a ":" and then a number.

score:0.86|taxid:9606(Human)|intact:EBI-999900

How do I write a regular expression for this in Python?

Answer 1

>>> import re
>>> s = 'score:0.86|taxid:9606(Human)|intact:EBI-999900'
>>> re.search(r'taxid:(\d+)', s).group(1)
'9606'

If a string can contain multiple taxids, use re.findall, which returns a list of all the captured values:

>>> re.findall(r'taxid:(\d+)', s)
['9606']
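
One caveat: re.search returns None when a string has no taxid, so calling .group(1) directly raises AttributeError on such input. When processing a large number of strings, a small wrapper along these lines may help. This is only a sketch: the helper name extract_taxid, the choice to return None on a miss, and the second sample string are assumptions for illustration, not part of the question.

```python
import re

# Compile once, since the same pattern is applied to many strings.
TAXID_RE = re.compile(r'taxid:(\d+)')

def extract_taxid(text):
    """Return the first taxid in text as a string, or None if there is none."""
    match = TAXID_RE.search(text)
    return match.group(1) if match else None

strings = [
    'score:0.86|taxid:9606(Human)|intact:EBI-999900',
    'score:0.40|intact:EBI-000000',  # made-up sample with no taxid field
]
taxids = [extract_taxid(s) for s in strings]
print(taxids)  # ['9606', None]
```

Returning None keeps the result list aligned with the input, so you can tell which strings had no taxid; filter the None values out afterwards if you only want the hits.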

Answer 2

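import re
lines = ['score:0.86|taxid:9606(Human)|intact:EBI-999900']  # sample input
for line in lines:
    match = re.search(r'taxid:(\d+)', line)  # search finds taxid anywhere; \d+ keeps only the digits
    if match:  # skip lines that have no taxid
        print(match.group(1))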

2 answers

solution1
4 ACCEPTED 2012-09-17 20:18:20

solution2
0 2012-09-17 20:20:00
