python regex for grabbing specific parts of a line

Question

I want to go through the lines in a file and grab certain parts of each one.

Lines look like this:

"2584\tM108\tK:14%"
"2585\tM108\tK:14%\tN:10%"

I have written the following expressions, but they are failing me. First, I want to grab the M10* field and the K and join them, taking only the first entry after the M10* field (K in the example above):

Mutation = re.sub(r'.*\t(.*)\t.*:(.*)%.*', r'\1\2', line)

I want Mutation = M108K

Second, I want to grab the percentage without the % symbol:

Percentage = re.sub(r'.*\t.*\t.*:(.*)%.*', r'\1', line)

I want Percentage = 14

I'm not very practiced at writing regular expressions; these don't really work and are inefficient. Any help fixing or optimising them is appreciated.

Answer 1

I would do all of this in a single regex. .* is greedy: it consumes as many characters as it can and only gives some back when the rest of the pattern would otherwise fail. To make it match as few characters as possible instead, add a ? right after the * (that is, .*?).
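To make the greedy versus non-greedy difference concrete, here is a small sketch that runs the question's first pattern and the non-greedy pattern on the second sample line. The variable names are just for illustration.

```python
import re

line = "2585\tM108\tK:14%\tN:10%"

# Greedy: the leading .* grabs as much as it can, so the groups land on
# the last fields of the line instead of the first ones.
greedy = re.sub(r'.*\t(.*)\t.*:(.*)%.*', r'\1\2', line)

# Non-greedy: each .*? stops at the first point where the rest of the
# pattern can match, so the groups land on the first fields.
lazy = re.sub(r'^.*?\t(.*?)\t(.*?):(.*?)%.*', r'\1\2 \3', line)

print(repr(greedy))  # 'K:14%10'
print(repr(lazy))    # 'M108K 14'
```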

>>> import re
>>> s = "2585\tM108\tK:14%\tN:10%"
>>> re.sub(r'^.*?\t(.*?)\t(.*?):(.*?)%.*', r'\1\2 \3', s)
'M108K 14'

or, to unpack the two values into separate variables:

>>> mutation,percentage = re.sub(r'^.*?\t(.*?)\t(.*?):(.*?)%.*', r'\1\2 \3', s).split()
>>> mutation
'M108K'
>>> percentage
'14'
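One thing to keep in mind with re.sub is that it returns the line unchanged when the pattern does not match, so the split() above can fail or give odd results on malformed lines. A possible alternative, sketched below, uses re.match with capture groups so a non-matching line is detected explicitly. The helper name parse_line and the constant LINE_RE are hypothetical, and the pattern assumes tab-separated fields with whole-number percentages, which the sample lines suggest but the question does not state.

```python
import re

# Hypothetical helper. Assumed layout: an id, the M10* label, then one or
# more "letter:percent%" entries, all separated by tabs. \d+ assumes the
# percentage is a whole number.
LINE_RE = re.compile(r'^[^\t]*\t([^\t]+)\t([^:\t]+):(\d+)%')

def parse_line(line):
    """Return (mutation, percentage) for the first entry, or None if no match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    label, residue, percent = m.groups()
    return label + residue, int(percent)

for line in ["2584\tM108\tK:14%", "2585\tM108\tK:14%\tN:10%"]:
    print(parse_line(line))  # ('M108K', 14) for both sample lines
```

Using [^\t] instead of .*? keeps each group inside a single field, so the pattern does not depend on backtracking to find the right tab.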

solution1
3 ACCEPTED 2015-08-06 18:31:46
