
Python regex for grabbing specific parts of a line

I want to go through the lines in a file and grab certain parts of each one.

The lines look like this (\t is a tab character):

"2584\tM108\tK:14%"
"2585\tM108\tK:14%\tN:10%"

I have written the following expressions, but they seem to be failing me. First, I want to grab the M10* field and the entry after it and join them together, taking only the first entry after the M10* field (K in the example above).

Mutation = re.sub(r'.*\t(.*)\t.*:(.*)%.*', r'\1\2', line)

I want Mutation = M108K

Second, I want to grab the percentage without the % symbol:

Percentage = re.sub(r'.*\t.*\t.*:(.*)%.*', r'\1', line)

I want Percentage = 14

I'm not very practiced at writing regular expressions; these don't really work and are inefficient. Any help fixing or optimising them is appreciated.

I would do all of this in a single regex. .* is greedy: it consumes as many characters as it can and only gives some back when the rest of the pattern would otherwise fail. So you need to make it non-greedy (lazy) by adding the ? quantifier right after the *, i.e. .*?.
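To make the greedy behaviour concrete, here is a small script (a sketch; the variable names are illustrative) that runs the question's two patterns and the lazy pattern from this answer on both sample lines. On the second line, the greedy leading .* swallows as many tab-separated fields as it can, so the captures shift one field to the right and pick up N:10% instead of K:14%. Note also that the question's mutation pattern captures the number rather than the letter in its second group, so even on the first line it returns M10814.

```python
import re

# The two sample lines from the question (\t is a real tab character here).
lines = ["2584\tM108\tK:14%", "2585\tM108\tK:14%\tN:10%"]

# Patterns from the question: every .* is greedy, so it grabs as much as it
# can and only backtracks when the rest of the pattern would otherwise fail.
greedy_mutation = r'.*\t(.*)\t.*:(.*)%.*'
greedy_percentage = r'.*\t.*\t.*:(.*)%.*'

# Lazy version, as in the answer: ^.*? stops at the first tab, and each lazy
# group stops at the first tab, colon or percent sign that follows it.
lazy = r'^.*?\t(.*?)\t(.*?):(.*?)%.*'

results = [
    (re.sub(greedy_mutation, r'\1\2', line),
     re.sub(greedy_percentage, r'\1', line),
     re.sub(lazy, r'\1\2 \3', line))
    for line in lines
]

for row in results:
    print(row)
# ('M10814', '14', 'M108K 14')
# ('K:14%10', '10', 'M108K 14')
```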

>>> import re
>>> s = "2585\tM108\tK:14%\tN:10%"
>>> re.sub(r'^.*?\t(.*?)\t(.*?):(.*?)%.*', r'\1\2 \3', s)
'M108K 14'

or

>>> mutation, percentage = re.sub(r'^.*?\t(.*?)\t(.*?):(.*?)%.*', r'\1\2 \3', s).split()
>>> mutation
'M108K'
>>> percentage
'14'
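A variation on the same idea (a sketch, not the answer's code): compile the pattern once, use re.match with named groups instead of re.sub plus split(), and convert the percentage with int(). It assumes every line has the layout shown in the question and that percentages are whole numbers (widen \d+ if they can have decimals); the group names and the helpers parse_line and parse_file are hypothetical.

```python
import io
import re

# Compiled once and reused for every line. Each field uses a negated
# character class ([^\t]*, [^:\t]*) instead of .*?, so no group can run
# past the delimiter that ends it. Assumes whole-number percentages.
LINE_RE = re.compile(r'[^\t]*\t(?P<field>[^\t]*)\t(?P<entry>[^:\t]*):(?P<percent>\d+)%')

def parse_line(line):
    """Return (mutation, percentage) for one line, or None if it doesn't match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    return m.group('field') + m.group('entry'), int(m.group('percent'))

def parse_file(f):
    """Yield (mutation, percentage) pairs from an open text file, skipping non-matching lines."""
    for line in f:
        parsed = parse_line(line)
        if parsed is not None:
            yield parsed

# io.StringIO stands in for open('your_file.txt') here.
sample = io.StringIO("2584\tM108\tK:14%\n2585\tM108\tK:14%\tN:10%\n")
print(list(parse_file(sample)))
# [('M108K', 14), ('M108K', 14)]
```

Because re.match returns None for a line that doesn't fit, malformed lines are skipped instead of being passed through unchanged, which is what re.sub does when its pattern doesn't match.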
