
Extracting specific patterns using regex

I have input like this:

tr|F2EF46|F2EF46_HORVD  210753
sp|K7W3E0|K7W3E0_MAIZE  21032

I need to write only the IDs between the | characters to a separate file:

F2EF46
K7W3E0

This script finds the pattern, but how do I print only the IDs?

import re
o=open('result.txt','w')
with open('input.txt','rb') as f:
    for line in f:
        if re.findall(r'([a-z][a-z])(\|[a-z0-9]*.*)\|', line):
            line = line.strip()
            line = line.rstrip()
            line = re.sub('(\|[a-z0-9]*.*)\|', '', line) 
            line = re.sub('\|', '', line)
            query_id = line
            f.write(query_id+'\n')
            o.write(line)

You don't need regular expressions here:

id = line.split('|')[1]

Although if you really want to use regexes then you could do:

id = re.search(r'(\|)(.*?)(\|)', line).group(2)

However, don't use id as a variable name: it is a built-in function, and assigning to it shadows the built-in.
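
Putting the split approach together, here is a minimal sketch of a complete script. The helper names extract_id, extract_ids and write_ids are hypothetical, chosen for illustration, and the sketch assumes every useful line has the form prefix|ID|rest; lines without two | characters are skipped.

```python
def extract_id(line):
    """Return the field between the first two '|' characters, or None."""
    parts = line.split('|')
    # 'tr|F2EF46|F2EF46_HORVD  210753' splits into three parts;
    # fewer than three means there is no ID enclosed by two '|'.
    return parts[1] if len(parts) >= 3 else None


def extract_ids(lines):
    """Collect IDs from an iterable of lines, skipping malformed ones."""
    ids = []
    for line in lines:
        query_id = extract_id(line.strip())
        if query_id:
            ids.append(query_id)
    return ids


def write_ids(in_path, out_path):
    """Read lines from in_path (text mode) and write one ID per line to out_path."""
    with open(in_path) as src, open(out_path, 'w') as dst:
        for query_id in extract_ids(src):
            dst.write(query_id + '\n')


sample = [
    "tr|F2EF46|F2EF46_HORVD  210753\n",
    "sp|K7W3E0|K7W3E0_MAIZE  21032\n",
]
print(extract_ids(sample))  # ['F2EF46', 'K7W3E0']
```

With the file names from the question, the call would be write_ids('input.txt', 'result.txt'). The input is opened in text mode rather than 'rb' so that the lines are str and can be split on a str separator.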

If you still want to use regex, use lookarounds:

(?<=\|)[^|]+(?=\|)
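
As a rough illustration of how that pattern could be used from Python, the sketch below compiles it once and returns the first match. The names ID_PATTERN and find_id are assumptions made for this example; the guard for a missing match is there because re.search returns None when a line contains no ID.

```python
import re

# The lookbehind (?<=\|) and lookahead (?=\|) require a '|' on each side
# without consuming it, so the match itself is only the ID.
ID_PATTERN = re.compile(r'(?<=\|)[^|]+(?=\|)')


def find_id(line):
    """Return the first run of non-'|' characters enclosed by '|', or None."""
    match = ID_PATTERN.search(line)
    return match.group(0) if match else None


print(find_id("tr|F2EF46|F2EF46_HORVD  210753"))  # F2EF46
print(find_id("no pipes here"))                   # None
```

Because the delimiters are not part of the match, group(0) can be used directly and no capturing group is needed.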

