Extracting specific patterns using regex

Question

I have input like this:

tr|F2EF46|F2EF46_HORVD  210753
sp|K7W3E0|K7W3E0_MAIZE  21032

I need to write only the IDs between the | characters to a separate file:

F2EF46
K7W3E0

This script finds the pattern, but how do I print only the IDs?

import re

# Attempt: the pattern matches, but the substitutions remove the ID instead of keeping it.
with open('input.txt') as f, open('result.txt', 'w') as o:
    for line in f:
        if re.findall(r'([a-z][a-z])(\|[a-z0-9]*.*)\|', line):
            line = line.strip()
            line = re.sub(r'(\|[a-z0-9]*.*)\|', '', line)
            line = re.sub(r'\|', '', line)
            query_id = line
            # Write to the output handle; the input handle is read-only.
            o.write(query_id + '\n')

Answer 1

You don't need regular expressions here:

id = line.split('|')[1]

Although if you really want to use regexes then you could do:

id = re.search(r'(\|)(.*?)(\|)', line).group(2)

Just don't use id as a variable name: it is a built-in function, and assigning to it shadows the built-in.
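Putting the split approach into a whole script might look roughly like this. It is only a sketch: the helper names extract_ids and write_ids are made up for illustration, the file names are taken from the question, and it assumes every line of interest has the db|ID|name layout shown there.

```python
def extract_ids(lines):
    """Return the second '|'-delimited field of each line that has one."""
    ids = []
    for line in lines:
        fields = line.strip().split('|')
        # Skip lines that lack the db|ID|name layout (assumed from the question).
        if len(fields) >= 3:
            ids.append(fields[1])
    return ids


def write_ids(in_path='input.txt', out_path='result.txt'):
    """Read in_path and write one extracted ID per line to out_path."""
    with open(in_path) as src, open(out_path, 'w') as dst:
        for query_id in extract_ids(src):
            dst.write(query_id + '\n')
```

Calling write_ids() with the question's file names should leave one ID per line in result.txt.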

Answer 2

If you still want to use regex, use lookarounds:

(?<=\|)[^|]+(?=\|)
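In Python, that pattern could be applied along these lines. This is a sketch, with ID_PATTERN and first_id as illustrative names of my own; it uses search rather than findall so that only the first field enclosed by pipes is returned.

```python
import re

# Lookbehind and lookahead require a '|' on each side without consuming it.
ID_PATTERN = re.compile(r'(?<=\|)[^|]+(?=\|)')


def first_id(line):
    """Return the first pipe-enclosed field in line, or None if there is none."""
    match = ID_PATTERN.search(line)
    return match.group(0) if match else None
```

Because the lookarounds do not consume the pipes, the match itself is the ID and no capture group is needed.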


2 answers

solution1
1 ACCEPTED 2014-05-21 16:28:11

solution2
1 2014-05-21 16:30:29