Search text for a pattern with interruptions to the pattern

Question

I need a flexible way to search for a pattern in a string.

Say our pattern is 'GEEGG'.

I want to determine whether a string contains this pattern, allowing 'interrupting' and 'flanking' symbols.

• Interrupting symbol for 'GEEGG' = 'GEGEGG' or 'GEEGEG'
• Flanking symbol for 'GEEGG' = 'GGEEGG' or 'GEEGGE'

I cannot think of a simple/elegant way to approach this problem.

All of the following queries should match the pattern:

pattern = 'GEEGG'
query_flank = '--GEEGG--'
query_flank2 = '--GE--GEEGG--'
query_interrupt = '--G-E-E-G-G-'
query_interrupt2 = '--G-E-G-E-E-G-G'

Answer 1

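With Python's re library, you could try the following, using '.*' (period asterisk) to match anything between the letters. A bare '*' (asterisk) only repeats the character before it, so a pattern that starts with '*' raises re.error:

import re
txt = "<to search>"
x = re.search(".*G.*E.*E.*G.*G.*", txt)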

*** (Updated answer below, following rici's comment)

import re

pattern = 'GEEGG'
query_flank = '--GEEGG--'
query_flank2 = '--GE--GEEGG--'
query_interrupt = '--G-E-E-G-G-'
query_interrupt2 = '--G-E-G-E-E-G-G'


txt = "--GEEGG--"
# Note: 'G*' means "zero or more G"; it does not allow other symbols between the letters.
x = re.search("G*E*E*G*G", txt)
print("print x")
print(x)

import re


pattern = 'GEEGG'
query_flank = '--GEEGG--'
query_flank2 = '--GE--GEEGG--'
query_interrupt = '--G-E-E-G-G-'
query_interrupt2 = '--G-E-G-E-E-G-G'


txt = "--GEEGG--"
# '.*' allows any characters between the letters; the trailing 'G*' makes the last G optional.
y = re.search("G.*E.*E.*G.*G*", txt)
print("print y")
print(y)

OUTPUT:
print x
<re.Match object; span=(2, 7), match='GEEGG'>
print y
<re.Match object; span=(2, 9), match='GEEGG--'>
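
The '.*' idea can be generalized so the regex is built from any pattern string instead of being typed by hand. The following is only a minimal sketch, assuming that any number of interrupting symbols is acceptable between consecutive pattern symbols; the helper names build_interrupted_regex, has_pattern and is_subsequence are made up for illustration and are not part of the re library. The last helper is a regex-free alternative that performs the same yes/no check as a plain subsequence test.

```python
import re


def build_interrupted_regex(pattern):
    # Escape every symbol so characters like '.' or '*' in the pattern are literal,
    # then join them with '.*?' to allow any (possibly empty) run of symbols in between.
    # re.DOTALL lets the interrupting run include newlines as well.
    return re.compile(".*?".join(re.escape(ch) for ch in pattern), re.DOTALL)


def has_pattern(pattern, query):
    # re.search scans the whole string, so flanking symbols need no special handling.
    return build_interrupted_regex(pattern).search(query) is not None


def is_subsequence(pattern, query):
    # Regex-free alternative: walk the query once, consuming it while
    # looking for each pattern symbol in order.
    remaining = iter(query)
    return all(ch in remaining for ch in pattern)


pattern = "GEEGG"
queries = ["--GEEGG--", "--GE--GEEGG--", "--G-E-E-G-G-", "--G-E-G-E-E-G-G"]
for q in queries:
    print(q, has_pattern(pattern, q), is_subsequence(pattern, q))
```

Both helpers accept all four example queries from the question. If the number of interrupting symbols should be limited, the '.*?' joiner could be swapped for a bounded one such as '.{0,2}?', at which point the plain subsequence test no longer applies.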

solution1
1 ACCPTED 2019-10-03 16:36:22
