
Search text for a pattern with interruptions to the pattern

I need a flexible way to search for a pattern in a string.

Say our pattern is 'GEEGG'.

I want to determine whether a string contains this pattern, allowing 'interrupting' and 'flanking' symbols.

• Interrupting symbol for 'GEEGG': 'GEGEGG' or 'GEEGEG'
• Flanking symbol for 'GEEGG': 'GGEEGG' or 'GEEGGE'

I cannot think of a simple or elegant way to approach this problem.

All of the following queries should match the pattern:

pattern = 'GEEGG'
query_flank = '--GEEGG--'
query_flank2 = '--GE--GEEGG--'
query_interrupt = '--G-E-E-G-G-'
query_interrupt2 = '--G-E-G-E-E-G-G'
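Both kinds of extra symbol seem to reduce to one condition: the letters of the pattern must appear in the query in order, with anything (or nothing) between and around them, i.e. the pattern must be a subsequence of the query. Assuming that is the intended rule, a minimal sketch without regular expressions could look like this (the helper name contains_subsequence is hypothetical):

```python
def contains_subsequence(pattern: str, text: str) -> bool:
    """Return True if the characters of `pattern` occur in `text` in order,
    with any characters (or none) between and around them.
    Assumption: this is the matching rule the question is after."""
    remaining = iter(text)
    # `ch in remaining` advances the iterator until it finds `ch`, so each
    # search resumes just after the previous match.
    return all(ch in remaining for ch in pattern)


pattern = 'GEEGG'
queries = {
    'query_flank': '--GEEGG--',
    'query_flank2': '--GE--GEEGG--',
    'query_interrupt': '--G-E-E-G-G-',
    'query_interrupt2': '--G-E-G-E-E-G-G',
}

for name, query in queries.items():
    print(name, contains_subsequence(pattern, query))  # all four print True

print(contains_subsequence(pattern, '--GEG--'))  # False: only one E
```

This only answers yes/no; it does not report where the pattern was found.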

With Python's re module, I could try something like the following, using '*' (asterisk) or '.*' (period asterisk) to match anything in between:


import re
txt = "<to search>"
# Does not work: the leading '*' has nothing to repeat, so re.search raises re.error
x = re.search("*G*E*E*G*G*", txt)
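The attempt above raises re.error, because a '*' with nothing before it has nothing to repeat. One way to express the same "in order, anything between" rule as a regex is to build it from the pattern, joining its escaped letters with '.*?' (lazy "anything"). A sketch, with a hypothetical helper name build_gapped_regex:

```python
import re


def build_gapped_regex(pattern: str) -> re.Pattern:
    """Compile a regex that matches the characters of `pattern` in order,
    allowing any characters (including none) between consecutive ones."""
    # re.escape guards against pattern characters that are regex metacharacters;
    # '.*?' is a lazy "anything", and DOTALL lets '.' also match newlines.
    return re.compile('.*?'.join(map(re.escape, pattern)), re.DOTALL)


regex = build_gapped_regex('GEEGG')
print(regex.pattern)  # G.*?E.*?E.*?G.*?G

for query in ['--GEEGG--', '--GE--GEEGG--', '--G-E-E-G-G-', '--G-E-G-E-E-G-G']:
    match = regex.search(query)
    print(query, match.span() if match else None)
```

Building the regex this way means the pattern string is the only thing to change when searching for something other than 'GEEGG'.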

*** Update (after rici's comment): my revised attempts are below.

import re

pattern = 'GEEGG'
query_flank = '--GEEGG--'
query_flank2 = '--GE--GEEGG--'
query_interrupt = '--G-E-E-G-G-'
query_interrupt2 = '--G-E-G-E-E-G-G'


txt = "--GEEGG--"
# 'G*' means "zero or more G", so only the final 'G' is actually required here
x = re.search("G*E*E*G*G", txt)
print("print x")
print(x)
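In a regex, '*' is a quantifier on the item before it: 'G*' means "zero or more G", not "G followed by anything". So in 'G*E*E*G*G' only the final G is required, and the x result in the output below looks right only because '--GEEGG--' is the ideal case. A quick check (the sample strings are made up for illustration):

```python
import re

too_loose = re.compile('G*E*E*G*G')

# Every element except the last 'G' is optional, so any text containing a
# single 'G' matches, even with no 'E' at all.
print(too_loose.search('--G--'))  # <re.Match object; span=(2, 3), match='G'>

# '.*' ("any character, zero or more times") is what allows gaps between letters.
gapped = re.compile('G.*E.*E.*G.*G')
print(gapped.search('--G--'))  # None
print(gapped.search('--G-E-E-G-G-'))  # <re.Match object; span=(2, 11), match='G-E-E-G-G'>
```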

txt = "--GEEGG--"
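# '.*' allows gaps, but the final 'G*' may match zero G's, so the greedy '.*'
# before it is free to run on to the end of the string
y = re.search("G.*E.*E.*G.*G*", txt)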
print("print y")
print(y)

OUTPUT:
print x
<re.Match object; span=(2, 7), match='GEEGG'>
print y
<re.Match object; span=(2, 9), match='GEEGG--'>
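The trailing '--' in the y match comes from the last two tokens: 'G*' may match zero G's, so the greedy '.*' before it consumes the rest of the string. A sketch of one adjustment: make the final G mandatory and the gaps non-greedy ('.*?'), so the match stops at the first point where all five letters have been seen:

```python
import re

txt = "--GEEGG--"

# Final 'G' is now required (no trailing '*'), and each gap is non-greedy,
# so every '.*?' consumes as little as possible.
y = re.search("G.*?E.*?E.*?G.*?G", txt)
print(y)  # <re.Match object; span=(2, 7), match='GEEGG'>

# Greedy gaps also end at a required 'G', but at the last one available.
print(re.search("G.*E.*E.*G.*G", "--GEEGG--G"))
# <re.Match object; span=(2, 10), match='GEEGG--G'>
print(re.search("G.*?E.*?E.*?G.*?G", "--GEEGG--G"))
# <re.Match object; span=(2, 7), match='GEEGG'>
```

Note that re.search returns the leftmost match, so for '--GE--GEEGG--' the non-greedy version spans 'GE--GEEGG' rather than the tighter 'GEEGG'. If only a yes/no answer is needed, that difference does not matter.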
