Python/Regex - match char between two chars, with anything before or after the matching char

I'm trying to match a char within a subset of chars, where either side of the matching char could be anything.

Here's an example:

{{ SITE_AGGREGATE_SUBNET }}.3 remote-as {{ BGP-AS }}

Against the above, I want to match anything between {{ and }} that has a dash "-" in it.

My regex pattern thus far is:

(?<={{)(.*?-.*?)(?=}})

But this creates a single match that spans nearly the whole test string, returning:

SITE_AGGREGATE_SUBNET }}.3 remote-as {{ BGP-AS
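
For reference, a minimal Python reproduction (the variable names `s`, `pattern` and `m` are illustrative). The lookbehind succeeds right after the first {{, and the lazy .*? is then free to run past the first }} until it reaches a hyphen:

```python
import re

s = '{{ SITE_AGGREGATE_SUBNET }}.3 remote-as {{ BGP-AS }}'
pattern = r'(?<={{)(.*?-.*?)(?=}})'

m = re.search(pattern, s)
# The match begins right after the first {{ and only stops
# before the }} that follows BGP-AS.
print(repr(m.group(1)))
```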

Is anyone able to see what I'm missing? I understand why my regex doesn't work as expected but not how to fix it.

Thanks

You may use this regex with a negative lookahead and a capture group:

({{(?:(?!{{|}})[^-])*)-(.*?}})

RegEx Details:

  • ( : Start capture group
    • {{ : Match {{
    • (?: : Start non-capture group
      • (?!{{|}}) : Negative lookahead to assert that neither {{ nor }} starts at the next position
      • [^-] : Match any character except hyphen
    • )* : End non-capture group. * matches 0+ instances of this group
  • ) : End capture group
  • - : Match literal hyphen
  • (.*?}}) : Lazily match the remaining text up to and including the closing }} and capture it in the 2nd capture group
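
The answer does not show calling code, so the following is only a sketch of how the pattern might be used from Python. The choice of re.search and re.sub, and the idea of swapping the hyphen for an underscore, are assumptions made for illustration:

```python
import re

s = '{{ SITE_AGGREGATE_SUBNET }}.3 remote-as {{ BGP-AS }}'
pattern = r'({{(?:(?!{{|}})[^-])*)-(.*?}})'

m = re.search(pattern, s)
print(m.group(0))  # whole match:  {{ BGP-AS }}
print(m.group(1))  # before the hyphen:  {{ BGP
print(m.group(2))  # after the hyphen:   AS }}

# Hypothetical use: replace the first hyphen inside each placeholder.
replaced = re.sub(pattern, r'\1_\2', s)
print(replaced)  # {{ SITE_AGGREGATE_SUBNET }}.3 remote-as {{ BGP_AS }}
```

Note that the hyphen in remote-as is left alone because it is not inside a {{ ... }} pair, and that only the first hyphen of each placeholder is replaced per pass.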

Use

import re
s = '{{ SITE_AGGREGATE_SUBNET }}.3 remote-as {{ BGP-AS }}'
print([x.strip() for x in re.findall(r'{{(.*?)}}', s) if '-' in x])
# -> ['BGP-AS']

Details

  • Extract all matches between {{...}} with a simple {{(.*?)}} regex (note that re.findall will only return the captured substring, i.e. the value matched with (.*?) )
  • Only keep the matches with - in them using a condition inside list comprehension ( if '-' in x )
  • Remove trailing/leading whitespace with .strip()

A single regex approach (note it might turn out less efficient):

re.findall(r'{{\s*((?:(?!{{|}})[^-])*-.*?)\s*}}', s)

Details

  • {{ - a literal {{
  • \s* - 0+ whitespace characters
  • ((?:(?!{{|}})[^-])*-.*?) - Capturing group 1 (what will be returned by re.findall ):
    • (?:(?!{{|}})[^-])* - a tempered greedy token matching any non-hyphen char, 0+ times, that does not start a {{ or }} substring
    • - - a hyphen
    • .*? - any 0+ chars (other than an LF), as few as possible
  • \s* - 0+ whitespace characters
  • }} - a literal }}
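
As a quick check of the breakdown above, here is a small sketch that runs the single-regex pattern over a few inputs. The extra sample strings are made up for illustration and are not from the question:

```python
import re

pattern = r'{{\s*((?:(?!{{|}})[^-])*-.*?)\s*}}'

samples = [
    '{{ SITE_AGGREGATE_SUBNET }}.3 remote-as {{ BGP-AS }}',
    '{{ A-B }} and {{ C }} and {{ D-E-F }}',  # hypothetical input
    'no-placeholders here',                   # hypothetical input
]

# One result list per sample; placeholders without a hyphen are skipped.
results = [re.findall(pattern, text) for text in samples]
for text, found in zip(samples, results):
    print(text, '->', found)
```

The tempered token stops the part before the hyphen from crossing a }}, which is why {{ C }} is skipped rather than merged with the placeholder that follows it.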

You can use this pattern: {{(.*?)}} .

  • .*? matches any sequence of characters non-greedily.

  • (...) creates a capturing group so re.findall yields only the text inside the braces.

To check whether a match contains a '-' , it might be simpler to then use the in operator.

Code

import re

def tokenize(s):
    return [w.strip() for w in re.findall(r'{{(.*?)}}', s) if '-' in w]

print(tokenize('{{ SITE_AGGREGATE_SUBNET }}.3 remote-as {{ BGP-AS }}'))

Output

['BGP-AS']
