I'm trying to match a char within a subset of chars, where either side of the matching char could be anything.
Here's an example:
{{ SITE_AGGREGATE_SUBNET }}.3 remote-as {{ BGP-AS }}
Against the above, I want to match anything between {{ and }} that has a dash (-) in it.
My regex pattern so far is:
(?<={{)(.*?-.*?)(?=}})
But this creates a match spanning the whole test string, returning:
SITE_AGGREGATE_SUBNET }}.3 remote-as {{ BGP-AS
Is anyone able to see what I'm missing? I understand why my regex doesn't work as expected but not how to fix it.
Thanks
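For reference, here is a minimal reproduction of the behaviour described above. It assumes Python's re module: the question does not name a language, but the answers below use Python. The engine tries start positions from left to right. The first position where the lookbehind succeeds is right after the opening {{. The lazy .*? only keeps each piece as short as possible from that start; it does not move the start. Nothing in the pattern stops .*? from crossing }} or {{, so the match runs on to the first hyphen (in remote-as) and then to the next }}.

```python
import re

s = '{{ SITE_AGGREGATE_SUBNET }}.3 remote-as {{ BGP-AS }}'

# Pattern from the question: lookbehind for {{, lazy text containing '-', lookahead for }}
pattern = r'(?<={{)(.*?-.*?)(?=}})'

# The first successful start is just after the opening {{, and .*? happily crosses }} and {{
print(re.findall(pattern, s))
# [' SITE_AGGREGATE_SUBNET }}.3 remote-as {{ BGP-AS ']
```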
You may use this regex with a negative lookahead and a capture group:
({{(?:(?!{{|}})[^-])*)-(.*?}})
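Here is a minimal sketch of applying this pattern in Python, the language the other answers use. The variable names are illustrative only. The pattern has two capture groups, so re.findall returns tuples of group values rather than the whole match. re.finditer with group(0) is one way to recover the full {{...}} block:

```python
import re

s = '{{ SITE_AGGREGATE_SUBNET }}.3 remote-as {{ BGP-AS }}'
pattern = r'({{(?:(?!{{|}})[^-])*)-(.*?}})'

# With two capture groups, re.findall returns (group1, group2) tuples
pairs = re.findall(pattern, s)
print(pairs)  # [('{{ BGP', 'AS }}')]

# group(0) is the whole {{ ... }} block; drop the braces and surrounding spaces
tokens = [m.group(0)[2:-2].strip() for m in re.finditer(pattern, s)]
print(tokens)  # ['BGP-AS']
```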
RegEx Details:
( : Start capture group
{{ : Match {{
(?: : Start non-capture group
(?!{{|}}) : Negative lookahead to assert that we don't have {{ or }} at the next position
[^-] : Match any character except a hyphen
) : End non-capture group
* : Match 0+ instances of this group
) : End capture group
- : Match a literal hyphen
(.*?}}) : Match the remaining string up to and including the next }}, and capture it in the 2nd capture group

Use:
import re
s = '{{ SITE_AGGREGATE_SUBNET }}.3 remote-as {{ BGP-AS }}'
# Grab every {{...}} body, then keep only the stripped bodies that contain a hyphen
print([x.strip() for x in re.findall(r'{{(.*?)}}', s) if '-' in x])
# -> ['BGP-AS']
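Details
- Extract all {{...}} substrings with a mere {{(.*?)}} regex. Note that re.findall will only return the captured substring, which is the value matched with (.*?).
- Keep only those that contain a - by using a condition inside the list comprehension (if '-' in x).
- .strip() removes the leading and trailing whitespace from each kept value.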
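Below, the same list-comprehension approach is wrapped in a small helper and run on a longer line with several placeholders. The helper name hyphenated_placeholders and the sample line are hypothetical. The output shows that a hyphen outside the braces (remote-as) is ignored:

```python
import re

def hyphenated_placeholders(line):
    # Non-greedy {{(.*?)}} captures each placeholder body without running into the next one
    bodies = re.findall(r'{{(.*?)}}', line)
    # Keep only bodies that contain a hyphen, trimmed of surrounding whitespace
    return [body.strip() for body in bodies if '-' in body]

line = 'neighbor {{ PEER-IP }} remote-as {{ BGP-AS }} description {{ DESC }}'
print(hyphenated_placeholders(line))
# ['PEER-IP', 'BGP-AS']
```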
A single regex approach (note it might turn out less efficient):
re.findall(r'{{\s*((?:(?!{{|}})[^-])*-.*?)\s*}}', s)
Details
{{ - a literal {{
\s* - 0+ whitespaces
((?:(?!{{|}})[^-])*-.*?) - Capturing group 1 (what will be returned by re.findall):
  (?:(?!{{|}})[^-])* - a tempered greedy token matching any non-hyphen char, 0+ times, that does not start a {{ or }} substring
  - - a hyphen
  .*? - any 0+ chars (other than a newline), as few as possible
\s* - 0+ whitespaces
}} - a literal }}
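The check below runs both versions on a hypothetical sample line and shows that the single-regex version returns the same values as the list-comprehension approach. The tempered token (?:(?!{{|}})[^-])* is what stops the group from running out of a hyphen-free placeholder into the next one:

```python
import re

line = '{{ SITE_AGGREGATE_SUBNET }}.3 remote-as {{ BGP-AS }} {{PEER-IP}}'

# Single regex: the tempered greedy token refuses to step over {{ or }}
single = re.findall(r'{{\s*((?:(?!{{|}})[^-])*-.*?)\s*}}', line)

# Two-step version: capture every {{...}} body, then filter and strip in Python
two_step = [x.strip() for x in re.findall(r'{{(.*?)}}', line) if '-' in x]

print(single)    # ['BGP-AS', 'PEER-IP']
print(two_step)  # ['BGP-AS', 'PEER-IP']
```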
You can use this pattern: {{(.*?)}}
.*? matches any run of characters (except newlines) non-greedily.
(...) creates a capturing group, so re.findall yields the inside of the brackets.
To check whether the match contains a '-', it might be simpler to then use the in operator.
import re

def tokenize(s):
    # Capture each {{...}} body, keep those containing '-', and trim the surrounding spaces
    return [w.strip() for w in re.findall(r'{{(.*?)}}', s) if '-' in w]

print(tokenize('{{ SITE_AGGREGATE_SUBNET }}.3 remote-as {{ BGP-AS }}'))
# -> ['BGP-AS']
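The effect of the capturing group shows up when re.findall runs with and without the parentheses on the question's string. The captured bodies keep their surrounding spaces, which is why strip() is applied afterwards:

```python
import re

s = '{{ SITE_AGGREGATE_SUBNET }}.3 remote-as {{ BGP-AS }}'

# No group: findall returns each whole match, braces included
print(re.findall(r'{{.*?}}', s))
# ['{{ SITE_AGGREGATE_SUBNET }}', '{{ BGP-AS }}']

# One group: findall returns only what (.*?) captured
print(re.findall(r'{{(.*?)}}', s))
# [' SITE_AGGREGATE_SUBNET ', ' BGP-AS ']
```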