bounding strings between two characters in regex

Question

I am using <[^<>]+> to extract substrings between < and >, as in the following:

<abc>, <?.sdfs/>, <sdsld\\> , etc.

I am not trying to parse HTML tags, or something similar. My only issue is extracting strings between < and > .

But sometimes, there might be substrings like the following:

</</\/\asa></dsdsds><sdsfsa>>

In that case, the whole string should be matched as one substring instead of three, because the whole string is enclosed by < and >.
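A quick check with Python's built-in re module reproduces the behaviour described above. The variable names are only for illustration. Because [^<>]+ cannot cross a nested '<', the pattern finds the three inner groups of the nested example instead of the whole string.

```python
import re

# The pattern from the question: '<', then one or more characters that are
# neither '<' nor '>', then '>'. It can never span a nested '<'.
PATTERN = re.compile(r'<[^<>]+>')

flat = r'<abc>, <?.sdfs/>, <sdsld\>'
nested = r'</</\/\asa></dsdsds><sdsfsa>>'

print(PATTERN.findall(flat))    # ['<abc>', '<?.sdfs/>', '<sdsld\\>']
print(PATTERN.findall(nested))  # ['</\\/\\asa>', '</dsdsds>', '<sdsfsa>']
```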

How can I modify my regex to do that?

Answer 1

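Don't use regex. Use the traditional way to do this: make a stack, and while you are inside more than one '<', keep appending characters to it; when the outermost '>' is reached, stop and append the whole collected string to the result.

Note the doubled backslashes in the output below: that is only how Python's repr displays a single backslash inside a string.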

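def find_tags(your_string):
    ans = []
    stack = []   # characters of the current outermost tag, without its outer < and >
    tag_no = 0   # current nesting depth

    for c in your_string:
        if c == '<':
            tag_no += 1
            if tag_no > 1:              # a nested '<' is part of the content
                stack.append(c)
        elif c == '>' and tag_no > 0:   # ignore a stray '>' outside any tag
            if tag_no == 1:             # closes the outermost tag: emit its content
                ans.append(''.join(stack))
                tag_no = 0
                stack = []
            else:                       # closes a nested tag: keep it in the content
                tag_no -= 1
                stack.append(c)
        elif tag_no > 0:                # ordinary character inside a tag
            stack.append(c)
    return ans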

Output:

find_tags(r'<abc>, <?.sdfs/>, <sdsld\>')
['abc', '?.sdfs/', 'sdsld\\']
find_tags(r'</</\/\asa></dsdsds><sdsfsa>>')
['/</\\/\\asa></dsdsds><sdsfsa>']

Note: this also runs in O(n) time, since each character is examined only once.
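Since only the nesting depth matters, the same single-pass idea can be sketched with an integer counter and string slicing instead of a list of characters. This is a hypothetical variant (the name find_outer_groups is mine, not from the answer); unlike find_tags it keeps the outer brackets, mirroring what re.findall returns for the question's pattern.

```python
def find_outer_groups(text):
    """Return every outermost <...> substring, brackets included.

    One left-to-right pass with a depth counter, so O(n) time.
    A stray '>' outside any bracket is ignored, and a group whose
    outermost '<' is never closed is dropped.
    """
    results = []
    depth = 0
    start = 0
    for i, c in enumerate(text):
        if c == '<':
            if depth == 0:
                start = i                          # outermost '<' opens here
            depth += 1
        elif c == '>' and depth > 0:
            depth -= 1
            if depth == 0:
                results.append(text[start:i + 1])  # slice out the whole group
    return results

print(find_outer_groups(r'<abc>, <?.sdfs/>, <sdsld\>'))
# ['<abc>', '<?.sdfs/>', '<sdsld\\>']
print(find_outer_groups(r'</</\/\asa></dsdsds><sdsfsa>>'))
# ['</</\\/\\asa></dsdsds><sdsfsa>>']
```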

Answer 2

Refer to the question "Regular expression to match outer brackets"; I'm trying to implement the same using < and >.
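Python's built-in re module has no recursive patterns (the third-party regex module does support them via (?R)), so with re alone arbitrary nesting cannot be matched. One workaround, sketched here under the assumption that you know a maximum nesting depth in advance, is to unroll the nesting a fixed number of times. The helper name nested_angle_pattern is hypothetical. Note that its depth-1 form uses * rather than the question's +, so it also matches an empty <>.

```python
import re

def nested_angle_pattern(max_depth):
    """Build a regex matching a <...> group nested at most max_depth levels.

    Depth 1 is <[^<>]*>, depth 2 is <(?:[^<>]|<[^<>]*>)*>, and so on.
    Deeper nesting is not matched as a single group.
    """
    if max_depth < 1:
        raise ValueError("max_depth must be at least 1")
    inner = r'[^<>]*'
    for _ in range(max_depth - 1):
        # Each round allows one more level: a plain character or a nested group.
        inner = r'(?:[^<>]|<' + inner + r'>)*'
    return re.compile('<' + inner + '>')

pattern = nested_angle_pattern(3)
print(pattern.findall(r'<abc>, <?.sdfs/>, <sdsld\>'))
# ['<abc>', '<?.sdfs/>', '<sdsld\\>']
print(pattern.findall(r'</</\/\asa></dsdsds><sdsfsa>>'))
# ['</</\\/\\asa></dsdsds><sdsfsa>>']
```

If the real nesting exceeds max_depth, the pattern silently falls back to matching the inner groups, so a non-regex parser is the safer choice when the depth is unbounded.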

Or, how about a small method for this:

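def recursive_bracket_parser(s, i):
    # Scan from index i; return the index just past the '>' that closes the
    # current level, or len(s) if the string ends first.
    while i < len(s):
        if s[i] == '<':
            i = recursive_bracket_parser(s, i+1)   # descend into a nested pair
        elif s[i] == '>':
            return i+1                             # this level is closed
        else:
            # process whatever is at s[i]
            i += 1
    return i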

Source: How can I match nested brackets using regex?
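The skeleton above only walks over the brackets. A minimal sketch of filling in the processing step to collect each outermost group might look like this; the names outer_groups and parse_group are hypothetical, and returning None for an unclosed group is my own convention so that a group closed at the very end of the string can be told apart from one that is never closed.

```python
def outer_groups(s):
    """Collect every outermost <...> substring using recursive descent."""

    def parse_group(i):
        # Called just after a '<'. Returns the index just past the matching
        # '>', or None if the string ends before the group is closed.
        while i < len(s):
            if s[i] == '<':
                i = parse_group(i + 1)   # recurse into a nested group
                if i is None:
                    return None          # nested group never closed
            elif s[i] == '>':
                return i + 1             # this group is closed
            else:
                i += 1                   # ordinary character inside the group
        return None

    results = []
    i = 0
    while i < len(s):
        if s[i] == '<':
            end = parse_group(i + 1)
            if end is None:
                break                    # unclosed '<': nothing more to collect
            results.append(s[i:end])     # whole outermost group, brackets included
            i = end
        else:
            i += 1                       # text outside brackets, including a stray '>'
    return results

print(outer_groups(r'</</\/\asa></dsdsds><sdsfsa>>'))
# ['</</\\/\\asa></dsdsds><sdsfsa>>']
```

Each nesting level costs one Python stack frame, so extremely deep nesting can hit the default recursion limit; the counter-based loop does not have that problem.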


Question

2 answers

solution1
1 ACCEPTED 2017-03-08 09:23:25

solution2
1 2017-03-08 09:25:22
