I am trying to parse a string which is of the following format:
text="some random string <inAngle> <anotherInAngle> [-option text] [-anotherOption <text>] [-option (Y|N)]"
I want to split the string into three parts: the leading text, the items in angle brackets, and the items in square brackets.
If I use the RegEx
re.findall(r'\[(.+?)\]', text)
it gives me everything I need from inside the square brackets. However, if I use the same RegEx with angle brackets,
re.findall(r'<(.+?)>', text)
it also returns text in angle brackets that sits inside square brackets, for example "text" from [-anotherOption <text>] above. I do not want that. The RegEx for the angle-bracket match should return only "inAngle" and "anotherInAngle". What would the RegEx for that be?
Also, how do I get only the first part, i.e. "some random string"? This part can contain two or three words.
You can simply disregard everything between square brackets before searching for things in angle brackets:
interm = re.sub(r'\[(.*?)\]', '', text)
re.findall(r'<(.+?)>', interm)
outputs
['inAngle', 'anotherInAngle']
Then, to match the first part, match everything up to the first [ or <. Granted, this won't work if the first part is allowed to contain either of these symbols on its own, unclosed:
re.findall(r'([^<\[]+)', text)[0]
outputs
some random string
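The steps above can be combined into one helper. This is only a minimal sketch: the function name split_parts is made up for illustration, and it assumes that square brackets are never nested and that the leading text contains no < or [.

```python
import re

def split_parts(text):
    """Return (leading text, angle-bracket items, square-bracket items)."""
    # Everything inside square brackets, exactly as in the question.
    options = re.findall(r'\[(.+?)\]', text)
    # Drop the bracketed groups first so a nested <...> cannot match.
    stripped = re.sub(r'\[.*?\]', '', text)
    angles = re.findall(r'<(.+?)>', stripped)
    # Leading text: everything before the first '<' or '['.
    head = re.match(r'[^<\[]*', text).group(0).strip()
    return head, angles, options

text = ("some random string <inAngle> <anotherInAngle> "
        "[-option text] [-anotherOption <text>] [-option (Y|N)]")
head, angles, options = split_parts(text)
print(head)     # some random string
print(angles)   # ['inAngle', 'anotherInAngle']
print(options)  # ['-option text', '-anotherOption <text>', '-option (Y|N)']
```

Using re.match with * instead of findall(...)[0] means an input that starts directly with a bracket gives an empty leading part rather than raising an IndexError.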
See whether this regex captures what you need:
\s*([^><[\]]+\b)|\[([^]]*)]|<([^>]*)>
\s* : preceded by optional whitespace
([^><[\]]+\b) : Group 1: any non-bracket characters up to a \b word boundary (remove the \b if undesired)
|\[([^]]*)] : or Group 2: what's inside square brackets
|<([^>]*)> : or Group 3: what's inside angle brackets
See the demo at regex101 (use the "code generator" if needed).
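In Python, a three-group alternation like this could be consumed with re.finditer, sorting each match by the group that took part in it. The sketch below is an illustration, not the answerer's code: the names PATTERN and classify are hypothetical, and the brackets inside the character classes are escaped, which should be equivalent to the pattern above.

```python
import re

# Same alternation as above, with the brackets in the classes escaped.
PATTERN = re.compile(r'\s*([^><\[\]]+\b)|\[([^\]]*)\]|<([^>]*)>')

def classify(text):
    """Sort matches into plain text, square-bracket and angle-bracket items."""
    plain, square, angle = [], [], []
    for m in PATTERN.finditer(text):
        # Only one group participates per match; the others are None.
        if m.group(1) is not None:
            plain.append(m.group(1))
        elif m.group(2) is not None:
            square.append(m.group(2))
        else:
            angle.append(m.group(3))
    return plain, square, angle

text = ("some random string <inAngle> <anotherInAngle> "
        "[-option text] [-anotherOption <text>] [-option (Y|N)]")
plain, square, angle = classify(text)
print(plain)   # ['some random string']
print(square)  # ['-option text', '-anotherOption <text>', '-option (Y|N)']
print(angle)   # ['inAngle', 'anotherInAngle']
```

The nested <text> never shows up as an angle-bracket item because the square-bracket alternative consumes the whole [-anotherOption <text>] span before the scan reaches it.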
You can simply use this regex with re.findall:
<(.+?)>(?![^\[]*\])|\[(.+?)\]|((?!\s+)[^\[\]<>]+)
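A sketch of how that pattern could be used follows. The negative lookahead (?![^\[]*\]) rejects an angle-bracket match if a ] can be reached without first passing a [, which means the match sits inside square brackets. The helper name split_with_lookahead is hypothetical, and the sketch assumes square brackets are never nested.

```python
import re

PATTERN = r'<(.+?)>(?![^\[]*\])|\[(.+?)\]|((?!\s+)[^\[\]<>]+)'

def split_with_lookahead(text):
    """Return (plain text parts, angle-bracket items, square-bracket items)."""
    plain, angles, options = [], [], []
    # With several groups, findall yields one tuple per match and
    # fills the groups that did not take part with ''.
    for angle, option, free in re.findall(PATTERN, text):
        if angle:
            angles.append(angle)
        elif option:
            options.append(option)
        elif free:
            # Group 3 can carry trailing whitespace, so strip it.
            plain.append(free.strip())
    return plain, angles, options

text = ("some random string <inAngle> <anotherInAngle> "
        "[-option text] [-anotherOption <text>] [-option (Y|N)]")
print(split_with_lookahead(text))
```

Note that the third group matches "some random string " with its trailing space, which is why the sketch strips it.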