I want to split a string on multiple delimiters: , . / \ | + & ; and AND (case insensitive). However, I also want to extract the text within brackets of different types: (), {}, and [].
This is an example string that I want to convert:
"Hello (Bob), Tree+Leaf. {text} AND Bye"
And I would want it to be split into an array like such:
["Hello", "Bob", "Tree", "Leaf", "text", "Bye"]
I understand how to split on the plain delimiters by using re.split(r',|\.|/|\\|\||\+|&|;|AND', input_string), but I am not sure how to also extract the text out of the parentheses in the same pass as the other delimiter splits.
Also, I would like all the substrings to be trimmed. For example, if I were to split the string "Hello, World", I would want the output to be ["Hello", "World"] and not ["Hello", " World"].
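Use re.split with one capturing group per bracket type, then drop the empty strings and None values it returns: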
[t for t in re.split(r'\s*(?:\bAND\b|[,./\\|+&;]|\(([^()]*)\)|\[([^][]*)]|{([^{}]*)})\s*', input_string) if t]
EXPLANATION
--------------------------------------------------------------------------------
\s*                  whitespace (\n, \r, \t, \f, and " ") (0 or more times
                     (matching the most amount possible))
(?:                  group, but do not capture:
  \b                 the boundary between a word char (\w) and something
                     that is not a word char
  AND                'AND'
  \b                 the boundary between a word char (\w) and something
                     that is not a word char
 |                   OR
  [,./\\|+&;]        any character of: ',', '.', '/', '\\', '|', '+', '&', ';'
 |                   OR
  \(                 '('
  (                  group and capture to \1:
    [^()]*           any character except: '(', ')' (0 or more times
                     (matching the most amount possible))
  )                  end of \1
  \)                 ')'
 |                   OR
  \[                 '['
  (                  group and capture to \2:
    [^][]*           any character except: ']', '[' (0 or more times
                     (matching the most amount possible))
  )                  end of \2
  ]                  ']'
 |                   OR
  {                  '{'
  (                  group and capture to \3:
    [^{}]*           any character except: '{', '}' (0 or more times
                     (matching the most amount possible))
  )                  end of \3
  }                  '}'
)                    end of grouping
\s*                  whitespace (\n, \r, \t, \f, and " ") (0 or more times
                     (matching the most amount possible))
--------------------------------------------------------------------------------
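To see why the list comprehension filters with "if t", it may help to look at what re.split returns before filtering. The following is a minimal sketch using the pattern and input from this answer; the variable names raw and tokens are only illustrative. Because the pattern contains three capturing groups, re.split inserts three extra items after every match: the captured bracket text for the group that took part, and None for the groups that did not. Adjacent matches also leave empty strings between them.

```python
import re

pattern = r'\s*(?:\bAND\b|[,./\\|+&;]|\(([^()]*)\)|\[([^][]*)]|{([^{}]*)})\s*'
input_string = "Hello (Bob), Tree+Leaf. {text} AND Bye"

# Unfiltered result: text pieces interleaved with the three group values
# emitted for every match (None when a group did not participate).
raw = re.split(pattern, input_string)
print(raw)

# Dropping falsy items removes both the None placeholders and the empty
# strings left between back-to-back delimiters such as ")" followed by ",".
tokens = [t for t in raw if t]
print(tokens)  # ['Hello', 'Bob', 'Tree', 'Leaf', 'text', 'Bye']
```

The surrounding \s* on both sides of the alternation is what trims the whitespace next to each delimiter, so no separate strip step is needed for this input.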
See the Python proof:
import re
input_string = "Hello (Bob), Tree+Leaf. {text} AND Bye"
print([t for t in re.split(r'\s*(?:\bAND\b|[,./\\|+&;]|\(([^()]*)\)|\[([^][]*)]|{([^{}]*)})\s*', input_string) if t])
Result: ['Hello', 'Bob', 'Tree', 'Leaf', 'text', 'Bye']
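Note that the pattern as written matches only an uppercase AND, while the question asks for a case-insensitive match. One possible way to cover that, sketched here under the assumption that ignoring case for the whole pattern is acceptable (the other alternatives contain no letters, so the flag affects only AND), is to compile the pattern with re.IGNORECASE. The helper name split_tokens is hypothetical, and the extra strip() call is an addition that trims whitespace left inside brackets such as "( Bob )".

```python
import re

# Same pattern as in the answer; only the IGNORECASE flag is new.
PATTERN = re.compile(
    r'\s*(?:\bAND\b|[,./\\|+&;]|\(([^()]*)\)|\[([^][]*)]|{([^{}]*)})\s*',
    flags=re.IGNORECASE,
)

def split_tokens(text):
    """Hypothetical helper: split on the delimiters and keep bracket contents."""
    # Skip None (groups that did not match) and empty strings, then trim
    # whitespace that may remain inside bracket contents.
    return [t.strip() for t in PATTERN.split(text) if t and t.strip()]

print(split_tokens("Hello (Bob), Tree+Leaf. {text} and Bye"))
print(split_tokens("band And sand"))
```

The \b word boundaries keep the "and" inside words such as "band" or "sand" from being treated as a delimiter. Nested brackets of the same type, for example "(a (b))", are not handled by this pattern.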