I want to split a string into words ([a-zA-Z]) and any special characters it may contain, except for the @ and # symbols, which should stay attached to the word they precede.
message = "I am to be @split, into #words, And any other thing that is not word, mostly special character(.,>)"
Expected Result:
['I', 'am', 'to', 'be', '@split', ',', 'into', '#words', ',', 'And', 'any', 'other', 'thing', 'that', 'is', 'not', 'word', ',', 'mostly', 'special', 'character', '(', '.', ',', '>', ')']
How can I achieve this in Python?
How about:
import re
re.findall(r"[A-Za-z@#]+|\S", message)
The pattern matches a run of one or more word characters (here defined as letters plus @ and #), or else any single non-whitespace character.
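Put together with the message from the question, the findall approach runs as a small self-contained script; the name tokens is only illustrative.

```python
import re

message = "I am to be @split, into #words, And any other thing that is not word, mostly special character(.,>)"

# First alternative: a run of letters, @ or #.
# Second alternative: any single non-whitespace character (punctuation, etc.).
tokens = re.findall(r"[A-Za-z@#]+|\S", message)
print(tokens)
```

Note that digits are not in the character class, so a number such as 42 would come out as '4', '2'; add 0-9 to the class if that matters for your input.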
You can use a negated character class, [^\w@#], to match the characters you want to split off: it matches every character except letters, digits, underscore, @ and #.
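To check which characters [^\w@#] picks out, you can run it through re.findall on a short string; the sample string here is made up for illustration. Letters, digits, underscore, @ and # are left alone, while spaces and punctuation are matched, which is why the re.split version needs separate handling for spaces.

```python
import re

# Hypothetical sample containing a digit, an underscore, @, #, spaces and punctuation.
sample = "a_1 @b #c, d(.)"

# Every character that is NOT a word character (\w), @ or #.
separators = re.findall(r"[^\w@#]", sample)
print(separators)  # [' ', ' ', ',', ' ', '(', '.', ')']
```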
Then you can keep those special characters in the result as well by wrapping the class in capturing parentheses in re.split.
import re
list(filter(None, re.split(r'\s|([^\w@#])', message)))
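The filter removes the empty strings that re.split produces between adjacent separators (for example between consecutive special characters, or a comma followed by a space), as well as the None values it returns for the unmatched group whenever a space is the separator. The \s| alternative comes first so that spaces are consumed as separators without being captured.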
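A self-contained sketch of the re.split approach on the question's message, keeping the intermediate list in a variable so the None and empty-string entries that filter removes are visible; parts and tokens are illustrative names.

```python
import re

message = "I am to be @split, into #words, And any other thing that is not word, mostly special character(.,>)"

# \s is tried first, so a space is consumed without being captured (group 1 is None).
# Any other character outside [\w@#] is captured by group 1 and kept in the output.
parts = re.split(r"\s|([^\w@#])", message)

# filter(None, ...) drops falsy entries: the None values and the empty strings
# that appear between adjacent separators. list() is needed because filter
# returns an iterator in Python 3.
tokens = list(filter(None, parts))
print(tokens)
```

Because this version keys off \w rather than [A-Za-z], digits and underscores stay inside words (for example word_2 remains one token), unlike the findall pattern.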