
How do you split a string into words and special characters in Python?

I want to split a string into words ([a-zA-Z]) and any special characters it may contain, keeping the @ and # symbols attached to the word that follows them.

message = "I am to be @split, into #words, And any other thing that is not word, mostly special character(.,>)"

Expected Result:

['I', 'am', 'to', 'be', '@split', ',', 'into', '#words', ',', 'And', 'any', 'other', 'thing', 'that', 'is', 'not', 'word', ',', 'mostly', 'special', 'character', '(', '.', ',', '>', ')']

How can I achieve this in Python?

How about:

import re
re.findall(r"[A-Za-z@#]+|\S", message)

The pattern matches any run of letters, @, and # (treated here as word characters) or, failing that, any single non-whitespace character.
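As a minimal sketch of how that pattern behaves, the snippet below wraps it in a small helper and runs it on the message from the question. The names `TOKEN_PATTERN` and `tokenize` are made up for illustration and are not part of the original answer.

```python
import re

# Hypothetical names; the pattern itself is the one from the answer above.
# First alternative: a run of letters, '@' or '#'.
# Second alternative: any single non-whitespace character.
TOKEN_PATTERN = re.compile(r"[A-Za-z@#]+|\S")

def tokenize(text):
    """Return every match of TOKEN_PATTERN in text, in order."""
    return TOKEN_PATTERN.findall(text)

message = ("I am to be @split, into #words, And any other thing that is "
           "not word, mostly special character(.,>)")
tokens = tokenize(message)
print(tokens)
```

Note that digits are not in the character class, so each digit falls through to `\S` and comes back as its own token: `tokenize("abc123")` gives `['abc', '1', '2', '3']`.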

You can use a negated character class to specify the characters to split on. [^\w@#] matches every character except letters, digits, underscore, @ and #.

You can then keep the special characters in the result by wrapping that class in capturing parentheses in re.split.

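import re
list(filter(None, re.split(r'\s|([^\w@#])', message)))

The filter removes the empty strings that re.split produces between adjacent special characters, as well as the None entries produced when the uncaptured \s alternative matches. The \s| part is there so that whitespace is split on but not captured. In Python 3, filter returns an iterator, so it is wrapped in list() to get a list.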
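To make the filtering step concrete, here is a small sketch that shows the raw output of re.split on a short sample before and after filtering. The helper name `split_tokens` and the `sample` string are assumptions for illustration only.

```python
import re

def split_tokens(text):
    """Split on whitespace or on any character outside \\w, '@', '#'.

    Characters matched by the capturing group are kept in the result;
    None and empty-string entries are dropped.
    """
    parts = re.split(r"\s|([^\w@#])", text)
    return [p for p in parts if p]

sample = "be @split, into"

# Raw split: None marks a whitespace match (the group did not participate),
# '' is the empty piece between the ',' and the space that follows it.
raw = re.split(r"\s|([^\w@#])", sample)
print(raw)                    # ['be', None, '@split', ',', '', None, 'into']
print(split_tokens(sample))   # ['be', '@split', ',', 'into']
```

Because \w also covers digits and underscore, this version keeps tokens such as `abc123` whole, which differs from a strictly [a-zA-Z] definition of a word.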

