How Do You Split a String into Words and Special Characters in Python?
I want to split a string into words ([a-zA-Z]) and any special characters it may contain, except the @ and # symbols, which should stay attached to the word they belong to.
message = "I am to be @split, into #words, And any other thing that is not word, mostly special character(.,>)"
Expected result:
['I', 'am', 'to', 'be', '@split', ',', 'into', '#words', ',', 'And', 'any', 'other', 'thing', 'that', 'is', 'not', 'word', ',', 'mostly', 'special', 'character', '(', '.', ',', '>', ')']
How can I achieve this in Python?
How about:
re.findall(r"[A-Za-z@#]+|\S", message)
The pattern matches either a run of word characters (defined here as letters plus @ and #) or any single non-whitespace character. Whitespace matches neither alternative, so it is skipped.
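For reference, here is the same call as a small self-contained script, run against the message from the question. It is only a sketch of the approach above; the variable name tokens is chosen here for illustration.

```python
import re

message = "I am to be @split, into #words, And any other thing that is not word, mostly special character(.,>)"

# First alternative: one or more letters, @ or # (a "word" token).
# Second alternative: any single non-whitespace character (a "special" token).
tokens = re.findall(r"[A-Za-z@#]+|\S", message)
print(tokens)
```

Note that digits are not in the character class, so with this pattern each digit would come back as its own single-character token. Add 0-9 to the class if numbers should be kept together with words.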
You can use a negated character class to list all of the characters you don't want to split on.
[^\w@#]
This matches every character except letters, digits, underscore, @ and #.
You can then keep the special characters in the result by wrapping the class in capturing parentheses in re.split.
list(filter(None, re.split(r'\s|([^\w@#])', message)))
The filter removes the empty strings that re.split produces between adjacent special characters, as well as the None entries that appear wherever the \s alternative matched instead of the capturing group. The \s| part is there so that whitespace is split on but not captured.
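To see what the filter is cleaning up, the sketch below runs the split in two steps on the message from the question. The names parts and tokens are illustrative, and a list comprehension stands in for filter(None, ...); both drop None and empty strings.

```python
import re

message = "I am to be @split, into #words, And any other thing that is not word, mostly special character(.,>)"

# Raw split: whitespace matches contribute None (the group did not participate),
# and adjacent delimiters leave empty strings between them.
parts = re.split(r"\s|([^\w@#])", message)
print(parts[:4])

# Keep only the truthy pieces, i.e. drop None and ''.
tokens = [p for p in parts if p]
print(tokens)
```

Unlike the [A-Za-z@#] class in the re.findall answer, \w also covers digits and underscores, so the two approaches can differ on input such as "abc123" even though they agree on this message.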